Senior Data Engineer
Remote in Colombia
Data Software Engineering
We are seeking a highly skilled Senior Data Software Engineer to join our remote team, working with a British-American publicly traded analytics company that operates a collection of subscription-based services. As a Senior Data Software Engineer, you will be responsible for developing and deploying production-grade ETL pipelines using Apache Spark. If you are passionate about data software engineering and have a keen eye for detail, we invite you to be part of our team.
Responsibilities
- Develop and deploy production-grade ETL pipelines using Apache Spark (PySpark) to handle large data volumes
- Contribute to the design, development, and deployment of scalable and reliable data pipelines
- Perform data analysis and data quality checks to ensure data accuracy and integrity
- Create and maintain technical documentation for all ETL processes and data pipelines
- Collaborate with cross-functional teams to review software requirements and ensure seamless integration with other systems
- Provide technical guidance and mentorship to junior team members
Requirements
- Minimum of 3 years of experience in Data Software Engineering, including expertise in Python scripting for data handling (pandas, NumPy)
- Strong expertise in Apache Spark (PySpark) for handling large data volumes (a few hundred GB)
- Experience in writing UDFs in Spark
- Experience with relational databases (PostgreSQL or Snowflake preferred)
- Basic knowledge of big data technologies such as HBase and Hive
- Hands-on experience with AWS Cloud, AWS Glue, Amazon EMR, and Databricks
- Fluent spoken and written English at an Upper-Intermediate level or higher
Nice to have
- Prior experience with Apache Airflow
- Experience with AWS data services such as Glue and EMR