Lead Data Software Engineer (Databricks)
Argentina
We are seeking a skilled and driven Lead Data Software Engineer with extensive expertise in Databricks and data streaming technologies to join our innovative team.
In this position, you will apply your knowledge of big data engineering, cloud systems, and real-time streaming solutions to develop scalable, efficient data platforms that power critical business insights.
Responsibilities
- Implement data pipelines within Databricks, following medallion architecture principles for structured data organization
- Optimize batch and streaming pipelines, incorporating Streaming Tables, Delta Live Tables, Change Data Capture (CDC), and Slowly Changing Dimensions (SCD)
- Manage Databricks Asset Bundles (DABs) to handle packaging, deployment, and artifact versioning
- Oversee workflows, job orchestration, and scheduling on Databricks to ensure system reliability
- Design real-time streaming platforms leveraging tools such as Apache Kafka, Confluent, and Redpanda
- Enforce data contracts and schema compatibility using a Schema Registry
- Create efficient data processing solutions using Spark, SQL, and Python
- Work with relational and non-relational databases, including MySQL, PostgreSQL, and DynamoDB, to optimize data storage
- Tune database query performance for both operational and analytical workloads
- Collaborate with cross-functional teams to define project requirements and deliver end-to-end data solutions
- Maintain high engineering standards with CI/CD pipelines and Git-based version control
Requirements
- At least 5 years of experience in Data Software Engineering roles
- Proficiency in working with the Databricks ecosystem, including Spark, Delta Lake, Unity Catalog, and Workflows
- Expertise in building ETL/ELT workflows, including batch and streaming pipelines, CDC, and SCD processes
- Advanced skills in Spark programming, SQL performance tuning, and Python development
- Practical understanding of event-driven architectures using tools such as Kafka, Confluent, or Redpanda
- Knowledge of leading cloud platforms (e.g., AWS or GCP) for building and managing scalable data infrastructures
- Expertise in both relational and non-relational databases, such as MySQL, PostgreSQL, and DynamoDB
- Understanding of data modeling techniques, such as star and snowflake schemas, for analytics
- Familiarity with CI/CD practices, version control platforms like Git, and infrastructure-as-code tools such as Terraform
- A strong analytical mindset with the ability to troubleshoot and resolve complex technical challenges
- Clear and effective communication abilities for collaborating across multidisciplinary teams
Nice to have
- Understanding of data governance methodologies and regulatory standards, including GDPR, CCPA, and SOC 2
- Familiarity with complementary big data platforms, such as Apache Hadoop or Snowflake
- Relevant certifications, such as Databricks Certified Data Engineer Associate or AWS credentials
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling, and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek, and LinkedIn