Lead Data Software Engineer (Databricks)
Argentina
We are seeking a skilled and driven Lead Data Software Engineer with extensive expertise in Databricks and data streaming technologies to join our innovative team.
In this position, you will apply your knowledge of big data engineering, cloud systems, and real-time streaming solutions to develop scalable, efficient data platforms that power critical business insights.
Responsibilities
- Implement data pipelines within Databricks, following medallion architecture principles for structured data organization
- Optimize batch and streaming pipelines, incorporating Streaming Tables, Delta Live Tables, Change Data Capture (CDC), and Slowly Changing Dimensions (SCD)
- Manage Databricks Asset Bundles (DABs) to handle packaging, deployment, and artifact versioning
- Oversee workflows, job orchestration, and scheduling on Databricks to ensure system reliability
- Design real-time streaming platforms leveraging tools such as Apache Kafka, Confluent, and Redpanda
- Enforce data contracts and schema compatibility using a Schema Registry
- Create efficient data processing solutions using Spark, SQL, and Python
- Work with relational and non-relational databases, including MySQL, PostgreSQL, and DynamoDB, to optimize data storage
- Tune database query performance for both operational and analytical workloads
- Collaborate with cross-functional teams to define project requirements and deliver end-to-end data solutions
- Maintain high engineering standards with CI/CD pipelines and Git-based version control
Requirements
- At least 5 years of experience in Data Software Engineering roles
- Proficiency in working with the Databricks ecosystem, including Spark, Delta Lake, Unity Catalog, and Workflows
- Expertise in building ETL/ELT workflows, including batch and streaming pipelines, CDC, and SCD processes
- Advanced skills in Spark programming, SQL performance tuning, and Python development
- Practical understanding of event-driven architectures using tools such as Kafka, Confluent, or Redpanda
- Knowledge of leading cloud platforms (e.g., AWS or GCP) for building and managing scalable data infrastructures
- Expertise in both relational and non-relational databases, such as MySQL, PostgreSQL, and DynamoDB
- Understanding of data modeling techniques, such as star and snowflake schemas, for analytics
- Familiarity with CI/CD practices, version control platforms like Git, and infrastructure-as-code tools such as Terraform
- A strong analytical mindset with the ability to troubleshoot and resolve complex technical challenges
- Clear and effective communication abilities for collaborating across multidisciplinary teams
Nice to have
- Understanding of data governance methodologies and regulatory standards, including GDPR, CCPA, and SOC 2
- Familiarity with complementary big data platforms, such as Apache Hadoop or Snowflake
- Relevant certifications, such as Databricks Certified Data Engineer Associate or AWS credentials
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling, and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek, and LinkedIn