Lead Data Software Engineer (Databricks)

Remote in Argentina or Mexico

We are seeking a skilled, highly driven Lead Data Software Engineer with expertise in Databricks and data streaming technologies to join our team.

In this role you will apply big data engineering, cloud platform, and real-time streaming skills to build scalable, efficient data platforms that power critical business insights.

Responsibilities
  • Architect data pipelines within Databricks using principles such as the medallion architecture for structured data management
  • Build and support both batch and real-time pipelines, including streaming tables, Delta Live Tables, Change Data Capture (CDC), and Slowly Changing Dimensions (SCD), as shown in the pipeline sketch after this list
  • Manage Databricks Asset Bundles (DABs) for deployment, versioning, and artifact management
  • Coordinate workflows, job schedules, and orchestration within Databricks to ensure operational consistency
  • Establish and maintain real-time streaming data platforms using tools like Apache Kafka, Confluent, or Redpanda
  • Utilize a Schema Registry to uphold data contracts and enforce schema compatibility, as shown in the producer sketch after this list
  • Develop efficient data processing solutions leveraging Spark, SQL, and Python
  • Handle relational and non-relational databases, including MySQL, PostgreSQL, and DynamoDB
  • Fine-tune database queries to enhance both operational and analytical performance
  • Collaborate with cross-functional teams to gather requirements and deliver effective data systems
  • Promote high-quality engineering practices by leveraging CI/CD workflows and version control tools such as Git
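
For candidates less familiar with the CDC/SCD patterns above, here is a minimal sketch of a medallion-style Delta Live Tables pipeline. The landing path, table names, and the customer_id and updated_at columns are hypothetical placeholders, and the code assumes it runs inside a Databricks Delta Live Tables pipeline, where the `spark` session and `dlt` module are provided by the runtime.

```python
# Minimal Delta Live Tables sketch: bronze -> silver with CDC into an SCD target.
# Hypothetical placeholders: the landing path, table names, and the
# customer_id / updated_at columns.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw change events ingested with Auto Loader")
def customers_bronze():
    return (
        spark.readStream.format("cloudFiles")    # Auto Loader incremental ingest
        .option("cloudFiles.format", "json")
        .load("/Volumes/raw/crm/customers/")     # hypothetical landing path
    )

# Silver: a streaming table kept in sync with the source via CDC,
# stored as SCD Type 2 so history is preserved.
dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customers_bronze",
    keys=["customer_id"],             # primary key in the change feed
    sequence_by=F.col("updated_at"),  # ordering column for late-arriving events
    stored_as_scd_type=2,             # 1 = overwrite in place, 2 = keep history
)
```

In a full pipeline the bronze layer would also carry operation codes (insert/update/delete) from the source feed, and a gold layer would aggregate the silver table for analytics.

Likewise, a minimal sketch of schema-enforced publishing with Confluent's Python client (`confluent-kafka`) is shown below; the broker and registry URLs, topic name, and record schema are all hypothetical.

```python
# Sketch: producing Avro records validated against a Schema Registry.
# Hypothetical: broker/registry URLs, topic name, and the record schema.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

SCHEMA = """
{
  "type": "record",
  "name": "CustomerEvent",
  "fields": [
    {"name": "customer_id", "type": "string"},
    {"name": "status", "type": "string"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(registry, SCHEMA)  # registers and validates the schema
producer = Producer({"bootstrap.servers": "localhost:9092"})

event = {"customer_id": "42", "status": "active"}
producer.produce(
    topic="customers",
    # Serialization fails fast if the event violates the registered schema,
    # which is how the registry enforces the data contract.
    value=serializer(event, SerializationContext("customers", MessageField.VALUE)),
)
producer.flush()
```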
Requirements
  • 5+ years of experience in Data Software Engineering
  • Practical skills in Databricks, including Spark, Delta Lake, Unity Catalog, and workflow management
  • Expertise in building ETL/ELT pipelines, including batch and real-time streaming solutions, Change Data Capture (CDC), and Slowly Changing Dimensions (SCD)
  • Advanced skills in Spark programming, SQL queries, and Python scripting
  • Proficiency with event-driven architectures through tools like Kafka, Confluent, or Redpanda
  • Knowledge of cloud platforms such as AWS or GCP for managing data infrastructure
  • Familiarity with relational and non-relational databases, such as MySQL, PostgreSQL, and DynamoDB
  • Understanding of advanced data modeling approaches such as star and snowflake schemas for analytics (see the star schema sketch after this list)
  • Capability to work with CI/CD pipelines, Git for version control, and infrastructure-as-code tools such as Terraform
  • Effective problem-solving skills to address technical and architectural challenges
  • Strong communication skills for collaboration across diverse teams
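
For reference, the star schema bullet above refers to modeling one central fact table keyed to surrounding dimension tables. A minimal Spark SQL sketch follows; all table and column names are hypothetical.

```python
# Sketch: a minimal star schema in Spark SQL (hypothetical names).
# One fact table (fact_orders) references two dimensions via surrogate keys.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
CREATE TABLE IF NOT EXISTS dim_customer (
  customer_key BIGINT,   -- surrogate key
  customer_id  STRING,   -- natural key from the source system
  segment      STRING
)""")

spark.sql("""
CREATE TABLE IF NOT EXISTS dim_date (
  date_key INT,          -- e.g. 20240131
  dt       DATE,
  month    INT,
  year     INT
)""")

spark.sql("""
CREATE TABLE IF NOT EXISTS fact_orders (
  customer_key BIGINT,            -- references dim_customer
  date_key     INT,               -- references dim_date
  order_amount DECIMAL(18, 2)
)""")

# Analytical queries join the fact to its dimensions, e.g. revenue by segment:
spark.sql("""
SELECT c.segment, d.year, SUM(f.order_amount) AS revenue
FROM fact_orders f
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_date d     ON f.date_key = d.date_key
GROUP BY c.segment, d.year
""").show()
```

A snowflake schema follows the same pattern with the dimensions further normalized into sub-dimension tables.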
Nice to have
  • Knowledge of data governance standards and compliance protocols like GDPR, CCPA, or SOC2
  • Understanding of big data and warehousing platforms such as Apache Hadoop or Snowflake
  • Relevant certifications, including Databricks Data Engineer Associate or AWS specialized credentials
We offer
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn