Lead Data Software Engineer

Hybrid in Portugal
Data Software Engineering

We are seeking a dedicated and experienced Lead Data Software Engineer to join our team and focus on building a next-generation framework for centralized platform infrastructure.

You will oversee the development of core infrastructure components, facilitate parser pipeline integration, and uphold operational excellence through strong monitoring, cost management, and event-driven architecture practices.

Responsibilities
  • Design scalable, centralized platform architecture tailored for data parser pipelines
  • Integrate solutions for processing tabular and unstructured datasets using frameworks compatible with Unity Catalog tables and volumes
  • Implement robust observability methods to ensure consistent logging and monitoring capabilities
  • Collaborate with teams to enhance framework efficiency while reducing operational costs
  • Support ingestion workflows from the raw data layer using cloud-native technologies
  • Integrate modern infrastructure tools to extend the functionality of the framework
  • Provide mentorship and establish coding standards aligned with best practices
  • Coordinate with stakeholders on onboarding procedures across diverse parser pipelines
  • Address performance bottlenecks and implement optimization strategies for data pipelines
  • Maintain awareness of emerging technologies to ensure alignment with future framework requirements
Requirements
  • 5+ years of industry experience in software engineering with a focus on cloud systems or data infrastructure
  • 1+ years of leadership experience in relevant roles
  • Proficiency in Python and knowledge of Azure Databricks
  • Expertise in Azure Kubernetes Service, Azure Data Explorer, and Azure Data Lake Storage
  • Strong understanding of core cloud computing principles, especially Azure-related capabilities
  • Hands-on experience with Terraform for Infrastructure-as-Code and Azure DevOps for repositories, pipelines, and artifact management
  • Background in leveraging observability tools for centralized logging and monitoring
  • Skills in PySpark for processing and transforming large-scale datasets
Nice to have
  • Competency in cost management strategies for scalable cloud infrastructure
  • Familiarity with Unity Catalog for organizing tabular and unstructured data assets
  • Ability to apply event-driven designs in file ingestion workflows
We offer
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library with 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn