Senior Data Platform Operations Engineer
We are seeking a highly skilled Senior Data Platform Operations Engineer to ensure the stability, security, performance, and cost efficiency of our global enterprise data platform.
This role is pivotal in providing 8/5 operational coverage within a follow-the-sun 24x5 support model, ensuring the platform consistently supports business activities worldwide. The ideal candidate will demonstrate expertise in cloud-based data platforms, a strong operational mindset, and a proactive approach to optimizing performance, enhancing observability, and managing costs.
Responsibilities
- Maintain a stable, secure, and performant enterprise data platform (Snowflake, AWS data stack, dbt, orchestration tools, BI/analytics, etc.)
- Provide operational coverage within an 8/5 support model and participate in a 24/7 on-call rotation for critical incidents
- Implement robust monitoring, alerting, and observability solutions to facilitate proactive incident detection and resolution
- Perform platform upgrades, patching, and configuration management in alignment with security and compliance requirements
- Continuously tune system performance to meet evolving business needs
- Apply holistic observability frameworks that cover infrastructure, data pipelines, and platform services
- Deliver actionable operational insights through monitoring dashboards and reporting
- Identify and execute process automation to improve efficiency and reduce manual interventions
- Propose and implement continuous improvements to advance platform resilience, scalability, and cost-effectiveness
- Contribute to infrastructure-as-code and configuration-as-code practices for consistent, repeatable operations
Requirements
- More than 3 years of experience managing cloud-native data platforms (e.g., Snowflake, Databricks, BigQuery, or similar)
- Expertise in cloud infrastructure (AWS) with emphasis on operations, automation, and cost governance
- Proficiency with monitoring and observability tools (Datadog, Prometheus, Grafana, ELK, CloudWatch, etc.)
- Knowledge of Infrastructure as Code (Terraform, Pulumi, Ansible) and configuration management practices
- Understanding of networking, security, and compliance in cloud environments
- Strong problem-solving skills with a proactive, service-oriented mindset
- Flexibility to work in a global operations environment with on-call responsibilities
- Clear communication and effective collaboration with engineering, data, and business stakeholders
- Commitment to continuous improvement and operational excellence
- English proficiency at an Upper-Intermediate level (B2) or higher
Nice to have
- Experience implementing FinOps frameworks and cost optimization practices
- Experience in compliance-driven environments within regulated industries (pharma, healthcare, finance)
- Familiarity with modern data stack tools (dbt, Dagster/Airflow, ThoughtSpot, Tableau, Power BI)
- Understanding of SRE (Site Reliability Engineering) principles and practices