Senior AI Platform Engineer - Databricks & AWS
Hybrid in Mexico
Platform Engineering
Join our team as a Senior AI Platform Engineer, where you will design, deploy, and maintain next-generation Databricks platforms on AWS to support advanced analytics and machine learning workflows.
You will collaborate closely with data scientists and ML engineers to deliver a seamless developer experience on the Lakehouse. Apply now to contribute to cutting-edge AI infrastructure development.
Responsibilities
- Design and implement scalable Databricks platform solutions for analytics, ML, and GenAI workflows across development, testing, and production environments
- Administer and optimize Databricks workspaces including cluster policies, pools, job clusters, autoscaling, and GPU/accelerated compute
- Implement and manage Unity Catalog governance including metastores, catalogs, schemas, data sharing, masking, lineage, and access controls
- Build and maintain Infrastructure as Code using Terraform for reproducible platform provisioning and configuration
- Implement CI/CD pipelines for notebooks, libraries, Delta Live Tables (DLT) pipelines, and ML assets using GitHub Actions and Databricks APIs
- Standardize experiment tracking and model registry workflows, and deploy model serving endpoints with monitoring and rollback capabilities
- Develop and optimize Delta Lake batch and streaming pipelines using Auto Loader, Structured Streaming, and DLT, enforcing data quality and SLAs
- Collaborate with cross-functional teams to integrate platform capabilities and ensure best-in-class developer experience
- Monitor platform performance, troubleshoot issues, and implement improvements to ensure reliability and scalability
- Maintain documentation and automation runbooks for platform operations and governance
- Coordinate with security teams to enforce data governance, encryption, and compliance policies
- Promote best practices for coding, testing, and deployment within the platform engineering team
- Drive continuous improvement in platform automation and operational efficiency
- Engage with stakeholders to gather requirements and provide technical guidance
- Mentor junior engineers and share knowledge of platform technologies
Requirements
- 3+ years in platform engineering, with proven hands-on experience administering Databricks on AWS, including Unity Catalog governance and enterprise integrations
- Strong foundation in AWS services such as VPC, IAM, KMS, S3, CloudWatch, and network architecture
- Proficiency with Terraform, including the Databricks provider, and experience with Infrastructure as Code for cloud resources
- Advanced Python and SQL skills with experience packaging libraries and managing notebooks and repos
- Experience with MLflow for experiment tracking and model registry, and familiarity with model serving endpoints
- Knowledge of Delta Lake, Auto Loader, Structured Streaming, and DLT
- Experience implementing DevOps automation and CI/CD pipelines using GitHub Actions or similar tools
- Strong Git and GitHub proficiency including code review and branching strategies
- Familiarity with REST APIs, Databricks CLI, and scripting for automation
- Excellent communication and stakeholder management skills
- Ability to work independently and within a distributed team environment
- Detail-oriented with strong problem-solving and organizational skills
- English proficiency at B2 (Upper-Intermediate) level or higher
Nice to have
- Experience with AWS EKS and Kubernetes
- Familiarity with MLOps practices and pipeline automation
- Knowledge of attribute-based access control and advanced data governance concepts
- Experience with secrets management and SSO/SCIM provisioning
- Certification in AWS or Databricks platform engineering