Lead MLOps Engineer

Hybrid in Mexico
Data DevOps

We are seeking a highly skilled and experienced Lead MLOps Engineer to join our Enterprise AI Products and Technology Team. The ideal candidate will have extensive industry experience delivering Machine Learning or Data Science projects at scale, coupled with proven leadership skills.

In this role, you will mentor and guide a team of multidisciplinary engineers, working closely with data science teams to create tools, establish standards, and automate impactful tasks in the machine learning product lifecycle. You will focus on bridging technical and organizational gaps, enhancing platform maturity, and driving scalable solutions for enterprise-wide AI initiatives, including clinical trial data analysis, knowledge graph analytics, patient safety systems, deep learning-led medication discovery, and Software as a Medical Device (SaMD) systems.

As a Lead MLOps Engineer, you bring a leadership-oriented mindset and software engineering expertise focused on scalability, automation, and agility, along with the ability to set a vision, influence stakeholders, and continuously improve processes across teams.

Responsibilities
  • Lead and mentor a team of engineers, defining technical priorities, fostering collaboration, and guiding problem-solving efforts
  • Collaborate with Data Scientists and Machine Learning Engineers to understand challenges and deliver scalable tools/platforms that streamline their workflows
  • Drive continuous improvement in Machine Learning development environments, platforms, and tools to support data science initiatives at scale
  • Work closely with governance and compliance functions (e.g., Cyber Security and Data Privacy) to design and implement secure systems that balance security and end-user productivity
  • Adapt and optimize state-of-the-art machine learning methods for modern parallel computing environments (e.g., distributed clusters, multicore SMP, and GPU technologies)
  • Champion a "production-first mindset," ensuring seamless transitions of data science projects from exploratory research to production
  • Shape strategic initiatives such as defining best practices, conducting technical reviews, and aligning solutions with long-term AI platform goals
  • Leverage expertise in workflow orchestration frameworks (Airflow, Argo, Kubeflow, etc.) to guide tool selection and optimization across teams
  • Collaborate with enterprise-wide stakeholders to advocate for and implement scalable infrastructure using Infrastructure as Code principles
  • Develop training programs to upskill teams on advanced MLOps practices and tools
  • Track and report performance metrics for MLOps tools, environments, and production pipelines to stakeholders and leadership
Requirements
  • BSc/MSc/PhD in Computer Science, Data Engineering, or a related quantitative or analytical field
  • 5+ years of experience building and delivering production-grade software with significant expertise in Python programming (similar expertise in other languages will be considered)
  • More than 1 year of leadership experience relevant to the role
  • Proven experience in software engineering, automation, and DevOps with a demonstrated ability to lead and deliver impactful projects
  • Extensive experience developing, deploying, and scaling production-grade machine learning products or similar enterprise-scale software systems
  • Deep understanding of and practical experience with at least one workflow orchestration framework (Airflow, Argo, Kubeflow, etc.), and the ability to mentor teams in adopting and using it
  • Substantial experience deploying and managing Machine Learning or Data Science infrastructure at scale using Infrastructure as Code (e.g., Terraform or CloudFormation)
  • Strong track record of working in Agile teams with proven leadership in cross-functional collaboration
  • Proven ability to work effectively with governance functions and adhere to internal security standards while promoting productive workflows
  • Exceptional problem-solving and communication skills, with strong stakeholder management at all levels of an organization
Benefits
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library of 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn