Senior Site Reliability Engineer

Remote in Mexico
Site Reliability Engineering

We are seeking a highly skilled Senior Site Reliability Engineer to join our fully remote team, contributing to a distributed system project that demands expertise across a diverse set of tools and frameworks.

You will ensure system reliability and performance by examining how all components operate as a cohesive unit. If you are passionate about creating reliable systems and have a proven record of delivering results, we welcome your application.

Responsibilities
  • Design, build, and maintain the infrastructure and services that support the distributed system
  • Monitor system performance to identify and resolve issues, maintaining high availability and reliability
  • Collaborate with cross-functional teams to craft and implement solutions that align with business and user requirements
  • Automate infrastructure deployment and configuration to enhance process efficiency and reliability
  • Conduct code reviews to uphold and promote best practices for site reliability engineering
  • Document infrastructure and services to ensure effective knowledge sharing and alignment within the team
  • Stay informed about emerging technologies and trends in site reliability engineering to refine skills and inform innovation
Requirements
  • A minimum of 3 years’ experience in Site Reliability Engineering with demonstrated success in managing large-scale distributed systems
  • Proficiency in containerization and orchestration technologies such as Docker and Kubernetes to deploy and manage scalable and reliable services
  • Competency in monitoring and logging tools including Grafana to maintain observability of systems
  • Background in cloud platforms such as Microsoft Azure and Google Cloud Platform to design and implement cloud-based infrastructure
  • Skills in scripting with PowerShell and Python, and in infrastructure as code with Terraform, to automate deployments and configurations
  • Familiarity with web technologies such as PHP and Angular to support and maintain web applications
  • Strong communication skills and the ability to effectively collaborate across teams
  • Autonomy in decision-making and driving project outcomes, demonstrating accountability and initiative
  • Upper-Intermediate or higher fluency in spoken and written English for seamless interaction within a global team
Nice to have
  • Knowledge of JavaScript with the flexibility to use it as needed
  • Understanding of the Go language and its applicability in modern development contexts
Benefits
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn