
Senior Site Reliability Engineer

Remote in Mexico
Site Reliability Engineering

We are seeking a highly skilled Senior Site Reliability Engineer to join our remote team on a distributed system project that calls for expertise across a broad range of tools and skills. In this role, you will analyze how the system's components interact and work to ensure its reliability and performance. You will work closely with cross-functional teams to design, build, and maintain the infrastructure and services that support the system. If you are passionate about site reliability engineering and have a proven track record in the field, we invite you to join our team.

Responsibilities
  • Design, build, and maintain infrastructure and services that support the distributed system
  • Monitor system performance and troubleshoot issues, ensuring reliability and availability
  • Collaborate with cross-functional teams to design and implement solutions that meet business and user needs
  • Automate infrastructure deployment and configuration, streamlining processes and increasing efficiency
  • Participate in code reviews and contribute to the development of best practices for site reliability engineering
  • Develop and maintain documentation for infrastructure and services, ensuring knowledge transfer and team alignment
  • Stay up to date with emerging technologies and trends in site reliability engineering, continuously improving your skills and knowledge
Requirements
  • A minimum of 3 years of experience in Site Reliability Engineering, demonstrating expertise in designing, building, and maintaining large-scale distributed systems
  • Proficiency in containerization technologies such as Docker and Kubernetes, enabling you to deploy and manage services in a scalable and reliable manner
  • Hands-on experience with monitoring and logging tools such as Grafana
  • Experience with cloud platforms such as Microsoft Azure and Google Cloud Platform, enabling you to design and deploy infrastructure in the cloud
  • Strong scripting skills in PowerShell and Python, plus infrastructure-as-code experience with Terraform, allowing you to automate infrastructure deployment and configuration
  • Experience with web technologies such as PHP and Angular, enabling you to develop and maintain web applications
  • Excellent communication and collaboration skills, allowing you to work effectively with cross-functional teams
  • Ability to work autonomously and make decisions, taking ownership and driving projects forward
  • Spoken and written English at an Upper-Intermediate level or higher, enabling effective communication
Nice to have
  • Knowledge of JavaScript and Go
Benefits
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn