Senior Site Reliability Engineer
Remote in Mexico
Site Reliability Engineering
We are seeking a highly skilled Senior Site Reliability Engineer to join our fully remote team, contributing to a distributed system project that demands expertise across a diverse set of tools and frameworks.
You will ensure system reliability and performance by examining how all components operate as a cohesive unit. If you are passionate about creating reliable systems and have a proven record of delivering results, we welcome your application.
Responsibilities
- Design, build, and maintain the infrastructure and services that support the distributed system
- Monitor system performance to identify and resolve issues, maintaining high availability and reliability
- Collaborate with cross-functional teams to craft and implement solutions that align with business and user requirements
- Automate infrastructure deployment and configuration to enhance process efficiency and reliability
- Conduct code reviews to uphold and promote best practices for site reliability engineering
- Document infrastructure and services to ensure effective knowledge sharing and alignment within the team
- Stay informed about emerging technologies and trends in site reliability engineering to refine skills and inform innovation
Requirements
- A minimum of 3 years’ experience in Site Reliability Engineering with demonstrated success in managing large-scale distributed systems
- Proficiency in containerization and orchestration technologies such as Docker and Kubernetes to deploy and manage scalable and reliable services
- Competency in monitoring and logging tools including Grafana to maintain observability of systems
- Background in cloud platforms such as Microsoft Azure and Google Cloud Platform to design and implement cloud-based infrastructure
- Skills in scripting with PowerShell and Python, and in infrastructure as code with Terraform, to automate deployments and configurations
- Familiarity with web technologies such as PHP and Angular to support and maintain web applications
- Strong communication skills and the ability to effectively collaborate across teams
- Autonomy in decision-making and driving project outcomes, demonstrating accountability and initiative
- Upper-Intermediate or higher fluency in spoken and written English for seamless interaction within a global team
Nice to have
- Knowledge of JavaScript with the flexibility to use it as needed
- Understanding of the Go language and its applicability in modern development contexts
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn