Senior Site Reliability Engineer
Remote in Mexico
We are seeking a highly skilled Senior Site Reliability Engineer to join our remote team on a distributed system project that calls for expertise across a broad range of tools and skills. As a Senior Site Reliability Engineer, you will analyze how all components of the system work together and ensure its reliability and performance. You will work closely with cross-functional teams to design, build, and maintain the infrastructure and services that support the system. If you are passionate about site reliability engineering and have a proven track record of success, we invite you to join our team.
Responsibilities
- Design, build, and maintain infrastructure and services that support the distributed system
- Monitor system performance and troubleshoot issues, ensuring reliability and availability
- Collaborate with cross-functional teams to design and implement solutions that meet business and user needs
- Automate infrastructure deployment and configuration, streamlining processes and increasing efficiency
- Participate in code reviews and contribute to the development of best practices for site reliability engineering
- Develop and maintain documentation for infrastructure and services, ensuring knowledge transfer and team alignment
- Stay up to date with emerging technologies and trends in site reliability engineering, continuously improving your skills and knowledge
Requirements
- A minimum of 3 years of experience in Site Reliability Engineering, demonstrating expertise in designing, building, and maintaining large-scale distributed systems
- Proficiency in containerization and orchestration technologies such as Docker and Kubernetes, enabling you to deploy and manage services in a scalable and reliable manner
- Hands-on experience with monitoring and logging tools such as Grafana
- Experience with cloud platforms such as Microsoft Azure and Google Cloud Platform, enabling you to design and deploy infrastructure in the cloud
- Strong scripting skills in PowerShell and Python, plus infrastructure-as-code experience with Terraform, allowing you to automate infrastructure deployment and configuration
- Experience with web technologies such as PHP and Angular, enabling you to develop and maintain web applications
- Excellent communication and collaboration skills, allowing you to work effectively with cross-functional teams
- Autonomy and sound decision-making, with the ability to take ownership and drive projects forward
- Spoken and written English at an Upper-Intermediate level or higher, enabling effective communication
Nice to have
- Knowledge of JavaScript and Go
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn