Middle Site Reliability Engineer
Remote in Colombia
Site Reliability Engineering

Sorry, this position is no longer available
We are looking for a skilled Middle Site Reliability Engineer to join our remote team and contribute to a project built on modern technologies and tools.
As a Middle Site Reliability Engineer, you will deploy resources with Terraform and other IaC tooling, build observability and monitoring features, implement application monitoring and alerting systems, run stress tests, and automate manual procedures within CI/CD pipelines.
Responsibilities
- Write, adapt, and troubleshoot modules for deploying resources with Terraform and other IaC tooling
- Tackle assigned Stories in Azure DevOps following agile methodologies
- Build new observability features and clear visualizations
- Code application monitoring and alerting mechanisms
- Execute stress tests and ensure system resilience
- Streamline manual processes within CI/CD pipelines
- Create playbooks where none exist, and automate them
- Implement automated alerts for Auto Healing, as per each Program's specifications
- Engage collaboratively with cross-functional teams to deliver top-tier software solutions aligned with project objectives and deadlines
- Track industry trends and best practices to improve and implement effective site reliability strategies
Requirements
- A minimum of 2 years of hands-on experience as a Site Reliability Engineer on large-scale projects with complex infrastructure
- Proficiency in Microsoft Azure and its services, with expertise in cloud infrastructure design, deployment, and management
- Advanced proficiency with Kubernetes and a solid grasp of Helm
- Expertise in Azure DevOps as the primary CI/CD tool, emphasizing automation and efficiency
- Competence with IaC tools such as Terraform and ARM templates for streamlined, scalable infrastructure management
- Familiarity with Linux and scripting languages (bash/PowerShell) for automation purposes
- Familiarity with Prometheus and Grafana for monitoring system performance and reliability
- Understanding of SLI/SLO concepts for efficient monitoring and alerting
- Upper-intermediate proficiency in English, facilitating effective written and verbal communication and collaboration with the team and stakeholders
Nice to have
- Working knowledge of Golang and Angular
- Experience with Google Cloud/OpenShift
- Proficiency in the Python scripting language for automation purposes
- Knowledge of Jaeger, Kiali, and Loki for effective monitoring and observability
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn