
Senior Site Reliability Engineer

Remote in Colombia
Site Reliability Engineering
Sorry, this position is no longer available

We are looking for a Senior Site Reliability Engineer to join our remote team and work on projects built on current cloud technologies.

As a Senior SRE, you will create, modify, and troubleshoot Terraform modules for resource deployment following Infrastructure as Code practices. You will work on stories assigned in Azure DevOps within agile workflows, develop new observability and monitoring features, and script application monitoring and alerting. You will also run stress tests, automate manual steps in CI/CD pipelines, and create and automate Playbooks and Alerts to enable Auto Healing.

Responsibilities
  • Create, modify, and troubleshoot modules for resource deployment using Terraform and Infrastructure as Code principles
  • Work on assigned Stories in Azure DevOps following agile methodologies
  • Develop new observability and monitoring features and visualizations
  • Script application monitoring and set up alerting mechanisms
  • Plan and run stress tests
  • Automate manual procedures within CI/CD pipelines
  • Create and automate Playbooks where none exist
  • Automate Alerts to enable Auto Healing as specified by each program
  • Collaborate with cross-functional teams to deliver high-quality software solutions aligned with project objectives and timelines
  • Build and maintain infrastructure using Infrastructure as Code principles and tools
  • Provide guidance and mentorship to junior team members, cultivating a culture of growth and continuous learning within the team
Requirements
  • Minimum of 3 years of experience in Site Reliability Engineering, managing complex cloud and microservices environments
  • Proficiency in Azure DevOps as the primary CI/CD tool
  • Proficiency in Kubernetes, coupled with a sound grasp of Helm, Istio, and Google Cloud Platform
  • Competence in Terraform, ARM, and Infrastructure as Code principles for streamlined and scalable infrastructure administration
  • Solid understanding of Linux and proficiency in scripting languages such as Bash and PowerShell
  • Familiarity with Observability toolsets like Prometheus and Grafana, complemented by a grasp of SLI/SLO concepts
  • Substantial experience in at least one programming language, such as Go or Python
  • Strong problem-solving and analytical skills that support effective decision-making in complex environments
  • English proficiency at Upper-Intermediate level or higher for clear communication and collaboration with the team and stakeholders
Nice to have
  • Working knowledge of Golang and Angular for application development
  • Hands-on experience with Google Cloud/OpenShift for cloud infrastructure design, deployment, and management
  • Knowledge of Jaeger, Kiali, and Loki for efficient observability and monitoring
Benefits
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
