Site Reliability Engineer

Remote in Argentina & 4 others
Microsoft Azure & 9 others

We are seeking a talented and proactive Site Reliability Engineer to join our Cloud Security and Infrastructure (CSI) team.

This role involves ensuring the reliability, scalability, and performance of applications and infrastructure, with opportunities to work with cutting-edge containerization and cloud technologies.

Join our team and play a critical role in building and maintaining secure, observable, and scalable systems!

Responsibilities
  • Create and manage applications, containerize them with tools like Docker or Podman, and analyze logs to trace events
  • Develop and deploy Kubernetes resource manifests into clusters such as Kind, GKE, or AKS
  • Set up and configure Prometheus agents for monitoring infrastructure and application behavior, while defining alerts based on metrics
  • Collaborate with teams to maintain robust CI/CD pipelines using Azure DevOps or GitOps tooling such as Helm and ArgoCD
  • Ensure the reliability and scalability of distributed systems by monitoring, debugging, and optimizing system performance
  • Utilize infrastructure-as-code tools, such as Terraform, to manage cloud environments
Requirements
  • 2+ years of hands-on programming experience paired with proficiency in at least one scripting language
  • Proficiency in Microsoft Azure
  • Competency in Kubernetes and Linux
  • Knowledge of observability principles and familiarity with tools such as Prometheus for monitoring applications
  • Experience with Azure DevOps CI/CD pipelines and/or GitOps workflows using Helm or ArgoCD
  • Background in using Terraform to design and manage cloud infrastructure
  • English level B1+ for effective communication
Nice to have
  • Demonstrable Azure DevOps experience
  • Experience with Google Cloud Platform
  • Familiarity with Prometheus and related observability tools
  • Background in service mesh tools, particularly Istio
We offer/Benefits
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn