Senior Site Reliability Engineer (SRE)

Remote in Argentina, Mexico
Site Reliability Engineering

We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) to join our dynamic team.

As a Senior SRE, you will play a critical role in designing, developing, and maintaining highly reliable systems and processes to ensure optimal performance and scalability of applications and infrastructure across diverse environments.

Responsibilities
  • Build and containerize applications and deploy them using open-source container management tools such as Docker or Podman
  • Design and maintain Kubernetes resource manifests, deploying them into clusters on platforms like AKS or GKE
  • Configure and deploy Prometheus agents to monitor infrastructure and application behavior, raising alerts when necessary
  • Create and manage continuous deployment pipelines using tools like Helm and Argo CD
  • Optimize observability by implementing monitoring, logging, and tracing solutions
  • Maintain and manage CI/CD processes within Azure DevOps or similar environments
  • Develop and implement solutions on cloud platforms, leveraging expertise in at least one provider (e.g., Microsoft Azure, GCP, AWS)
  • Troubleshoot infrastructure and application issues, using logs and traces to isolate the events involved
Requirements
  • At least 3 years of programming experience, preferably in Go
  • Hands-on experience with at least one scripting language (e.g., Bash or Python)
  • Proficiency with Kubernetes, backed by at least 3 years of hands-on experience
  • Fundamental knowledge of observability tools, with a focus on Prometheus or similar monitoring platforms
  • Skills in configuring and managing CI/CD pipelines using Azure DevOps, or tools like Helm and Argo CD for GitOps-style continuous deployment
  • Background in cloud platforms with competency in at least one provider (e.g., Microsoft Azure, Google Cloud, AWS)
  • Ability to containerize applications and manage their runtime environments with open-source tools such as Docker or Podman
Nice to have
  • Familiarity with multiple cloud providers, including AWS and GCP alongside Azure
  • Expertise in GitOps packaging and deployment tools like Argo CD and Helm
  • Understanding of service meshes like Istio for Kubernetes-based microservices architectures
  • Competency in infrastructure-as-code tools such as Terraform
  • Background in software development with experience across multiple domains
Benefits
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn