Senior Site Reliability Engineer (SRE)
Argentina
We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) to join our dynamic team.
As a Senior SRE, you will play a critical role in designing, developing, and maintaining highly reliable systems and processes to ensure optimal performance and scalability of applications and infrastructure across diverse environments.
Responsibilities
- Build and containerize applications and deploy them using open-source container management tools such as Docker or Podman
- Design and maintain Kubernetes resource manifests, deploying them into clusters on platforms like AKS or GKE
- Configure and deploy Prometheus agents to monitor infrastructure and application behavior, raising alerts when necessary
- Create and manage continuous deployment pipelines using tools like Helm and Argo CD
- Optimize observability by implementing monitoring, logging, and tracing solutions
- Maintain and manage CI/CD processes within Azure DevOps or similar environments
- Develop and implement solutions on cloud platforms, leveraging expertise in at least one provider (e.g., Microsoft Azure, GCP, AWS)
- Troubleshoot infrastructure and application issues, using logs and traces to isolate events effectively
Requirements
- At least 3 years of programming experience, preferably in Go
- Hands-on experience with at least one scripting language (e.g., Bash or Python)
- Proficiency with Kubernetes, backed by at least 3 years of hands-on experience
- Fundamental knowledge of observability tools, with a focus on Prometheus or similar monitoring platforms
- Experience configuring and managing CI/CD pipelines using Azure DevOps or tools like Helm and Argo CD for GitOps-style continuous deployment
- Background in cloud platforms with competency in at least one provider (e.g., Microsoft Azure, Google Cloud, AWS)
- Ability to use open-source tools like Docker or Podman to containerize applications and manage their runtime environments effectively
Nice to have
- Familiarity with multiple cloud providers, including AWS and GCP alongside Azure
- Expertise in GitOps packaging and deployment tools like Argo CD and Helm
- Understanding of service meshes like Istio for Kubernetes-based microservices architectures
- Competency in infrastructure-as-code tools such as Terraform
- Background in software development with experience across multiple domains
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn