Senior Site Reliability Engineer (SRE)
Site Reliability Engineering, Amazon DynamoDB, Amazon ElastiCache, Amazon Web Services, Git, Gradle, Observability and troubleshooting in distributed systems, Apache Cassandra, Apache Kafka, Grafana, Java, Kubernetes, New Relic, Scala, Terraform

Sorry, this position is no longer available
We are seeking a Site Reliability Engineer (SRE) to become part of our remote team.
The ideal candidate will be part of a team of SREs, working closely with other units to ensure comprehensive 24/7 support for our entire customer platform. Key skills for this role include quick information processing, efficient troubleshooting of complex systems, and clear communication of operational issues.
Responsibilities
- Provision of comprehensive 24/7 on-call support for Java backend services and API Gateway observability
- Preparation and deployment of patches for Java code and associated service cloud infrastructure
- Establishment of first-rate metrics and dashboards for swift identification of overall platform health
- Enhancement of runbooks for all End of Service (EOS) backend services
- Monitoring of SLOs for all involved backend services, and submission of code changes to improve SLO compliance as errors occur
Requirements
- Minimum of 3 years of experience in Site Reliability Engineering
- Proficiency in Amazon DynamoDB, Amazon ElastiCache, and Amazon Web Services
- Proven experience with Git and Gradle, and with observability and troubleshooting in distributed systems
- Experience in providing 24/7 on-call support for backend services
- Strong competency in establishing and monitoring SLOs for backend services
- English communication skills at B2+ level
Nice to have
- Experience with Docker and container orchestration tools such as Kubernetes
- Knowledge of Apache Cassandra, Apache Kafka, and Grafana
- Familiarity with Java, New Relic, Scala, and Terraform
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn