Senior Site Reliability Engineer
Site Reliability Engineering, Oracle DevOps Service, Incident Management (ITSM), Java Development, Ruby Development, Splunk
We are seeking a skilled and experienced Senior Site Reliability Engineer to join our team. In this role, you will be pivotal in enhancing the stability and efficiency of our cloud-based systems, ensuring they are robust and scalable to meet our business's evolving demands. You will also be at the forefront of adopting cutting-edge technologies and methodologies to drive continuous improvement in our operational landscape.
Responsibilities
- Collaborate with cross-functional teams to design and implement cloud-based solutions that meet business needs
- Ensure optimal performance, reliability, and scalability of our cloud infrastructure through proactive monitoring, automation, and ongoing maintenance
- Implement and maintain CI/CD pipelines for cloud-based applications
- Contribute to the improvement of our cloud architecture and best practices
- Define service level indicators (SLIs) and service level objectives (SLOs), and track them to maintain high service standards
- Lead root cause analysis and post-mortem assessments to prevent future incidents
- Automate routine tasks to improve system efficiency
- Drive the adoption of security best practices throughout the infrastructure lifecycle
Requirements
- Bachelor's or Master's Degree in Computer Science or a related field
- Minimum of 3 years of experience as a Site Reliability Engineer
- Proficiency in managing and deploying applications in Oracle Cloud environments
- Extensive experience with Oracle DevOps Service for streamlined operations
- Strong background in developing and managing microservices architectures
- Skilled in instrumentation for monitoring and performance tracking
- Advanced knowledge of setting up comprehensive monitoring and alerting systems
- Expertise in creating and managing CI/CD pipelines for automated deployments
- Competent in incident management and resolution
- Familiarity with DevOps practices and tools to enhance operational workflows
- Strong communication and collaboration skills
- English proficiency at B2 level or higher
Nice to have
- Experience with programming in Java
- Proficiency in Ruby for script writing and automation tasks
- Familiarity with using Splunk for logging and analyzing system data
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn