Senior Site Reliability Engineer

Site Reliability Engineering, Oracle DevOps Service, Incident Management (ITSM), Java Development, Ruby Development, Splunk

We are seeking a skilled and experienced Senior Site Reliability Engineer to join our team. In this role, you will be pivotal in enhancing the stability and efficiency of our cloud-based systems, ensuring they are robust and scalable to meet our business's evolving demands. You will also be at the forefront of adopting cutting-edge technologies and methodologies to drive continuous improvement in our operational landscape.

Responsibilities
  • Collaborate with cross-functional teams to design and implement cloud-based solutions that meet business needs
  • Ensure optimal performance, reliability, and scalability of our cloud infrastructure through proactive monitoring, automation, and ongoing maintenance
  • Implement and maintain CI/CD pipelines for cloud-based applications
  • Contribute to the improvement of our cloud architecture and best practices
  • Define service level indicators (SLIs) and service level objectives (SLOs), and track adherence to them to maintain high service standards
  • Lead root cause analysis and post-mortem assessments to prevent future incidents
  • Automate routine tasks to improve system efficiency
  • Drive the adoption of security best practices throughout the infrastructure lifecycle

Requirements
  • Bachelor's or Master's Degree in Computer Science or a related field
  • Minimum of 3 years of experience as a Site Reliability Engineer
  • Proficiency in managing and deploying applications in Oracle Cloud environments
  • Extensive experience with Oracle DevOps Service for streamlined operations
  • Strong background in developing and managing microservices architectures
  • Skilled in instrumentation for monitoring and performance tracking
  • Advanced knowledge of setting up comprehensive monitoring and alerting systems
  • Expertise in creating and managing CI/CD pipelines for automated deployments
  • Competent in incident management and resolution
  • Familiarity with DevOps practices and tools to enhance operational workflows
  • Strong communication and collaboration skills
  • Fluent in English at a B2 level or higher

Nice to have
  • Experience with programming in Java
  • Proficiency in Ruby for scripting and automation tasks
  • Familiarity with Splunk for logging and analyzing system data

Benefits
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn