Senior SRE Engineer

Office in India: Pune
Site Reliability Engineering

We are seeking a Senior SRE Engineer to join our team and drive the reliability, scalability, and performance of our systems. If you are passionate about ensuring production excellence within complex, large-scale environments, this role is for you.

Responsibilities
  • Apply SRE principles, including SLI, SLO, and error budget management, to enhance service reliability and availability
  • Define meaningful metrics and alerts using monitoring and observability tools like Dynatrace and Splunk
  • Manage and improve production environments using Kubernetes, Terraform, and database technologies (SQL/NoSQL)
  • Develop automation scripts in Python or a shell language such as Bash to improve operational efficiency
  • Lead incident management processes, conduct root cause analyses, and implement actionable postmortem improvements
  • Maintain and optimize CI/CD pipelines using tools like Jenkins, Bamboo, or Concourse, aligning with DevOps standards
  • Collaborate with cross-functional teams to ensure system reliability and prompt resolution of complex issues
  • Enhance the scalability, performance, and reliability of distributed systems through innovative engineering solutions
  • Apply software engineering concepts to support large-scale production environments
Requirements
  • 5-10 years of experience in Site Reliability Engineering or a related field
  • Strong understanding of SRE principles and practices, including SLI, SLO, and error budget management
  • Proficiency with monitoring tools like Dynatrace and observability platforms like Splunk
  • Expertise in Kubernetes, Terraform, and database technologies (SQL/NoSQL) in production environments
  • Proficiency in scripting languages such as Python or a shell language like Bash for automation
  • Strong knowledge of CI/CD tools like Jenkins, Bamboo, or Concourse, combined with DevOps best practices
  • Experience with incident management, automated root cause analysis, and leading postmortems
  • Familiarity with software engineering concepts, system design, and distributed systems at scale
  • Ability to define and implement system reliability improvements across diverse technical stacks
Nice to have
  • Degree in a technical field or equivalent practical experience
  • Familiarity with Java-based applications, Bitbucket, Maven, and Jenkins
  • Experience with performance tuning and optimization of cloud-native Kubernetes applications
  • Proficiency in large-scale Infrastructure as Code implementations using Terraform
  • Background in managing SLAs, SLOs, SLIs, and error budgets for production systems
  • Understanding of chaos engineering, resilience testing, and advanced reliability practices