We are looking for a highly skilled and dedicated Senior Site Reliability Engineer to join our team and drive the reliability, scalability, and performance of our infrastructure and products. This role requires a deep understanding of cloud environments, expertise in troubleshooting complex systems, and a passion for optimizing operational processes to ensure seamless service delivery.
Responsibilities
Provide L3 on-call support as needed
Define and implement SLI/SLO monitoring standards
Conduct detailed root cause analyses for incidents and devise preventive measures
Design and develop infrastructure and product monitoring systems
Lead postmortem processes and incident response drills
Analyze and enhance product performance and scalability
Automate recurring operational tasks to boost efficiency
Implement CI/CD pipelines using infrastructure-as-code principles
Manage cloud infrastructure and configurations using tools like Terraform and Ansible
Collaborate closely with cross-functional teams to align on operational goals and business needs
Requirements
3+ years of experience in Site Reliability Engineering, DevOps, or a related role
Expertise in scripting languages such as Python, Go, Bash, or PowerShell
Proficiency in cloud platforms such as GCP, Azure, or AWS, and in infrastructure-as-code tooling such as Terraform
Strong skills in monitoring and observability tools such as Datadog, Prometheus, or Grafana
In-depth knowledge of CI/CD platforms like Jenkins, GitLab CI, or Azure DevOps
Solid understanding of configuration management tools such as Ansible
Competency in containerization technologies, including Docker and Kubernetes
Exceptional problem-solving abilities, troubleshooting skills, and attention to detail
Ability to reconstruct incident conditions using robust root cause analysis approaches
Nice to have
Familiarity with end-to-end observability stacks like ELK, Dynatrace, or Zabbix
Background in Groovy SDK and Jenkinsfile scripting
Experience designing scalable and fault-tolerant cloud-native architectures
Understanding of network performance optimization in cloud environments