We are seeking a highly skilled and motivated Senior Site Reliability Engineer to be a key member of our team, driving operational excellence and improving the reliability, scalability, and performance of our infrastructure and product services.
Responsibilities
- Provide L3 on-call support, ensuring rapid response to incidents
- Define and implement effective SLI/SLO metrics for product monitoring
- Perform detailed root cause analysis to resolve critical issues
- Conduct postmortems and organize drills to improve readiness
- Analyze product performance, scalability, and reliability to optimize service delivery
- Automate operational tasks to reduce manual intervention
- Implement CI/CD pipelines using tools like Jenkins, GitLab CI, or Azure DevOps
- Manage cloud infrastructure and configurations to support Infrastructure-as-Code initiatives
- Utilize configuration management tools such as Ansible to maintain consistency across environments
- Collaborate closely with cross-product teams and business stakeholders to align reliability goals with project objectives
Requirements
- 5+ years of experience working in Site Reliability Engineering or similar roles
- Intermediate knowledge of scripting languages such as Python, Go, Bash, or PowerShell
- Solid knowledge of cloud platforms, including AWS, Azure, or GCP
- Familiarity with observability tools such as Prometheus, Grafana, Datadog, ELK, or Zabbix
- Expertise in cloud infrastructure management tools, including Terraform and one of the cloud CLIs (gcloud, az, aws)
- Proficiency in containerization technologies like Docker and Kubernetes (K8s)
- Capability to define and monitor SLI/SLO metrics for system reliability
- Thorough understanding of postmortem and drill procedures to enhance incident handling processes
- B2-level English proficiency in both speaking and writing
Nice to have
- Demonstrated experience implementing CI/CD pipelines using Groovy SDK or Jenkinsfile
- Background in working with large-scale production systems requiring high availability
- Familiarity with advanced monitoring practices using tools such as Dynatrace
- Skills in scaling Kubernetes clusters and optimizing containerized applications
- Flexibility to use diverse scripting languages to automate complex workflows