We are seeking a skilled and experienced Lead Site Reliability Engineer to join our dynamic team and ensure the performance, scalability, and reliability of our production systems and infrastructure.
If you're a proactive problem-solver with a strong background in monitoring, automation, and cloud technologies, we want to hear from you.
Responsibilities
- Provide L3 on-call support as needed
- Design and develop monitoring systems for infrastructure and products
- Define and implement SLIs and SLOs for tracking system reliability
- Conduct thorough root cause analyses for incidents
- Lead postmortem procedures and drills for continuous improvement
- Analyze product performance, scalability, and reliability
- Automate operational tasks to enhance efficiency
- Implement and manage CI/CD pipelines following "as-Code" practices
- Oversee cloud infrastructure and configuration management using Infrastructure-as-Code principles
- Collaborate closely with cross-product teams and business stakeholders to align reliability objectives
Requirements
- 5+ years of relevant experience, including at least 1 year in a leadership role
- Advanced knowledge of scripting languages such as Python, Go, Bash, or PowerShell
- Expertise in any major cloud platform (AWS, GCP, or Azure)
- Proficiency in optimizing monitoring and logging tools such as Datadog, Dynatrace, Prometheus, Grafana, Zabbix, or ELK
- Capability to manage cloud infrastructure using tools like Terraform and command-line interfaces (gcloud, az, aws)
- Competency in configuration management using Ansible
- Background in CI/CD toolchains such as Jenkins (Groovy SDK, Jenkinsfile), GitLab CI, or Azure DevOps
- Understanding of containerization technologies such as Docker and Kubernetes
- Exceptional troubleshooting and problem-solving abilities, including reconstructing incident conditions and flows based on root cause analysis
- B2-level English proficiency in both speaking and writing
Nice to have
- Familiarity with multiple cloud-native monitoring tools
- Demonstrated experience leading cross-functional team collaborations
- Proficiency in advanced Kubernetes configurations