We are looking for a highly skilled and dedicated Senior Site Reliability Engineer to join our team and drive the reliability, scalability, and performance of our infrastructure and products.
This role requires a deep understanding of cloud environments, expertise in troubleshooting complex systems, and a passion for optimizing operational processes to ensure seamless service delivery.
Responsibilities
- Provide L3 on-call support as needed
- Define and implement SLI/SLO monitoring standards
- Conduct detailed root cause analyses for incidents and devise preventive measures
- Design and develop infrastructure and product monitoring systems
- Lead postmortem processes and incident response drills
- Analyze and enhance product performance and scalability
- Automate recurring operational tasks to boost efficiency
- Implement CI/CD pipelines using infrastructure-as-code principles
- Manage cloud infrastructure and configurations using tools like Terraform and Ansible
- Collaborate closely with cross-functional teams to align on operational goals and business needs
Requirements
- 3+ years of experience in Site Reliability Engineering, DevOps, or a related role
- Expertise in scripting languages such as Python, Go, Bash, or PowerShell
- Proficiency in cloud platforms such as GCP, Azure, or AWS, and in infrastructure-as-code tools such as Terraform
- Strong skills in monitoring and observability tools such as Datadog, Prometheus, or Grafana
- In-depth knowledge of CI/CD platforms like Jenkins, GitLab CI, or Azure DevOps
- Solid understanding of configuration management tools such as Ansible
- Competency in containerization technologies, including Docker and Kubernetes
- Exceptional problem-solving abilities, troubleshooting skills, and attention to detail
- Ability to reconstruct incident conditions using robust root cause analysis approaches
Nice to have
- Familiarity with end-to-end observability stacks like ELK, Dynatrace, or Zabbix
- Background in Groovy SDK and Jenkinsfile scripting
- Experience designing scalable and fault-tolerant cloud-native architectures
- Understanding of network performance optimization in cloud environments