Site Reliability Team Lead
Remote in Latvia, Republic of Lithuania
We are seeking a Site Reliability Team Lead to help drive the transition from a monolithic architecture to a modular, service-oriented model. In this role, you will build out capacity for production monitoring, help the organization define SLOs/SLIs and ensure reporting of key platform metrics. Working inside the Engineering Enablement team (Platform Engineering), you will also establish best practices for production release.
Responsibilities
- Lead the design and implementation of end-to-end monitoring and alerting for Azure-based production environments
- Collaborate with development teams to improve deployment automation, infrastructure as code and CI/CD pipelines
- Manage incident response, root cause analysis and post-mortem processes
- Oversee migration mechanics, including the cutover from manually built infrastructure to IaC without a production outage
- Drive reliability best practices and advocate for SRE principles across the platform engineering organization
- Define SLOs/SLIs and ensure reporting of key platform metrics
- Establish best practices for production release
Requirements
- 5+ years of experience in DevOps or Site Reliability Engineering
- Proven experience with Azure cloud services, monitoring tools and infrastructure automation
- Hands-on expertise in Azure, Terraform and Azure Pipelines
- Proficiency in App Services, Docker and K8S
- Skills in Azure DevOps, Grafana and Application Insights
- Knowledge of networking, Key Vault and Cosmos DB
- Familiarity with Identity and Azure Monitor
- Strong background in DevOps practices, CI/CD and infrastructure as code (for example, ARM templates)
- Excellent troubleshooting, communication and collaboration skills
- English proficiency at B2 level or higher
