Senior Site Reliability Engineer
Remote* work format, from: Ukraine (Kyiv)
*You can work remotely from the country (or countries) for which this position is open.
Site Reliability Engineering & other skills
We are looking for a Senior Site Reliability Engineer to join our growing team supporting the Customer Last Mile (CLM) area and Order Services. In this role, you will bring deep expertise in AWS Bedrock and OpenSearch (index and performance tuning) to ensure the reliability, scalability, and performance of our critical microservices ecosystem.
What you will do in this role
- Own production environments, including on-call coverage and major incident handling
- Lead root cause analysis and drive problem management to closure
- Define and maintain SLOs/SLIs while promoting a reliability-first mindset across teams
- Operate and optimize containerized workloads in AWS (Kubernetes on EKS, and ECS)
- Manage infrastructure as code using Terraform and Ansible
- Implement and maintain monitoring, alerting, and observability solutions with Instana, CloudWatch, and ELK
- Perform log analysis, alert hygiene, and capacity planning
- Support reliability patterns for CLM microservices, including APIs and async/event-driven processing
- Tune and maintain AWS Bedrock workloads and OpenSearch indexes for optimal performance
- Apply secure-by-design principles across all infrastructure and services
- Drive automation-first practices, documentation, and cross-team collaboration
- Participate in the on-call support rotation, covering one calendar week approximately once per month
Skills
- 3+ years of experience in Site Reliability Engineering or related operations roles
- Expertise in AWS Bedrock and OpenSearch with a focus on index and performance tuning
- Proficiency in AWS fundamentals, including EC2, EKS/ECS and IAM/networking
- Background in Kubernetes operations at production scale
- Skills in infrastructure as code with Terraform
- Competency in observability tooling such as Instana, CloudWatch, and ELK
- Understanding of microservices reliability patterns, APIs, and async/event-driven processing
- Knowledge of SLO/SLI definition, RCA methodologies, and problem management practices
- Familiarity with secure-by-design principles and operational security
- Capability to handle production ownership, on-call duties, and major incident response
- Strong collaboration, documentation, and automation-first mindset
- English proficiency at a B2 level to ensure effective communication and documentation
Nice to have
- Flexibility to use Ansible for configuration management
- Demonstrated experience with advanced capacity planning and alert hygiene practices
- Experience tuning large-scale search and AI/ML platform workloads