
Senior DevOps/Site Reliability Engineer

Hybrid in Ukraine
Site Reliability Engineering

We’re seeking a skilled DevOps/SRE with extensive expertise in designing, implementing, and maintaining observability platforms to ensure system reliability, performance, and scalability. As a vital member of our SRE team, you will promote the adoption of observability best practices, fostering proactive monitoring, swift incident resolution, and continuous enhancements to our software products and infrastructure.

This role emphasizes creating and refining observability solutions—including metrics, logs, and traces—to provide actionable insights into system health and performance. You'll also advance automation for deployment pipelines, oversee applications across various environments, and ensure our systems meet rigorous reliability and availability expectations. Collaboration will be essential as you engage closely with development teams to integrate observability into the software lifecycle, equipping them with the tools and practices for efficient debugging and iteration.

Responsibilities
  • Architect and implement observability platforms using tools like Prometheus, Grafana, and OpenTelemetry to support our Next.js frontend and accompanying systems
  • Design and maintain automated deployment pipelines focused on reliability, observability, and zero-downtime updates across multiple environments
  • Collaborate with development teams to integrate observability into local workflows for accelerated debugging and iteration
  • Optimize infrastructure and tools for scalability, fault tolerance, and performance, with the aim of reducing mean time to detection (MTTD) and mean time to resolution (MTTR)
  • Mentor team members in SRE practices, including observability-driven development, incident management, and post-mortem analyses
Requirements
  • Proficiency in scripting languages such as Python for building automation and observability tooling
  • Expertise in observability frameworks (e.g., Prometheus, Grafana, Loki, Jaeger) and logging solutions (e.g., ELK stack, Fluentd)
  • Background in containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes, AWS ECS)
  • Knowledge of infrastructure as code tools (e.g., Terraform, Ansible) to provision and manage observable systems
  • Familiarity with version control systems, especially Git, and integrating observability into CI/CD pipelines (e.g., Jenkins, GitHub Actions)
  • Ability to define and measure service-level indicators (SLIs), service-level objectives (SLOs), and error budgets to ensure system reliability
  • Competency in fostering collaboration and communication, with a strong commitment to nurturing a blameless culture of improvement
Nice to have
  • Proficiency in the Polish language
  • Proficiency in programming languages as applied to SRE, DevOps, or observability contexts
  • Familiarity with cloud platforms, such as AWS, with a focus on observability services (e.g., CloudWatch, X-Ray)
  • Understanding of distributed systems, chaos engineering, or security practices in observable environments