Senior DevOps/Site Reliability Engineer

Site Reliability Engineering

We’re seeking a skilled DevOps/Site Reliability Engineer with extensive experience designing, implementing, and maintaining observability platforms that ensure system reliability, performance, and scalability. As a key member of our SRE team, you will champion observability best practices, enabling proactive monitoring, fast incident resolution, and continuous improvement of our software products and infrastructure.

This role focuses on building and refining observability solutions (metrics, logs, and traces) that provide actionable insight into system health and performance. You will also expand automation of our deployment pipelines, manage applications across multiple environments, and ensure our systems meet strict reliability and availability targets. You will work closely with development teams to integrate observability into the software lifecycle, giving them the tools and practices they need to debug and iterate efficiently.

Responsibilities

Architect and implement observability platforms using tools like Prometheus, Grafana, and OpenTelemetry to support our Next.js frontend and accompanying systems
Design and maintain automated deployment pipelines focused on reliability, observability, and zero-downtime updates across multiple environments
Collaborate with development teams to integrate observability into local workflows for accelerated debugging and iteration
Optimize infrastructure and tooling for scalability, fault tolerance, and performance, with the goal of reducing mean time to detection (MTTD) and mean time to resolution (MTTR)
Mentor team members in SRE practices, including observability-driven development, incident management, and post-mortem analyses

Requirements

Proficiency in scripting languages such as Python for automation and observability tooling
Expertise in observability frameworks (e.g., Prometheus, Grafana, Loki, Jaeger) and logging solutions (e.g., ELK stack, Fluentd)
Background in containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes, AWS ECS)
Knowledge of infrastructure as code tools (e.g., Terraform, Ansible) to provision and manage observable systems
Familiarity with version control systems, especially Git, and with integrating observability into CI/CD pipelines (e.g., Jenkins, GitHub Actions)
Ability to define and measure service-level indicators (SLIs), service-level objectives (SLOs), and error budgets to ensure system reliability
Strong collaboration and communication skills, with a commitment to fostering a blameless culture of continuous improvement

Nice to have

Proficiency in Polish
Programming experience applied to SRE, DevOps, or observability work
Familiarity with cloud platforms, such as AWS, with a focus on observability services (e.g., CloudWatch, X-Ray)
Understanding of distributed systems, chaos engineering, or security practices in observable environments
