Senior Site Reliability Engineer

Remote in Colombia
Site Reliability Engineering, DevOps

We are looking for a Senior Site Reliability Engineer to uphold the reliability, scalability, and efficiency of our live production environments.

In this role, you will work closely with engineering teams to boost system robustness and delivery success while leveraging your DevOps skills in a fast-paced setting. Join us to advance your career and make a significant impact on our projects.

Responsibilities
  • Guarantee the reliability and efficiency of production environments
  • Partner with engineering teams to enhance system robustness
  • Build and sustain CI/CD pipelines and automation frameworks
  • Apply infrastructure as code principles using Terraform, CloudFormation, or equivalent
  • Oversee containerization and orchestration with Docker and Kubernetes
  • Supervise system monitoring, logging, and incident management
  • Enhance networking and Linux system performance
  • Support scalability across cloud platforms like AWS, Azure, and GCP
  • Collaborate with teams to implement DevOps best practices
  • Aid in diagnosing and resolving production incidents
  • Document system configurations and operational procedures
  • Engage in on-call duties and incident response
  • Assess and deploy monitoring and alerting tools
  • Ensure adherence to security best practices in system operations

Requirements
  • 3+ years of proven experience with cloud providers such as AWS, Azure, or GCP
  • Expertise in developing CI/CD pipelines and automation solutions
  • Hands-on skills with infrastructure as code tools like Terraform or CloudFormation
  • Experience with Docker for containerization and Kubernetes for orchestration
  • Comprehensive knowledge of Linux, networking, monitoring, logging, and incident handling
  • Strong communication and teamwork abilities
  • Upper-Intermediate English proficiency (B2)

Nice to have
  • Understanding of SRE methodologies including SLIs, SLOs, and error budgets
  • Familiarity with scripting languages such as Python, Go, or Bash
  • Experience with observability platforms like Prometheus, Grafana, ELK, or Datadog
  • Knowledge of security best practices and DevSecOps approaches
  • Background in supporting high availability and large-scale production systems