Lead Site Reliability Engineer (SRE)/DevOps
Remote in Argentina, Mexico
We are building a resilient cloud platform and need a Lead Site Reliability Engineer (SRE)/DevOps to drive stability, scale, and operational excellence. You will blend software engineering with systems expertise to run large, distributed, fault-tolerant services and strengthen reliability practices across teams. Apply now to help raise availability, performance, and automation across production.
Responsibilities
- Design, build and maintain infrastructure and tooling that enables fast software development and reliable releases
- Ensure continuous availability, performance and scalability of production systems and services
- Implement automation tools to streamline operations and improve response to alerts and incidents
- Collaborate with the development team to enhance system reliability and optimize performance
- Create and maintain operational documentation and specifications for system builds and operating procedures
- Monitor and report on service level objectives for a given application's services
- Define key performance indicators in cooperation with business and product owners
- Promote a culture of continuous improvement, testing and automation
Requirements
- Bachelor's or Master's degree in Computer Science, Information Technology or related field
- Proven track record with 5+ years of experience in an SRE/DevOps role scaling and automating large-scale systems
- Solid understanding of cloud computing services, preferably AWS, Azure or GCP
- Hands-on experience with scripting languages such as Python and Bash and infrastructure-as-code tools such as Terraform and CloudFormation
- Strong skills with containers and container orchestration tools such as Docker and Kubernetes
- Working knowledge of CI/CD pipelines and tools such as Jenkins and GitLab CI
- Practical familiarity with monitoring and alerting tools such as Prometheus, Grafana and New Relic
- Excellent leadership and communication skills
- English proficiency at B2 level or higher
