Senior Site Reliability Engineer (AWS)
Amazon Web Services
Are you a talented Site Reliability Engineer (SRE) passionate about building scalable, efficient, and reliable cloud systems?
Join our team of innovative professionals who are shaping the future of cloud infrastructure and delivering world-class solutions. If you're seeking a challenging role where your technical expertise and problem-solving abilities can make a real impact, we’d love to hear from you!
Responsibilities
- Design, develop, and maintain scalable cloud infrastructure solutions using AWS technologies and AWS CDK
- Collaborate with development and operations teams to deliver applications efficiently, streamline deployments, and improve overall system reliability
- Extend and improve server-side TypeScript code to support application functionality and scalability
- Implement best practices for CI/CD pipelines to accelerate development and deployment processes
- Respond to operational issues and incidents, troubleshoot effectively, and ensure high availability and resilience of production systems
- Drive the implementation of observability practices, including monitoring, logging, and alerting, to proactively identify and resolve system issues
- Support production systems, optimizing their performance and ensuring they scale seamlessly
- Foster collaboration and share knowledge across teams to build a strong DevOps and SRE culture
Requirements
- Proven experience as a Backend Engineer or Site Reliability Engineer
- Deep understanding of and practical experience with AWS services and infrastructure
- Expertise in AWS Cloud Development Kit (AWS CDK) for infrastructure as code
- Proficiency in TypeScript and its application in cloud-based systems
- Strong knowledge of and experience with operational support in a cloud environment
- Excellent communication and interpersonal skills, with a focus on collaboration
Nice to have
- Knowledge of and experience with CI/CD pipelines
- Practical experience with DevOps practices, tools, and methodologies
- Familiarity with Site Reliability Engineering (SRE) principles
- Experience with observability practices, including monitoring and alerting tools such as Datadog, Prometheus, Grafana, or equivalents