Senior DevOps Engineer
Microsoft Azure, Azure API Management, Docker, Kubernetes, Prometheus, Terraform, Argo CD, Artificial Intelligence, Machine Learning
We are seeking a skilled Senior DevOps Engineer who can oversee our large-scale infrastructure for high-stakes, public-facing products. Candidates should bring a wealth of hands-on experience and strategic insight to our evolving DevOps operations, driving efficiency and reliability across our systems.
Responsibilities
- Maintain and improve system stability through site reliability engineering practices that serve our infrastructure needs at scale
- Develop, implement, and manage CI/CD pipelines, with a focus on automation and increasing deployment frequency
- Design and maintain infrastructure on cloud computing platforms such as AWS, GCP, or Azure
- Utilize infrastructure-as-code tools such as Terraform and Ansible for configuration and deployment activities
- Deploy and manage containerized applications using Docker and Kubernetes
- Monitor system health and performance with tools such as Prometheus and Grafana, diagnosing and resolving issues promptly
- Scale and optimize WebSocket-based infrastructure to support substantial traffic loads
- Collaborate cross-functionally to ensure project requirements are met and deadlines and schedules stay on track
- Provide detailed documentation and system diagrams to effectively communicate system design and architecture
Requirements
- At least 3 years of experience in Site Reliability Engineering, especially in production environments
- Familiarity with Python or similar OOP languages
- Proficiency in cloud computing platforms such as AWS, GCP, or Azure
- Expertise in implementing CI/CD processes
- Competency in containerization technologies and orchestration with Docker and Kubernetes
- Skills in monitoring tools like Prometheus and Grafana
- Outstanding problem-solving skills and attention to detail
- Proven track record of delivering reliable, efficient, and scalable infrastructure
Nice to have
- Experience with Azure
- Experience or involvement with ML/AI projects
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn