Senior Systems Engineer
DevOps, GitHub Actions, Grafana, Istio, Kubernetes, Python, Terragrunt, Amazon Web Services, Prometheus, Terraform
We are currently seeking a Senior DevOps Engineer to join our remote team.
The successful candidate will play a crucial role in building and maintaining the CVML platform. Your expertise in DevOps, particularly with GitHub Actions, Grafana, Istio, Kubernetes, Python, Terraform, and Amazon Web Services, will be instrumental in ensuring the reliability and efficiency of our systems.
If you are passionate about automation, have a strong command of modern DevOps tools and practices, and enjoy solving complex challenges in a collaborative environment, this opportunity is tailored for you.
Responsibilities
- Develop Terraform and Terragrunt configurations for infrastructure as code
- Create and manage GitHub Actions workflows for CI/CD pipelines
- Troubleshoot data access permission issues in AWS S3 and AWS IAM
- Troubleshoot Kubeflow ML pipeline issues related to CPU, memory, GPU, and permissions
- Develop scripts using Python for platform automation tasks
- Collaborate with the team to enhance the reliability and efficiency of the CVML platform
- Participate in architecture and design discussions for system improvements
- Stay updated with the latest DevOps tools and practices for continuous improvement
Requirements
- Minimum of 3 years of practical experience in DevOps roles
- In-depth knowledge of Kubernetes and its ecosystem, particularly AWS EKS and Kubespray
- Proficiency in Terraform and Terragrunt for infrastructure as code
- Experience in using Prometheus and Grafana for monitoring and observability
- Solid understanding of Istio for service mesh and its basic components, such as sidecars, mTLS, and ingress gateway
- Proficiency in Python for scripting and automation tasks
- Hands-on experience with GitHub and GitHub Actions for CI/CD pipelines
- Strong understanding of AWS services, including networking, load balancing, and IAM
- Excellent troubleshooting skills for data access permission issues in AWS S3 and AWS IAM
- Ability to develop and troubleshoot Kubeflow ML pipeline issues
Nice to have
- Familiarity with distributed tracing tools such as Zipkin, and with tracing in Istio
- Knowledge of Golang, Kubeflow, and Pulumi
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn