We are seeking a dedicated and experienced Senior Cloud Platform Support Engineer to join our team.
The ideal candidate will be responsible for managing and supporting cloud-based platforms, troubleshooting issues, and ensuring optimized system performance. As a key member of our IT department, this role demands deep technical expertise and the ability to collaborate with team members and stakeholders to drive operational excellence.
Responsibilities
- Collaborate with product teams to oversee Azure systems deployment, lifecycle maintenance, and capacity planning
- Handle the triage and resolution of service management system incidents and requests
- Monitor applications, manage data manipulation for widgets, generate reports, and oversee problem identification and management
- Tune agents and collectors for optimal system data manipulation
- Provide general customer support and occasional consultations with individuals within and outside the team
- Manage Azure and AWS infrastructure, including check-in policies, installation, configuration, troubleshooting, and maintenance
- Develop scripts for automation and report generation using tools such as Terraform, Ansible, Git, and PowerShell
- Maintain applications within environments like EKS, AKS, Docker, and Docker Registry
- Administer networking protocols and network security measures in cloud environments
- Utilize Cloudflare products and equivalent tools effectively
- Ensure continuous monitoring of infrastructure and management of cloud components like Virtual Machines, Load Balancers, S3, and Azure Backup
Requirements
- Azure Certified Solutions Architect or SysOps Administrator
- 10+ years of IT industry experience, including full-stack and re-stack skills with AWS and Azure, complemented by a Bachelor’s degree in Computer Science or Information Technology
- 5+ years of experience in cloud-based development platforms, change management procedures, and troubleshooting production incidents in cloud applications
- 3+ years of experience in creating build scripts for release management using tools like Terraform, Ansible, and PowerShell, along with programming languages such as Python and Bash
- 2+ years of experience in configuring Kubernetes clusters, EKS, AKS, and tools like Helm and Prometheus
- 5+ years of Linux system administration with proficiency in Red Hat Linux or CentOS
- Flexibility to work in a 24x7 operations support environment on a rotational shift basis
- Capability to research, absorb, and organize data into actionable information for business use
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn