We are seeking a highly experienced remote Senior Kafka Platform Support Engineer to join our team of experts.
In this role, you will be responsible for installing, monitoring, troubleshooting, and maintaining the Kafka platform, ensuring its performance and security, and developing new features, automation, and integrations.
Responsibilities
- Install and provision new Kafka clusters and supporting components
- Monitor the health and performance of the Kafka platform and data pipelines
- Identify and fix platform issues, including data pipeline failures, network problems, cloud or container resource failures, and software bugs
- Perform regular performance tuning of Kafka platform components
- Monitor and optimize the cost and performance of Kafka clusters
- Upgrade the Kafka platform to newer versions, including planning, testing, and implementation
- Manage the security of the Kafka platform, including access control lists, encryption, and regular security reviews
- Perform regular backups and disaster recovery procedures
- Manage the capacity of the Kafka platform, including projecting future growth and scaling needs
- Document procedures, configurations, and issue resolutions, and share knowledge with the team
- Work with Confluent Support for issues that cannot be resolved in-house
- Maintain and enhance onboarding automation scripts
- Support Kafka self-service automations for topic, RBAC, schema, and connector management
- Implement new features released by the vendor as part of their product roadmap
- Provide support for team requests in Slack and convert them to CLOUD Tickets when complexity is high
- Implement and enhance automation scripts and processes to reduce the number of tickets via self-service
Requirements
- 3+ years of relevant professional experience with Kafka
- Proven experience in the implementation and maintenance of Confluent Platform
- Strong knowledge of Helm
- Experience in automating processes and maintaining automated scripts
- Understanding of networking and the ability to coordinate with other teams, including the Networking and CI/CD teams
- Kafka Confluent Stack experience
- Familiarity with Terraform
- Advanced experience with Helm, Kubernetes/containers, and Docker
- Experience with GCP and AWS (compute, networking, storage, IAM)
- Knowledge of Jenkins (pipelines, Groovy)
- Python/Shell scripting and automation experience
- Linux administration skills
- PagerDuty and Uptrends knowledge
- Excellent problem-solving skills and the ability to troubleshoot complex issues
- English proficiency at B2 level or higher
Nice to have
- Relevant certifications (such as Confluent Certified Developer or Administrator for Apache Kafka)
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn