Senior DevOps Engineer

DevOps, Dynatrace, Grafana, Splunk

We are looking for an experienced Senior DevOps Engineer to join our team, focusing on incident and request management, with proficiency in tools such as Dynatrace, Grafana, and Splunk.

This role requires expertise in monitoring setup and tool administration, along with the ability to manage medium-complexity break/fix tickets. If you are a strategic thinker with a knack for maintaining high availability and fault tolerance in systems, we encourage you to apply.

Responsibilities
  • Develop and maintain documentation that explains best practices for logging and monitoring
  • Conduct regular audits to ensure compliance with policies and industry standards
  • Engage in cross-functional discussions to promote logging and monitoring best practices across the company
  • Manage and oversee monitoring, alerting, operability, and observability using Dynatrace, Splunk, and Grafana
  • Triage, update, and assess the urgency of tickets
  • Evaluate documentation to determine when to escalate tickets that exceed Level 2 troubleshooting capabilities
  • Create and leverage documentation for standard incidents and requests
  • Establish the average time to complete tickets and create SLOs for each product request type
  • Document and review metrics and escalated tickets regularly to optimize the support process
  • Handle incidents and requests for monitoring setup and tool administration using JIRA
  • Be available for off-hours monitoring and escalation, and carry pager duty for emergencies
Requirements
  • Over 3 years of experience in DevOps or SRE roles
  • Bachelor’s degree in computer science or a related field and/or equivalent work experience
  • Strong knowledge of observability including monitoring, logging, and tracing
  • Hands-on experience with Dynatrace, Splunk, and Grafana
  • Background in Azure logging and monitoring tools such as Log Analytics, Azure Monitor, and App Insights
  • Capability to work both independently and as part of a team
  • Strong analytical and problem-solving skills, with proficiency in troubleshooting under pressure
  • Strategic thinker with excellent organizational and interpersonal skills
  • Flexibility to adapt quickly to new technologies
  • Exceptional communication skills and fluency in English
Nice to have
  • Experience developing and promoting a culture of operational maturity
  • Proven track record of managing high-availability, fault-tolerant, scalable systems in a production environment
  • Expertise in managing a diverse team and fostering collaboration
Benefits
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn