Lead DevOps Engineer

Hybrid in Mexico
DevOps

Join our team as a Lead DevOps Engineer and play a vital role in incident and request management using tools like Dynatrace, Grafana, and Splunk. Take charge of monitoring setup, tool administration, and resolving medium-complexity tickets. If optimizing operational processes excites you, we encourage you to apply.

Responsibilities
  • Create and maintain documentation outlining best practices for logging and monitoring
  • Perform routine audits to ensure logging and monitoring practices align with compliance standards
  • Take part in cross-functional meetings focused on logging and monitoring strategies
  • Handle monitoring, alerting, operability, and observability tasks using Dynatrace, Splunk, and Grafana
  • Triage tickets to assess urgency and update details accordingly
  • Analyze tickets and, after reviewing documentation, escalate those that go beyond Level 2 troubleshooting
  • Provide clear and actionable notes for tickets that require escalation
  • Use and develop documentation that addresses standard incidents and service requests
  • Define standard completion times for tickets and set service-level objectives
  • Review and present metrics on escalated tickets to improve support processes
  • Address incidents and service requests for monitoring setup through Jira
  • Remain available for monitoring duties and escalations during off-hours and weekends
  • Take responsibility for pager duty during emergency situations after work hours
  • Regularly analyze and present metrics to refine and advance support processes

Requirements
  • Bachelor’s degree in computer science or an equivalent field
  • 5+ years of experience in DevOps or within Site Reliability Engineering teams
  • Knowledge of observability, including monitoring, logging, and tracing
  • Expertise in tools such as Dynatrace, Splunk, and Grafana
  • Familiarity with Azure logging and monitoring tools, including Log Analytics and Azure Monitor
  • Background in managing high-availability and fault-tolerant software in production environments
  • Proficiency in English at a B2+ level

Benefits
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn