Mexico
Join our team as a Senior DevOps Engineer, where you will play a crucial role in incident and request management using tools such as Dynatrace, Grafana, and Splunk. You will be responsible for monitoring setup, tool administration, and resolving medium-complexity tickets. If you are passionate about optimizing operational processes, we encourage you to apply.
Responsibilities
- Develop and maintain documentation that outlines best practices for logging and monitoring
- Conduct regular audits of logging and monitoring practices to ensure compliance with standards
- Participate in cross-functional discussions regarding logging and monitoring best practices
- Manage monitoring, alerting, operability, and observability using Dynatrace, Splunk, and Grafana
- Triage tickets and update details while assessing urgency
- Review documentation before escalating tickets that require more than Level 2 troubleshooting
- Provide warm handoff notes for escalated tickets
- Leverage and create documentation for standard incidents and requests
- Define average completion time per ticket and establish service level objectives
- Review and present metrics and escalated tickets to shift support processes left
- Manage incidents and requests for monitoring setup using Jira
- Be available for monitoring and escalation during off-hours and weekends
- Carry the on-call pager for after-hours emergencies
- Review and present metrics regularly to document and enhance support processes
Requirements
- Bachelor's degree in computer science or a related field
- 3+ years of experience working within DevOps or Site Reliability Engineering teams
- Strong knowledge of observability, including monitoring, logging, and tracing
- Experience with Dynatrace, Splunk, Grafana, and other monitoring tools
- Experience with Azure logging and monitoring tools such as Log Analytics and Azure Monitor
- Experience operating high-availability, fault-tolerant software in production
- Fluency in English (B2+)
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn