Senior Site Reliability Engineer
Site Reliability Engineering, CA Release Automation, Certificate authority, IBM AIX, Firewalls, Load Balancing Tools, REST API, Red Hat Satellite, Requirements and Change management, Test planning & reporting, VM
We are seeking a highly experienced Senior Site Reliability Engineer to join our remote team on a complex project that involves developing and maintaining highly resilient applications. In this role, you will be responsible for the reliability, availability, and performance of our applications, and you will collaborate with Development and Operations teams to align on SDLC requirements, capabilities, and limitations. You will also evaluate and implement orchestration, automation, and tooling solutions so that processes stay consistent and repetitive tasks are performed with greater accuracy and fewer defects.
Responsibilities
- Enable the creation and updating of logging standards to streamline dashboard creation and keep the logging repository usable
- Drive monitoring requirements to ensure business-service level visibility for all support teams
- Provide guidance to software engineers related to design patterns that are resistant to failure
- Communicate effectively with Development and Operations teams to align on the SDLC requirements, capabilities, and limitations pertinent to delivering highly resilient applications
- Evaluate and implement orchestration, automation, and tooling solutions so that processes stay consistent and repetitive tasks are performed with greater accuracy and fewer defects
- Build, implement, and advise on recovery tooling that adheres to enterprise standards and/or frameworks
- Introduce new and impactful technologies to the production support tool chain that help minimize friction for production releases and support
- Ensure application data flows are accurate and up to date, with the aim of increasing the knowledge base of all support teams and driving reliability
- Facilitate the resolution of non-application issues (third-party upstream issues, infrastructure issues, storage, database, network, file transfer, etc.)
- Participate in architectural decisions to ensure software transaction flows are appropriately supported and designed
Requirements
- A minimum of 3 years of experience in Site Reliability Engineering, demonstrating your expertise in developing and maintaining highly resilient applications
- In-depth knowledge of CA Release Automation, certificate authorities, IBM AIX, and firewalls
- Experience with load balancing tools, REST APIs, and Red Hat Satellite
- Knowledge of requirements and change management, test planning and reporting, VM, and other relevant tools and technologies
- Experience in driving monitoring requirements to ensure business-service level visibility for all support teams
- Strong experience in evaluating and implementing orchestration, automation, and tooling solutions
- Strong experience in availability, proactive monitoring and alerting, capacity planning, and performance (reducing latency and increasing efficiency), including testing for technical platforms
- Excellent communication skills, including the ability to work with Development and Operations teams to align on SDLC requirements, capabilities, and limitations
- Spoken and written English at an Upper-Intermediate level or higher
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn