
Lead Site Reliability Engineer - DevOps

Hybrid in Portugal: Lisbon
Site Reliability Engineering

We are looking for a Lead Site Reliability Engineer to enhance a global execution platform, delivering robust solutions to trading desks and clients.

You will collaborate with expert teams, advancing your expertise in system administration, monitoring, and low-latency technologies. Join us to contribute to cutting-edge financial technology innovations.

Note that working on-site at the client's Lisbon office for 2-3 days per week is required.

Responsibilities
  • Design and enforce monitoring, alerting, and incident management strategies
  • Automate repetitive tasks and workflows to increase operational efficiency
  • Work alongside software engineering teams to build and launch scalable, dependable systems
  • Execute production deployments carefully to preserve platform stability
  • Handle incident management with thorough analysis and reporting to maintain service quality
  • Participate in on-call duties to support critical systems and services
  • Communicate clearly with colleagues to swiftly resolve technical problems
  • Maintain up-to-date documentation for operational workflows and system configurations
  • Drive continuous improvements in system reliability and efficiency through proactive initiatives

Requirements
  • Over 5 years of experience with, and a deep understanding of, Unix/Linux operating systems and networking
  • Proficiency in Unix/Linux shell scripting and in programming languages such as Python, Perl, C, C++, or Java
  • Experience with monitoring and observability solutions such as ITRS Geneos, Dynatrace, Prometheus, and Grafana
  • Strong troubleshooting skills for complex system issues
  • Experience in environments with high availability and heavy traffic
  • Bachelor’s or Master’s degree in IT engineering or a related discipline
  • Ability to collaborate effectively within a team and adapt to evolving environments
  • Self-driven, with excellent problem-solving skills and a thorough approach to issue tracking
  • Excellent written and verbal communication skills, with English proficiency at B2+ level

Nice to have
  • Familiarity with log analysis tools like Splunk, ELK, Graylog, or Loki
  • Knowledge of network monitoring solutions such as Corvil
  • Experience with databases such as Oracle, PostgreSQL, MySQL/MariaDB, or KDB/q
  • Understanding of messaging platforms like IBM MQ, Tibco, Solace, LBM, or Kafka
  • Experience with Infrastructure as Code tools such as Ansible or Terraform