Senior Site Reliability Engineer - DevOps
Hybrid in Portugal: Lisbon
Site Reliability Engineering
We are seeking a Senior Site Reliability Engineer to support a global execution platform and deliver high-quality solutions to trading desks and clients.
You will work closely with top specialists, developing your skills in system management, monitoring, and low-latency technology. Apply now to be part of a team driving innovation in financial technology.
Please note that working from the customer's office in Lisbon is required 2-3 days per week.
Responsibilities
- Develop and implement monitoring, alerting, and incident response strategies
- Automate routine tasks and processes to improve efficiency
- Collaborate with software engineering teams to design and deploy reliable, scalable systems
- Deploy production changes with precision to maintain platform integrity
- Manage incidents, including detailed analysis and reporting, to ensure high service levels
- Participate in on-call rotations to support critical systems and services
- Communicate effectively with team members to resolve issues promptly
- Maintain documentation for operational procedures and system configurations
- Continuously improve system reliability and performance through proactive measures
Requirements
- Strong knowledge of Unix/Linux systems and networking, with 3+ years of experience
- Proficiency in Unix/Linux shell scripting and programming languages such as Python, Perl, C, C++, or Java
- Experience with monitoring and observability tools like ITRS Geneos, Dynatrace, Prometheus, and Grafana
- Ability to troubleshoot complex systems and resolve issues efficiently
- Experience working in high-availability, high-traffic environments
- Bachelor’s or Master’s degree in IT engineering or a related field
- Ability to work effectively in a team and adapt to new environments
- Self-motivated with strong problem-solving and issue follow-up skills
- Excellent written and verbal communication skills, with English at B2 level or higher
Nice to have
- Experience with log management tools such as Splunk, ELK, Graylog, or Loki
- Knowledge of network monitoring tools like Corvil
- Familiarity with databases including Oracle, PostgreSQL, MySQL/MariaDB, or KDB/q
- Experience with messaging systems such as IBM MQ, Tibco, Solace, LBM, or Kafka
- Familiarity with Infrastructure as Code tools like Ansible or Terraform