Senior DevOps Engineer (HPC)

Remote in Poland

We are seeking a Senior DevOps Engineer to enhance our high-performance computing services and collaborate closely with the scientific community to optimize research computing.

Join our team to build and operate cutting-edge HPC capabilities using automation and infrastructure-as-code. Apply now to contribute to innovative computational solutions in a dynamic environment.

Responsibilities
  • Design, implement, and maintain robust platform infrastructure using Infrastructure as Code tools such as Terraform
  • Develop, deliver, and operate research computing services and applications
  • Apply Site Reliability Engineering principles to manage HPC service deployment, monitoring, and incident response
  • Solve complex technical problems related to HPC services and user applications
  • Manage large-scale HPC, HTC, or BC computing environments for optimal performance
  • Collaborate with scientific users to tailor HPC resources to research needs
  • Automate deployment processes to ensure consistency across HPC infrastructure
  • Maintain and administer large-scale cluster workload management software such as Slurm, LSF, or Grid Engine
  • Develop and maintain monitoring dashboards using tools like Grafana and Prometheus
  • Work within a DevOps team environment following agile methodologies
  • Operate and utilize virtualized private cloud resources such as OpenStack
  • Administer large-scale parallel filesystems including Weka, GPFS, or Lustre
  • Use configuration management tools like Ansible, Salt, or Puppet to manage IT operations
  • Develop scripts and tools for HPC and DevOps platform operations using Bash and Python
Requirements
  • 3+ years of experience with DevOps processes and automation using Infrastructure as Code tools such as Terraform
  • Hands-on experience operating or engineering large-scale HPC or similar computing environments
  • Proven expertise in Linux system administration including TCP/IP networking and storage subsystems
  • Experience administering large-scale cluster management software such as Slurm, LSF, or Grid Engine
  • Knowledge of configuration management tools like Ansible, Salt, or Puppet
  • Experience working in agile DevOps teams
  • Ability to develop and maintain monitoring dashboards using tools such as Grafana and Prometheus
  • Experience with scripting languages such as Bash and Python for automation and tool development
  • Strong experience managing virtualized private cloud environments like OpenStack
  • Scientific degree or equivalent experience in computationally intensive scientific data analysis
  • Proven ability to manage relationships with third-party suppliers
  • Upper-intermediate proficiency in English (B2+)
Nice to have
  • Experience with container technologies such as LXD, Singularity, Docker, or Kubernetes
  • Operation and configuration experience with public cloud platforms like AWS, Azure, or GCP
  • Experience with HashiCorp tools such as Vault, Consul, and Nomad
  • Development experience with programming languages such as Java, C++, Python, Ruby, or Perl
  • Experience with parallel filesystems like Weka, GPFS, or Lustre