The vacancy has been archived
The company is no longer hiring for this position.

HPC DevOps Engineer

at a company whose name is hidden (Fintech)

$5,000 to $7,000 per month, net

Worldwide

Remote

Position

DevOps

Seniority level

Senior

English

B1 (Intermediate)

Experience

4+ years

Technologies / Tools

Linux

AWS

Ansible

Terraform

Prometheus

Grafana

ELK

CI/CD

Docker

Kubernetes

Python/Bash

Jenkins

GitLab

Zabbix

Proxmox

We are hiring an HPC DevOps Engineer to design, develop, and support HPC clusters for research, financial backtesting, and model optimization.

The role focuses on SLURM-based workload management, cloud and hybrid setups, as well as high-performance computing infrastructure.

This role is ideal for someone who takes ownership, thrives in high-performance environments, and is eager to build scalable, efficient HPC systems.

Key responsibilities

Deploy, manage, and optimize HPC clusters using AWS ParallelCluster, SLURM, and parallel file systems.
Automate cluster provisioning, configuration, and scaling with Ansible, Terraform, and scripting (Bash/Python).
Implement monitoring, security, and CI/CD pipelines to ensure stability and efficiency.
Collaborate with cross-functional teams to design, implement, and optimize scalable and reliable infrastructure solutions.
Develop and maintain automation scripts and tools to streamline operational workflows.
Document new processes and procedures, and keep existing documentation up to date and relevant.
Troubleshoot and resolve complex issues related to infrastructure, deployment, and performance.

Requirements

4+ years of relevant work experience in an IT Ops role.
Expertise in Linux performance tuning, job schedulers (SLURM), and HPC storage solutions.
Understanding of networking concepts and technologies (TCP/IP, firewalls, VPNs, load balancing).
Hands-on experience with AWS infrastructure, automation tools (Ansible, Terraform), and scripting (Python/Bash).
Familiarity with containerization (Docker, Kubernetes), monitoring (Prometheus, Grafana, ELK), and CI/CD pipelines (GitLab, Jenkins).
Familiarity with the following technologies: GitLab, iptables, IPsec, Docker, OpenVPN, Zabbix, Prometheus, Grafana, ELK, Proxmox, AWS.
Strong communication skills.
A deep sense of ownership and urgency; a detail-oriented approach to operations.

Would be a plus:

Experience with HPC-specific optimizations, parallel file systems, and cloud-native HPC solutions.
Knowledge of low-latency networking and high-speed interconnects.
Familiarity with GPUs or other accelerators in HPC/ML/AI environments.

Interview process

HR interview.
Technical interview.
Test assignment.
Final interview.

About the company: name hidden (Fintech)

Industry

Fintech

Company size

201-500

The company name is under NDA. It is an established global fintech company with an advanced tech stack and a decade of successful trading experience. It is developing a proprietary platform to trade thousands of instruments across dozens of markets. The recruiter will share full details in person as soon as you respond.