Senior HPC Cluster Engineer

at Nebius

4 000 – 8 000 €/month net

📍 Serbia, relocation assistance
C / Golang
6 - 10 people

Nebius AI is an AI-centric public cloud platform built specifically for training AI models and serving them for inference.

Our mission is to help ML practitioners concentrate on their core jobs, while DevOps, MLOps, and infrastructure-related tasks are handled by us. The idea is to build an ML-specific cloud platform covering the entire ML lifecycle from A to Z: from data preparation and labeling to ML training and inference.

We recognize the potential of ML and AI technologies and aim to provide our future users with the perfect environment to train and fine-tune their models. We are committed to delivering the best user experience and excellent customer support.

We’re looking for a Senior HPC Cluster Engineer to contribute to the development of our hyperscaler platform.



About the company

  • Nebius is headquartered in the Netherlands, with hubs in Finland, Serbia, and Israel.
  • Our own data center in Finland features server racks designed in-house for ML-specific high load, with power-efficient solutions, including a free-cooling system.
  • Our mature team of engineers has a proven track record in developing sophisticated cloud and ML solutions and designing cutting-edge hardware.

About the team

The Hypervisor team supports and develops the parts of the Cloud platform that directly affect the KVM hypervisor and QEMU device emulator. We understand the granular details of hardware virtualization and device emulation, paying close attention to performance and protection against untrusted code.

In this position, your responsibilities will be to

  • Improve infrastructure around GPU-accelerated computing.
  • Analyze root causes and suggest corrective actions for problems at both large and small scale.
  • Add support for new hardware across the entire infrastructure software stack.
  • Detect and address potential problems before they occur.

We expect you to have

  • 5+ years of professional software development experience.
  • 3+ years of experience with Linux.
  • Fluency in the Go programming language.
  • General understanding of the QEMU/KVM virtualization stack.

It would be an added bonus if you had:

  • System-level understanding of server architecture, PCIe devices, NICs, the Linux OS, and kernel drivers.
  • Experience analyzing and tuning performance for a variety of HPC workloads.
  • Familiarity with RDMA, RoCE, InfiniBand.
  • Background in software-defined networking and HPC cluster networking.
  • Familiarity with deep learning frameworks like PyTorch and TensorFlow.

Does this sound like the challenge you've been looking for? If so, we invite you to join us!

Екатерина Козяйкина IT Recruiter

About Nebius

Product company
11 - 50

Nebius is a modern IT company that helps large B2B businesses build their own local cloud platforms. Nebius provides not only the technology but also a launch-ready business model, including tools for support, sales, and marketing.
