We are looking for a Performance ML Engineer to optimise the efficiency and cost of our machine learning models, preparing them for large-scale deployment.
We currently operate a single node with 8x H100 GPUs, which covers our immediate research and development needs. We anticipate scaling up to clusters of 64+ H100s in the coming years, with compute needs reaching millions of GPU hours annually.
Your mission: Maximise model performance (latency, throughput, cost per inference) through advanced optimisation techniques (quantisation, distillation, CUDA/Triton kernels, GPU profiling), while preparing our architecture for future scalability.
Designing systems for distributed training and inference:
Build scalable distributed training pipelines
Debug foundation model training runs
Oversee and scale up GPU clusters
Optimisation:
Profile GPU usage and bottlenecks
Optimise models using techniques such as quantisation, distillation, or kernel improvements
Collaboration:
Work with the R&D team to integrate optimisations into production pipelines.
Document benchmarks and performance gains (latency, cost, accuracy).
Stay up to date on new hardware architectures (e.g., H200, TPU).
Model Optimisation Experience: Quantisation, distillation, or kernel optimisation (CUDA/Triton).
GPU Profiling: Experience with Nsight Systems/Compute or similar tools (e.g., PyTorch Profiler).
Distributed training: DDP, FSDP.
Mixed Precision: FP16/BF16, loss scaling, and troubleshooting.
Problem-Solving: Ability to diagnose performance issues and propose innovative solutions.
Fluent English (the team works in English day to day)
Experience with Triton or custom CUDA kernels.
Knowledge of ML compilers (e.g., Apache TVM, TensorRT).
Experience with multi-node GPU clusters (even modest ones).
Open-source contributions or publications on model optimisation.
Experience in one or more of these domains: neurology, multi-modality, (conditional) generation, or interpretability.
While technical excellence is critical, we place equal importance on how we work together. We believe the best teams are built on:
Integrity & Respect
Open Communication & Humility
Psychological Safety & Camaraderie
Prescreen with Paul (Head of People)
Technical Screen with one Research Engineer
On-site (either a take-home exercise with a presentation of your results, or on-site interviews; plus a behavioural interview)