Why This Role Exists
Until now, our scientists have written the code they needed to run experiments. Your job is to accelerate them: build shared training infrastructure, enforce good development practices, and set up the first wave of automation (CI/CD, experiment tracking, distributed training).
This is a founding role at the intersection of Research Engineering and MLOps. You won’t just “maintain pipelines” – you’ll design the foundations that make our research sustainable and deployable. Think: turning promising NeurIPS-level prototypes into clean, documented, versioned codebases that others can build upon.
You would work on open-source packages and be credited as a co-author on the papers.
You’ll lay the groundwork for the engineering and infrastructure teams that will follow. This is an individual contributor role with growth potential.
1. Build shared ML infrastructure
Create reusable training pipelines and evaluation frameworks that both research teams can use, with an emphasis on distributed training
Set up experiment tracking, model versioning, and reproducible environments
Build internal tooling to reduce friction in the research workflow
2. Bridge research and production
Partner with researchers to refactor promising prototypes into maintainable code
Establish coding standards and documentation practices (we want to improve research code quality, not police it)
Package models for deployment when they’re ready to move beyond research
3. Establish MLOps foundations
Set up CI/CD pipelines, testing frameworks, and deployment automation
Implement best practices for version control, code review, and reproducibility
Build the infrastructure for distributed training on our GPU clusters
4. Enable knowledge sharing
Create documentation and internal guides
Mentor researchers on software engineering practices (Git, testing, modular code)
Help establish a culture of building on each other’s work instead of starting from scratch
About the scope: We know this is ambitious for one person — and it’s intentional. This is a founding role where you’ll set the direction and priorities, not execute everything alone. We’re planning to grow the ML/MLOps team to 5-6 people over the next year, and you’ll play a key role in shaping what that team becomes.
We’re looking for someone with a builder mindset who sees this breadth as an opportunity: the chance to architect our ML infrastructure from the ground up, make high-impact decisions early, and grow into a leadership position as the team scales.
The Opportunity:
Greenfield infrastructure: Define how we build AI systems from the ground up – no legacy tech debt
High-leverage impact: Your infrastructure directly enables breakthrough research, not just incremental product features
Founding team member: Shape the engineering culture and practices that will scale with the company
Growth trajectory: As our first engineer, you’ll help build and potentially lead the platform team
Early-stage dynamics: Processes are being defined in real-time; you’ll need comfort with ambiguity and rapid iteration
Generalist demands: You’ll touch everything from training pipelines to deployment to documentation (specialization comes later)
Tech Stack (current):
PyTorch
Distributed Training (torchtitan)
Cloud GPU infrastructure
Early-stage tooling decisions are still open (you’ll help choose)
We are looking for exceptional people with at least 6-7 years of experience who have seen a wide variety of roles and tasks within the ML world (from Research Engineering to MLOps)
Strong Python engineering: You write clean, tested, maintainable code by default (type hints, documentation, modular design) – and can teach others to do the same
PyTorch expertise: Deep familiarity with PyTorch for implementing and optimizing models; can debug researchers’ training code
MLOps fundamentals: Hands-on experience with Git workflows, CI/CD, Docker, experiment tracking tools (MLflow, Weights & Biases, etc.)
Distributed training: You’ve scaled training jobs across multiple GPUs or machines and understand the performance pitfalls (we are also happy to see you grow and learn it, if you haven’t done it already)
Bridge-builder mentality: You can work with brilliant researchers who sometimes write messy code, help them level up their software practices, and earn their trust.
Pragmatic autonomy: You’re comfortable scoping your own work, making pragmatic trade-offs between research velocity and engineering rigor, and asking for help when needed
Teaching ability: You can explain version control, testing, and modular design to scientists who’ve never used them – clearly and without condescension
Experience with generative AI or foundation models (LLMs, diffusion models, etc.)
Contributions to open-source ML projects (scikit-learn, Hugging Face, PyTorch ecosystem)
Cloud platform experience (AWS/GCP/Azure for ML workloads)
Know how to set up a Slurm-based cluster
Systems programming skills (C++/CUDA for performance optimization)
Graduate degree or publications in ML/AI (but strong practical experience trumps credentials)
Experience working in a research lab, supporting Research Scientists
You might be a great fit if:
You’ve lived in both worlds: Worked in academic labs and startups, understand both cultures, and know how to blend research rigor with engineering pragmatism
You find satisfaction in cleanup: Refactoring a 3000-line project into a clean Python package feels rewarding, not tedious
You’re a technical Swiss Army knife: Equally comfortable debugging a PyTorch distributed training deadlock and designing a CI/CD pipeline from scratch
You’re an enabler, not a gatekeeper: You want to be the person who makes research teams 10x faster.
Ambiguity doesn’t paralyze you: When a researcher says “training is slow,” you can independently investigate, form hypotheses, and propose solutions
You respect the research: You understand that “messy research code” often represents months of brilliant problem-solving, and your job is to preserve the insights while improving the structure
We will review CVs in batches every Wednesday.
We are still finalizing the detailed recruitment process and will update this section as soon as possible, but the outline is:
- Prescreen with Paul (Head of Talent)
- Technical interview (remote discussion)
- Onsite interview