MLOps Infrastructure Engineer

Join our team as an MLOps Infrastructure Engineer, where you'll design and deploy a high-performance platform for distributed machine learning. You'll work with cloud and Kubernetes architecture, develop internal tools for MLOps, and implement DevOps best practices. This role requires 3-4 years of experience in cloud infrastructure, DevOps, or MLOps, as well as proficiency in Kubernetes, cloud GPU management, Python, and CI/CD. Bonus skills include low-level optimization, backend/API experience, and designing partner-facing tools.

Full-time
Paris
Occasional remote work
Salary: Not specified
Experience: > 5 years
Education: Master's degree
Key missions

Design and deploy a platform that makes GPUs, clusters, and distributed training transparent.

Develop and improve the internal orchestrator to simplify distributed training.

Implement Infrastructure-as-Code (Terraform/Pulumi) for reproducibility and scalability.

Sigma Nova


Position

Job description

The Challenge: Build the Platform That Powers Research and Beyond

Your mission: Design and deploy the platform that makes GPUs, clusters, and distributed training transparent, not just for internal research, but also as a foundation for monetizable capabilities (e.g., managed training services, optimised inference pipelines for partners).

What You’ll Do

  • Cloud & Kubernetes Architecture:

    • Build and maintain a high-performance, multi-tenant environment on Scaleway and GENCI, optimised for distributed ML.

    • Deploy and supervise a Slurm cluster for research workloads, ensuring seamless integration with Scaleway’s infrastructure.

    • Automate scaling, resource allocation, and cost management to avoid technical debt.

  • MLOps & Internal Tools:

    • Develop and enhance our internal orchestrator to simplify distributed training (FSDP, data pipelines) for both researchers and external users.

    • Create reusable frameworks for monitoring, logging, efficiency, and cost tracking.

    • Collaborate with research teams to industrialise workflows (e.g., model alignment, large-scale finetuning) and package them as deployable capabilities.

  • DevOps & Software Craftsmanship:

    • Implement Infrastructure-as-Code (Terraform/Pulumi) for reproducibility and scalability.

    • Write clean, typed, and documented Python code.

    • Troubleshoot at the intersection of hardware (GPUs, networking) and software (PyTorch, CUDA), ensuring robustness for both internal and external use cases.


Job requirements

Key Skills

  • Experience: 3–4 years in cloud infrastructure, DevOps, or MLOps (research or industry).

  • Technologies:

    • Kubernetes/Docker: Advanced orchestration and containerization.

    • Cloud GPU Management: Scaleway, AWS/GCP (clusters, networking, storage).

    • Python: Proficiency in PEP standards, typing, and testing.

    • MLOps: Data pipelines, distributed training (PyTorch, FSDP), monitoring.

    • CI/CD: Pipeline setup and maintenance.

  • Language: Fluent English (the team speaks English day to day).

Bonus Skills

  • Low-level optimisation (Triton, CUDA), HPC, or large-scale training experience.

  • Backend/APIs (FastAPI, gRPC) for exposing models or services.

  • Experience designing partner-facing tools or managed services.

Beyond Technical Skills

While technical excellence is critical, we place equal importance on how we work together. We believe the best teams are built on:

  • Integrity & Respect

    • We strive for honesty, kindness, and fairness. We value people who treat others with dignity and foster an environment where everyone feels heard.

  • Open Communication & Humility

    • Great ideas come from collaboration. We look for teammates who listen actively, communicate clearly, and approach challenges with self-awareness and humility.

  • Psychological Safety & Camaraderie

    • We strive to create a space where people feel safe to take risks, ask questions, and grow.

Recruitment process

  • Prescreen with Paul (Head of People)

  • Technical Screen with one Research Scientist or Research Engineer

  • On-site (take-home exercise and debrief OR on-site live interviews + behavioural interview)
