Orakl Oncology is pioneering a new paradigm in cancer drug development by building the world’s largest cohort of patient-derived organoid (PDO) avatars. Through our unique platform, we generate extensive multi-modal data from these avatars — combined with rich clinical data from hospital partners — to discover and validate new oncology therapeutics with real-world patient relevance.
We are seeking a Senior Data Scientist to own the end-to-end clinical data chain at Orakl: from the design of data collection protocols with hospital partners, to the delivery of clean, structured, AI-ready datasets to our data science teams. This is a foundational role that sits at the intersection of clinical domain knowledge, data engineering, and machine learning infrastructure. You will work hand-in-hand with clinicians, data scientists, and regulatory experts to build the clinical data backbone that powers our flagship predictive oncology platform.
Design Clinical Data Collection Protocols: Working directly with hospital teams and Clinical Research Associates (CRAs), you’ll define the data points to collect based on clinical domain knowledge and predictive power, and translate them into electronic Case Report Forms (eCRFs) and data collection protocols ready for deployment in real clinical environments.
Own the Clinical Data Model: You’ll evaluate and decide on the right clinical data standards for Orakl’s context (FHIR, OMOP, or other), then define and maintain a unified data model that accommodates heterogeneous sources across hospital partners and scales as our network grows.
Build End-to-End Clinical Data Pipelines: You’ll develop and operate robust pipelines that take raw eCRF outputs and hospital exports through to structured, validated, AI-ready datasets. You will ensure every table delivered to data scientists is clean, consistent, and immediately usable.
Develop Hospital Feedback Loops: You’ll implement data quality control processes that automatically flag errors, inconsistencies, and anomalies in data received from hospital partners, and turn them into actionable feedback loops that protect both data quality and the partnership.
Feature Extraction: You’ll build proofs of concept (PoCs) for extracting features from non-standard data sources such as immunohistochemistry (IHC) results and free-text clinical notes, unlocking new clinical signals for our AI models.
Master’s degree in a quantitative or life science discipline (Computer Science, Mathematics, Life Science Engineering, etc.)
3+ years of experience in data engineering, data science, or software engineering.
Direct experience working with clinical or patient data, ideally in a regulated healthcare environment.
Proven track record of productionising code. This includes data pipelines, model outputs, or data transfer workflows in a real-world setting.
Proficiency in Python and SQL, with a solid understanding of data pipeline design and orchestration (e.g., Airflow, dbt, or equivalent).
Familiarity with cloud infrastructure (AWS, GCP, or Azure) and data storage best practices.
Knowledge of clinical data standards and interoperability frameworks (FHIR, OMOP, HL7).
Experience with eCRF systems or clinical data management platforms.
Exposure to NLP or information extraction techniques applied to clinical text or imaging metadata.
Familiarity with French or European health data regulations (HDS certification, CNIL, GDPR).
Prior experience in a healthtech, biotech, or hospital environment.
HR Call — Getting to know each other, aligning on expectations and context.
Technical Deep Dive — A deep conversation on your past experience with clinical data pipelines, data modeling, and production engineering.
Technical Case — A system design exercise representative of the real clinical data challenges you’ll face at Orakl.
Reference Call — A conversation with one or two people you’ve worked with closely.
Founder Interview — A final discussion with our founders on vision, culture fit, and mutual ambitions.
Meet Gustave, CTO
Meet Fanny, Co-founder