This offer is no longer available.

Research Scientist Intern

Internship (5 to 6 months)
Paris
Salary: €1.2K to €1.8K per month
Start date: September 30, 2023
Occasional remote work
Experience: < 6 months
Education: Master's degree (Bac +5)

DiliTrust

The position

Job description

Your role

  • Review the state-of-the-art on your topic and propose a way to improve it or adapt it to our needs

  • Implement these new models and evaluate your ideas on our dataset

  • Push your work from idea to production and monitor how it impacts customers

  • Share your work with the community by publishing a research paper and presenting at technical gatherings

We have several research topics covering a wide range of Machine Learning fields.

You can also check out our tech blog: https://research.dilitrust.com/

Legal Large Language Model

We are offering a technical internship focused on training a large language model (LLM) on legal data and constructing a legal instruction-based dataset. This internship is designed for individuals who are enthusiastic about natural language processing (NLP), machine learning, and the legal domain. During this internship, you will be an integral part of our research team, contributing to the development of cutting-edge AI models that enhance the legal industry.

Responsibilities:

  1. Dataset Construction:

    • Collaborate with our team to identify and gather available open-source legal datasets.

    • Curate a comprehensive and diverse dataset that covers various legal domains, including contracts, case law, statutes, and regulations.

    • Implement data preprocessing techniques to ensure data consistency, uniformity, and quality.

    • Construct a legal instruction-based dataset by extracting key legal queries, questions, or prompts from the collected data. This dataset will be used for fine-tuning the language model.

  2. Model Selection:

    • Investigate state-of-the-art large language models (LLMs) with a focus on those that have demonstrated competence in handling legal text and tasks.

    • Compare different pre-trained models on benchmark legal NLP tasks and determine their suitability for the legal domain.

  3. Model Fine-tuning:

    • Fine-tune the selected LLM on the legal instruction-based dataset to adapt it to legal text understanding and generation.

    • Implement strategies to avoid potential biases and ethical concerns during fine-tuning.

  4. Evaluation and Optimization:

    • Develop evaluation metrics and methodologies to measure the performance of the fine-tuned legal domain model.

    • Conduct rigorous testing and validation to ensure the model’s effectiveness in legal text comprehension, summarization, contract analysis, and related tasks.

    • Iterate on the fine-tuning process by tuning hyperparameters and architecture choices to improve model performance.
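
To make the dataset construction and evaluation steps above more concrete, here is a minimal Python sketch using only the standard library. It is not DiliTrust's actual pipeline: the record fields ("domain", "instruction", "input", "output"), the helper names, and the toy contract clause are all assumptions chosen for illustration, and token-overlap F1 is only one possible metric among many.

```python
import json
import re
import unicodedata
from collections import Counter


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def build_instruction_records(documents):
    """Convert raw question/passage/answer dicts into deduplicated records.

    The schema (instruction/input/output) is a hypothetical convention,
    not a format confirmed by the job description.
    """
    seen = set()
    records = []
    for doc in documents:
        record = {
            "domain": doc["domain"],
            "instruction": normalize_text(doc["question"]),
            "input": normalize_text(doc["passage"]),
            "output": normalize_text(doc["answer"]),
        }
        # Treat records with the same question and passage as duplicates,
        # ignoring case and whitespace differences.
        key = (record["instruction"].lower(), record["input"].lower())
        if key in seen:
            continue
        seen.add(key)
        records.append(record)
    return records


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model prediction and a reference answer."""
    pred_tokens = normalize_text(prediction).lower().split()
    ref_tokens = normalize_text(reference).lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up examples; the second one duplicates the first up to whitespace.
raw_documents = [
    {
        "domain": "contracts",
        "question": "What is the notice period?",
        "passage": "Either party may terminate with 30 days' notice.",
        "answer": "30 days",
    },
    {
        "domain": "contracts",
        "question": "What is the  notice period?",
        "passage": "Either party may terminate   with 30 days' notice.",
        "answer": "30 days",
    },
]

records = build_instruction_records(raw_documents)
# One JSON object per line (JSONL), a common format for fine-tuning data.
jsonl = "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

In practice, a metric such as token_f1 would be averaged over a held-out set of records, and the deduplication key would likely need to be stricter or fuzzier depending on how noisy the collected sources are.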

Why should you apply?

  • It’s the perfect time to join DiliTrust in terms of growth and scientific challenges

  • We have a strong Machine Learning team of 8 Data Scientists and Data Engineers who will help you learn about multiple topics, both theoretically and practically

  • We have a team of passionate people and a healthy work environment where we value initiative and technical expertise

  • You will have the opportunity to leverage state-of-the-art AI literature to contribute to an innovative product and develop new use cases with a real business impact

  • We are an amazing team of 200+ people trying to disrupt the legaltech scene


Desired profile

Your profile:

  • Student from a major engineering school or equivalent master’s degree

  • You have advanced technical skills in Applied Mathematics (Machine Learning / Optimization)

  • You have a solid knowledge of Python and can write quality code

  • You have a good knowledge of Deep Learning frameworks (preferably PyTorch and Transformers)

  • You have previous experience in Machine Learning (personal/school project or a previous internship)

Preferred experience:

  • Previous knowledge of Natural Language Processing is a big plus

  • You like reading research papers, implementing state-of-the-art models, and sharing your research findings with fellow team members

  • You like writing scientific papers/blog posts and contributing to the community


Interview process

  1. Apply here by sending us your CV; a link to your GitHub is also appreciated
  2. Phone interview
  3. Onsite / Remote technical test
  4. Meet with the team

And that's it.