Review the state of the art on your topic and propose a way to improve it or adapt it to our needs
Implement these new models and evaluate your ideas on our dataset
Push your work from idea to production and monitor how it impacts customers
Share your work with the community by publishing a research paper and presenting at technical gatherings
We have several research topics covering a wide range of Machine Learning fields
You can also check our tech blog: https://research.dilitrust.com/
We are offering a technical internship opportunity focused on training a large language model (LLM) on legal data and constructing a legal instruction-based dataset. This internship is designed for individuals who are enthusiastic about natural language processing (NLP), machine learning, and the legal domain. During this internship you will be an integral part of our research team, contributing to the development of cutting-edge AI models that enhance the legal industry.
Responsibilities:
Dataset Construction:
Collaborate with our team to identify and gather available open-source legal datasets.
Curate a comprehensive and diverse dataset that covers various legal domains, including contracts, case law, statutes, and regulations.
Implement data preprocessing techniques to ensure data consistency, uniformity, and quality.
Construct a legal instruction-based dataset by extracting key legal queries, questions, or prompts from the collected data. This dataset will be used for fine-tuning the language model.
Model Selection:
Investigate state-of-the-art large language models (LLMs) with a focus on those that have demonstrated competence in handling legal text and tasks.
Compare different pre-trained models on benchmark legal NLP tasks and determine their suitability for the legal domain.
Model Fine-tuning:
Fine-tune the selected LLM on the legal instruction-based dataset to adapt it to legal text understanding and generation.
Implement strategies to avoid potential biases and ethical concerns during fine-tuning.
Evaluation and Optimization:
Develop evaluation metrics and methodologies to measure the performance of the fine-tuned legal domain model.
Conduct rigorous testing and validation to ensure the model’s effectiveness in legal text comprehension, summarization, contract analysis, and related tasks.
Iterate on the fine-tuning process by tuning hyperparameters and architecture choices to improve model performance.
It’s the perfect time to join Dilitrust in terms of growth and scientific challenges
We have a strong Machine Learning team of 8 Data Scientists and Data Engineers who will help you learn about multiple topics, both theoretically and practically
We have a team of passionate people and a healthy work environment where we value initiative and technical expertise
You will have the opportunity to leverage state-of-the-art AI literature to contribute to an innovative product and develop new use cases with a real business impact
We are an amazing team of 200+ people trying to disrupt the legaltech scene
Your profile:
Student at a major engineering school or in an equivalent master’s program
You have advanced technical skills in Applied Mathematics (Machine Learning / Optimization)
You have a solid knowledge of Python and can write quality code
You have a good knowledge of Deep Learning frameworks (preferably PyTorch and Transformers)
You have previous experience in Machine Learning (personal/school project or a previous internship)
Preferred experience:
Previous knowledge of Natural Language Processing is a big plus
You like reading research papers, implementing state-of-the-art models, and sharing your research findings with fellow team members
You like writing scientific papers/blog posts and contributing to the community
And there you have it.