
Research Internship: Neural Systems for Web Indexing / Search Engines

Internship (3 to 6 months)
Salary: Not specified
Remote work not permitted
Education: > Bac +5 / PhD

Qwant



The Position

Job Description

Context

Information Retrieval (IR) has seen a change of paradigm over the last few years with the advance of Neural IR, which mostly relies on transformer-based dense representations. Many approaches have been developed over a short period of time, with impressive improvements [Lin et al. 2020, Tonellotto 2022] over traditional bag-of-words ranking methods such as BM25.
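For reference, BM25 ranks documents with a term-frequency/inverse-document-frequency formula. The sketch below is a minimal, self-contained implementation over pre-tokenized text; the toy corpus and parameter values are illustrative only, not Qwant's actual ranking code.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Classic BM25 score of one tokenized document against a query.

    k1 controls term-frequency saturation; b controls how strongly
    scores are normalized by document length.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log(1.0 + (N - df + 0.5) / (df + 0.5))  # smoothed IDF
        f = tf[term]                                       # term frequency
        score += idf * f * (k1 + 1) / (
            f + k1 * (1 - b + b * len(doc_terms) / avgdl)
        )
    return score

# Toy corpus of tokenized documents (hypothetical example data).
corpus = [
    ["neural", "retrieval", "models"],
    ["bm25", "is", "a", "ranking", "function"],
    ["the", "cat", "sat", "on", "the", "mat"],
]
score = bm25_score(["bm25", "ranking"], corpus[1], corpus)
```

A document containing the query terms scores higher than one that does not, which is the behavior the neural models discussed below aim to improve upon for semantic matches.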

These improvements come at a cost: neural IR models are not always efficient enough to be used in a live web-scale search engine, where latency is critical and increasing computation time by a few dozen milliseconds can significantly impact revenue. Recent methods such as ColBERTv2 [Santhanam et al. 2021], which build on fast nearest-neighbor indices [Boytsov and Nyberg 2020], reduce the need to repeatedly encode documents for each query: they index dense vector representations directly and compute late interactions at query time, significantly reducing latency compared to the usual cross-encoder setting. Alternatives based on sparse neural IR models [Formal et al. 2021] also allow for fast retrieval, but their latency is much less studied than that of their bag-of-words counterparts.
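To make the late-interaction idea concrete, here is a minimal sketch of ColBERT-style MaxSim scoring: each query token embedding is matched against its most similar document token embedding, and these maxima are summed. Random vectors stand in for learned token embeddings; this illustrates the scoring operator only, not ColBERTv2's compressed index or Qwant's stack.

```python
import numpy as np

def maxsim_score(query_emb, doc_emb):
    """ColBERT-style late interaction: for each query token embedding,
    take the maximum cosine similarity over all document token
    embeddings, then sum over query tokens."""
    # Normalize rows so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T                 # (n_query_tokens, n_doc_tokens)
    return sim.max(axis=1).sum()  # MaxSim, summed over query tokens

# Toy example: random vectors stand in for learned token embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))  # 4 query tokens, 128-dim
doc = rng.normal(size=(50, 128))   # 50 document tokens
score = maxsim_score(query, doc)
```

Because documents' token embeddings can be precomputed and stored in a nearest-neighbor index, only the (short) query needs to be encoded at search time, which is what makes this family of models attractive for a latency-critical engine.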

This internship would be conducted within Qwant, a privacy-preserving French search engine that serves over 200 million queries per month in part with its own index and retrieval stack.

Objectives

The first goal of this internship is to study, implement, and evaluate dense Neural IR architectures such as ColBERTv2 [Santhanam et al. 2021] or derived models [Hofstätter et al. 2022] within Vespa, the indexing and retrieval platform used at Qwant. The intern would also be encouraged to explore other types of ranking models, including sparse ones such as SPLADEv2 [Formal et al. 2021].

Provided the preliminary study and models perform well, the next step would be to integrate these approaches into the full Qwant index. The intern would have the unique opportunity to test their implementations on real users by running A/B tests.

More generally, the intern is encouraged to freely experiment with their ideas, to participate in evaluation campaigns such as the TREC Deep Learning track, and/or to write a research publication for an IR (e.g. ECIR, SIGIR) or machine learning (e.g. ICLR) venue.

Bibliography

[Boytsov and Nyberg 2020] Boytsov, Leonid, and Eric Nyberg. 2020. “Flexible Retrieval with NMSLIB and FlexNeuART.” arXiv [cs]. arXiv. http://arxiv.org/abs/2010.14848.

[Formal et al. 2021] Formal, Thibault, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. 2021. “SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval.” arXiv [cs.IR]. arXiv. http://arxiv.org/abs/2109.10086.

[Hofstätter et al. 2022] Hofstätter, Sebastian, Omar Khattab, Sophia Althammer, Mete Sertkan, and Allan Hanbury. 2022. “Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions Using Enhanced Reduction.” arXiv [cs.IR]. arXiv. http://arxiv.org/abs/2203.13088.

[Lin et al. 2020] Lin, Jimmy, Rodrigo Nogueira, and Andrew Yates. 2020. “Pretrained Transformers for Text Ranking: BERT and Beyond.” arXiv [cs.IR]. arXiv. http://arxiv.org/abs/2010.06467.

[Santhanam et al. 2021] Santhanam, Keshav, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. 2021. “ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction.” arXiv [cs.IR]. arXiv. http://arxiv.org/abs/2112.01488.

[Tonellotto 2022] Tonellotto, Nicola. 2022. “Lecture Notes on Neural Information Retrieval.” arXiv [cs.IR]. arXiv. http://arxiv.org/abs/2207.13443.

Organization

The internship will take place at the Qwant offices with visits to ISIR (remote work is also possible). The internship is supervised by Benjamin Piwowarski from ISIR, and Lara Perinetti and Romain Deveaud from Qwant.

The intern will potentially work with the following tools/technologies:

- Deep Learning libraries (PyTorch, TensorFlow, Jax/Flax, Huggingface ecosystem, etc.)
- Python
- Vespa indexing and retrieval platform (https://vespa.ai/)
- Search engine tools (https://github.com/vespa-engine/pyvespa)
- Git version control
- Jupyter environment

Qwant will provide the intern with a laptop and access to a remote compute server with GPU capabilities.

Team description

You will work in the Core Search team, in charge of the maintenance and development of Qwant’s own Web search engine.

The team is mainly composed of Data Scientists, Data Engineers and backend developers, working on Big Data and Machine Learning, Information Retrieval and NLP (Natural Language Processing) issues.

This year, Qwant offers two research-oriented internships in collaboration with Benjamin Piwowarski, whose work focuses on information retrieval.

This internship focuses on improving the ranking algorithm.


Interview Process

Interviews with the team.
