Research Internship: Neural Systems for Web Indexing / Search Engines



  • Internship (3 to 6 months)
  • Paris, Nice, Le Petit-Quevilly
  • Education: > Bac +5 (Master's) / PhD
  • Experience: Not specified




  • Big Data
  • Between 50 and 250 employees

The position


This position has been filled!

Who are they?

Launched in 2013, designed and developed with passion in France, Qwant is the European search engine that respects its users' privacy. To guarantee the best user experience, Qwant relies on its own index of the Web, on bold teams, and on innovative Machine Learning and Natural Language Processing technologies.

Qwant is built on three fundamental pillars: offering a high-quality web search service, promoting a responsible vision of the web and, at the heart of everything, respecting its users' privacy.
Accordingly, the company does not collect personal data and serves no targeted advertising. For every query, its ranking algorithms guarantee relevant results that are not influenced by the collection of personal data.

Today the Qwant search engine counts nearly 6 million users worldwide every month and answers more than 2 billion queries.
We offer several services that respect our users' privacy: the Qwant Search engine, also available for 6-to-12-year-olds as Qwant Junior, the Qwant Maps mapping service, and a tracker and cookie blocker with the Qwant VIPrivacy extension.


Job description


Information Retrieval (IR) has seen a paradigm shift over the last few years with the advance of Neural IR, which mostly relies on transformer-based dense representations. Many approaches have been developed over a short period of time, with impressive improvements [Lin et al. 2020, Tonellotto 2022] over traditional bag-of-words ranking methods such as BM25.
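For context, a bag-of-words ranker such as BM25 scores documents purely from term statistics; the sketch below is illustrative only (the toy corpus and the common default parameters k1=1.5, b=0.75 are not taken from this posting):

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, n_docs, avg_doc_len,
               k1=1.5, b=0.75):
    """Score one document against a query with BM25."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        # Inverse document frequency with the usual +0.5 smoothing.
        idf = math.log(1 + (n_docs - doc_freqs[term] + 0.5)
                           / (doc_freqs[term] + 0.5))
        # Term-frequency saturation, normalized by document length.
        norm_tf = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len))
        score += idf * norm_tf
    return score

# Toy corpus: document frequencies count each term once per document.
docs = [["neural", "ranking", "web"], ["web", "index"], ["privacy"]]
doc_freqs = Counter(t for d in docs for t in set(d))
avg_len = sum(map(len, docs)) / len(docs)
print(bm25_score(["web", "ranking"], docs[0], doc_freqs, len(docs), avg_len))
```

Note that nothing in the score depends on word order or context, which is exactly the limitation that neural rankers address.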

These improvements come at a cost: neural IR models are not always efficient enough to be used in a live web-scale search engine where latency is critical, since increasing computation time by several dozen milliseconds can significantly impact revenue. Recent methods such as ColBERTv2 [Santhanam et al. 2021], built on fast nearest-neighbor search indices [Boytsov and Nyberg 2020], remove the need to repeatedly encode documents for each query: they index dense vector representations directly and compute late interactions at query time, significantly reducing latency over the usual cross-encoder setting. Alternatives based on sparse neural IR models [Formal et al. 2021] also allow fast retrieval, but their latency is much less studied than that of their bag-of-words counterparts.
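The late-interaction idea behind ColBERT can be sketched in a few lines: document token embeddings are computed and indexed offline, and at query time each query token is matched against its most similar document token (the "MaxSim" operator), with the per-token maxima summed into the final score. The sketch below uses random vectors in place of a real transformer encoder:

```python
import numpy as np

def maxsim_score(q_emb, d_emb):
    """Late-interaction (MaxSim) score: for each query token embedding,
    take its best cosine match among document token embeddings, then sum."""
    # Normalize rows so dot products are cosine similarities.
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    sim = q @ d.T                 # (n_query_tokens, n_doc_tokens)
    return sim.max(axis=1).sum()  # best document token per query token

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))   # 4 query tokens, 128-dim embeddings
doc = rng.normal(size=(50, 128))    # 50 document tokens, indexed offline
print(maxsim_score(query, doc))
```

Because the document side is precomputed, only the (short) query has to go through the encoder at query time; approximate nearest-neighbor indices then prune which document tokens are compared at all.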

This internship would be conducted within Qwant, a privacy-preserving French search engine that serves over 200 million queries per month in part with its own index and retrieval stack.

The goal of this internship would be first to study, implement and evaluate dense Neural IR architectures such as ColBERTv2 [Santhanam et al. 2021] or derived models [Hofstätter et al. 2022] within Vespa, the indexing and retrieval platform used at Qwant. The intern would also be encouraged to explore other types of ranking models, including sparse ones such as SPLADEv2 [Formal et al. 2021].
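Sparse neural models such as SPLADE represent queries and documents as sparse weight vectors over the vocabulary, so scoring reduces to an inner product that an inverted index can compute efficiently. A toy sketch with hand-made weights (a real model learns these weights with a transformer, and also expands representations beyond the surface terms):

```python
def sparse_dot(query_weights, doc_weights):
    """Inner product of two sparse term-weight vectors stored as dicts."""
    # Iterate over the smaller vector, as an inverted index effectively does.
    small, large = sorted((query_weights, doc_weights), key=len)
    return sum(w * large[t] for t, w in small.items() if t in large)

# Hypothetical learned term weights, for illustration only.
query = {"web": 1.2, "search": 0.9, "engine": 0.7}
doc = {"web": 0.8, "index": 1.1, "engine": 0.5, "crawl": 0.4}
print(sparse_dot(query, doc))  # contributions from "web" and "engine" only
```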

Provided the preliminary study and models perform well, the next step would be to integrate these approaches into the full Qwant index. The intern would have the unique opportunity to test their implementations on real users by running A/B tests.
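An A/B test of this kind typically compares a click-based metric between the production ranker and the candidate on randomly split traffic; a minimal sketch of a two-proportion z-test on click-through rates (all numbers below are made up for illustration):

```python
import math

def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """z-statistic for the difference of two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical split: does the neural ranker (B) beat the baseline (A)?
z = two_proportion_ztest(clicks_a=4_800, n_a=100_000,
                         clicks_b=5_150, n_b=100_000)
print(z)  # |z| > 1.96 would be significant at the 5% level
```

Real search A/B testing usually relies on more sensitive metrics and designs (e.g. interleaving), but the statistical question is the same: is the observed difference larger than random traffic fluctuation?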

More generally, the intern is encouraged to freely experiment with their ideas, to participate in evaluation campaigns such as the TREC Deep Learning track, and/or to write a research publication for IR (e.g. ECIR, SIGIR) or machine learning (e.g. ICLR) venues.

[Boytsov and Nyberg 2020] Boytsov, Leonid, and Eric Nyberg. 2020. “Flexible Retrieval with NMSLIB and FlexNeuART.” arXiv:2010.14848 [cs].

[Formal et al. 2021] Formal, Thibault, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. 2021. “SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval.” arXiv [cs.IR]. arXiv.

[Hofstätter et al. 2022] Hofstätter, Sebastian, Omar Khattab, Sophia Althammer, Mete Sertkan, and Allan Hanbury. 2022. “Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions Using Enhanced Reduction.” arXiv [cs.IR]. arXiv.

[Lin et al. 2020] Lin, Jimmy, Rodrigo Nogueira, and Andrew Yates. 2020. “Pretrained Transformers for Text Ranking: BERT and Beyond.” arXiv [cs.IR]. arXiv.

[Santhanam et al. 2021] Santhanam, Keshav, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. 2021. “ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction.” arXiv [cs.IR]. arXiv.

[Tonellotto 2022] Tonellotto, Nicola. 2022. “Lecture Notes on Neural Information Retrieval.” arXiv [cs.IR]. arXiv.

The internship will take place at the Qwant offices with visits to ISIR (remote work is also possible). The internship is supervised by Benjamin Piwowarski from ISIR, and Lara Perinetti and Romain Deveaud from Qwant.

The intern will potentially work with the following tools/technologies:

Deep Learning libraries (PyTorch, TensorFlow, Jax/Flax, the Hugging Face ecosystem, etc.)
Vespa indexing and retrieval platform
Search engine tools
Git version control
Jupyter environment

Qwant will provide the intern with a laptop and access to a remote compute server with GPU capabilities.

Team description

You will work in the Core Search team, in charge of the maintenance and development of Qwant’s own Web search engine.

The team is mainly composed of Data Scientists, Data Engineers and backend developers, working on Big Data and Machine Learning, Information Retrieval and NLP (Natural Language Processing) issues.

This year, Qwant offers two research-oriented internships in collaboration with Benjamin Piwowarski whose work focuses on information retrieval.

This one focuses on the improvement of the ranking algorithm.

Interview process

Interviews with the team.

