To produce the most relevant environmental scores, we need a rich database built on multiple sources (relying on a single source is risky), efficient impact computation (mainly 2°C alignment and biodiversity impact), and frequent in-depth analyses of the quality of our scoring.
Among the company's ~30 employees, the data science team (4 people) is part of the IT team (10 people) and is looking for an apprentice to:
Improve our data pipelines (we fetch financial and ecological data from different sources). The cleaner the data we fetch, the better our final coverage of companies worldwide.
Search for and collect new data sources (companies' GHG emissions, assets, tons of products produced or consumed, company facilities, etc.).
Leverage Large Language Model APIs (ChatGPT-like) and the LangChain framework to improve our parser for semi-structured data (such as ESG reports).
Automatically detect discrepancies between the default model and the refinements made by our analysts.
Benchmark our scores against academic environmental papers and competitors.
Minimum: Data Science, data analysis, data engineering, Python, Pandas, SQL, Git, interest in environmental issues.
Optional: Machine Learning, Docker, LangChain, FastAPI, Elasticsearch, Streamlit, Spark, Kubernetes, Celery, Ruby, financial or ecological knowledge.
20-minute call, then a 1h30 technical interview.