Senior Machine Learning Engineer
This position was filled!
Who are they?
GitGuardian is a global post-Series B cybersecurity startup; by the end of 2021 we had raised $44M from American and European investors, including top-tier VC firms.
In 2023, more than ever, we have a very solid business model with fast-growing ARR, multi-year contracts, and great customer retention rates.
Our early investors, who saw our market value proposition, include GitHub co-founder Scott Chacon and Docker co-founder / CTO Solomon Hykes 👀
We develop code security solutions for the DevOps generation and are a leader in the market of secrets detection & remediation.
Our solutions are already used by hundreds of thousands of developers in all industries, and GitGuardian Internal Monitoring is the No. 1 security app on the GitHub Marketplace 🔥
We work with some of the largest IT outsourcing companies, publicly listed companies like Talend, and tech companies like Datadog.
More than 85% of our customers are in the United States.
Job description
Context
Our products are a set of tools that scan GitHub public activity and git private repositories for security vulnerabilities.
They are used by different teams: Software Development and Ops, Application Security, and Threat Response. The buying decision comes from CISOs, CTOs, and Directors of Security.
By design, GitGuardian is a data-driven company. Both co-founders are former Data Scientists, and GitGuardian's first product was real-time processing of all new GitHub events. Our secret detection engine has been battle-tested against huge amounts of data.
In this context, GitGuardian now wants to take it to the next level by incorporating Machine Learning models to create better vulnerability detectors and to improve internal efficiency. That’s why your work will matter and will be taken seriously!
Missions
As a Machine Learning Engineer, you will have to:
Lead the end-to-end development of scalable and reliable Machine Learning models that can be used to solve business problems. For instance, build a new generation of secret detectors using state-of-the-art LLMs.
Identify areas in the company where Machine Learning can be applied. In particular, launch ML experiments to bring new in-app features to our existing products, like incident severity classification.
Deploy tools to monitor the quality and performance of the models, while ensuring that they meet the business requirements
Work closely with the Data Engineering team to ensure smooth data integration
Collaborate with DevOps and Software Engineering teams to deploy models into our products, monitor their performance, and troubleshoot issues as needed
Stay up-to-date with the latest advancements in NLP and ML technologies to implement new techniques into the existing models and foster a culture of innovation and continuous learning
Communicate about cutting-edge ML applications at GitGuardian by writing blog posts and participating in meetups
Advantages
You will build and deploy state-of-the-art models that bring high value to the business
You will be able to leverage a huge amount of textual data collected from day 1
The ML tooling landscape is still to be defined
You will be part of a scale-up adventure with a strong engineering culture
Our technical stack
Snowflake
PostgreSQL, Elasticsearch, MongoDB
Airbyte
Metabase, Tableau
GitLab
AWS, Terraform, Docker, Kubernetes
Preferred experience
If you think you match only 70% to 80% of these criteria, please send us your resume!
And if you still have questions before applying, you can write to us directly at:
Hard skills
5+ years of hands-on experience in building, deploying and maintaining ML models with concrete business applications
Solid analytical and advanced statistical skills
Deep knowledge of state-of-the-art NLP techniques and models, especially LLMs
Strong programming skills in one or more programming languages focused on data processing (Python, Scala, etc.) along with skills in application best practices (code modularity, unit tests, documentation, etc.)
Fluent in ML libraries such as PyTorch, Transformers, spaCy, scikit-learn
Strong experience in packaging and delivering ML models in production using cloud-based platforms
Experience with Docker and MLOps tools (Airflow, MLFlow)
Experience in using Hugging Face and transformers is a plus
Experience in Data Warehousing (Snowflake, BigQuery) and data app prototyping (Streamlit, Dash) is a plus
Soft skills
You like algorithms and new technology
You like to write high quality and re-usable code
You are used to carrying out applied research projects and bringing them to production
You are autonomous, proactive and curious
You are a team player with strong communication skills. In particular, you should be able to work with cross-functional teams, and be able to communicate technical concepts to non-technical stakeholders.
You are able to work in a fast-paced and dynamic environment, and adapt to changing requirements
You speak fluent French and English
Bonus points
You don’t embed API keys in your code ;)
You have a deep understanding of startup dynamics and challenges
You have experienced strong team growth at a previous company
Recruitment process
1 video call with a recruiter
To learn about your career goals, introduce you to the team, and evaluate whether there could be a mutual match
1 technical team interview
To evaluate your hard skills for the position and help you picture yourself in the role
1 technical test depending on your seniority
To see how you approach hands-on coding
1 final interview with the CEO and co-founder
To explain our company’s vision and ambitions for the next couple of years, and to make sure you are up for the position