Site Reliability Engineer (DevOps), Instances SRE

  • Permanent contract
  • Paris
  • > 2 years

The company

  • IT / Digital, SaaS / Cloud Services
  • 250 to 2,000 employees

The position

Powered by talented and passionate people working hard on democratizing the cloud, Scaleway, the 2nd leading European infrastructure cloud provider, is a multicultural company, rapidly growing into a global brand. We are present in 160 countries, with more than 280 employees of 18 nationalities.
We are a cloud computing pioneer delivering the innovative capabilities of modern multi-cloud, covering a full spectrum of services for professionals: public cloud services with Scaleway Elements, private infrastructure and colocation with Scaleway Datacenter, and bare metal infrastructure with Scaleway Dedibox.

We place people at the heart of our purpose as an enabler of the internet. Our organization encourages responsibility, autonomy, commitment and thought leadership from our collaborators. Our premises are open spaces, conducive to exchange and interaction between individuals.

We believe it is our responsibility to be a positive force in society and to collectively design new systems for a better future. We want to increase access to the digital and technology industry. As our business scales, the customers we serve are increasingly diverse and global. Giving them an unbeatable experience is central to our business strategy. To better understand our customers and partners, we need a workforce that’s as diverse as they are.

Job description

Context of the position

We are looking for a Site Reliability Engineer to join our Instances team. Your main mission will be to ensure we can reliably serve virtual machines to users around the world. We expect you to have a strong background in system administration, along with some experience of DevOps practices. Our systems evolve all the time; issues can recur and differ greatly from one another. You will need to be a resilient problem solver who is willing to collaborate and who knows how to use knowledge of system interactions to their advantage. Are you ready to look after our virtualisation system and strive to improve our users' daily lives? This is a unique opportunity to join Scaleway and ensure developers at any company get the high-quality virtual instance service they need.

What you’ll be doing

  • Troubleshoot high-impact issues working with multiple engineering teams (Storage, Network, Hardware)
  • Take on-call responsibilities, mitigate issues encountered in production and provide the best real-time response to our customers
  • Optimise on-call processes, tools & documentation that will help identify, diagnose and remediate production incidents, automating as much as possible
  • Ensure a high quality of service for our customers by leveraging observability and monitoring technologies
  • Manage the lifecycle of hypervisors in production and take part in fleet-wide migration plans
  • Empower your teammates to swiftly integrate and deploy software components of our virtualisation system
  • Bootstrap new regions and availability zones in collaboration with the Platform & Network engineering teams
  • Help implement best practices in stability, resiliency, scalability, security and performance across our virtualisation system

Technical stack & tools we use

  • Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana, Icinga
  • Python
  • RabbitMQ + Celery
  • PostgreSQL + SQLAlchemy
  • HAProxy, Nginx, REST APIs / Flask
  • S3 API
  • Ansible, AWX, Foreman, SaltStack
  • GitLab, Nexus
  • Ubuntu, Debian, CentOS
  • Jira, Confluence, Slack, G Suite

What we expect from you

  • 2+ years of system administration experience, including significant use of DevOps tooling
  • A great attitude and desire to work with a team
  • Ability to make independent decisions and take ownership of them
  • Demonstrated ability to troubleshoot production system failures
  • Passion for incremental improvements to tooling and a love of all things automation
  • Experience scripting with bash and Python
  • Experience with Linux systems: Ubuntu Server, QEMU/KVM
  • Experience with infrastructure as code and continuous deployment
  • Understanding of computer networks: TCP/IP, DNS, load-balancing, IPv6, BGP and network virtualisation

Nice to have

  • Experience dealing with physical hardware automation
  • Experience with monitoring & logging systems
  • Experience administering relational databases
  • Experience with Python programming
  • Knowledge of one cloud platform and related use-cases
  • Experience as an OSS contributor or maintainer

Do you recognize yourself in these lines and want to join a young, innovative, growing company that is a great place to work?

Then don’t wait any longer and join us :)
