This position is no longer available.

Site Reliability Engineer (DevOps), Instances

Permanent contract
Paris
Salary: Not specified
Fully-remote
Experience: > 2 years

Scaleway

The position

Job description

Context of the position
We are looking for a Site Reliability Engineer to join our Instances team. Your main mission will be to ensure we can reliably serve virtual machines to users around the world. We expect you to have a strong background in Python development and system administration, along with some experience of DevOps practices. Our systems evolve constantly, and the tools used to observe them and keep them resilient need to evolve accordingly. You will need to be a problem solver who is willing to collaborate and who knows how to use knowledge of system interactions to their advantage. Are you ready to look after our virtualisation system and strive to improve our users' daily lives? This is a unique opportunity to join Scaleway and ensure that developers at any company get the high-quality virtual instance service they need.

What you’ll be doing

  • Create new tools and documentation, or optimise existing ones, to help identify, diagnose and remediate production incidents, automating as much as possible
  • Troubleshoot high-impact issues working with multiple engineering teams (Storage, Network, Hardware)
  • Take on-call responsibilities, mitigate issues encountered in production and provide the best real-time response to our customers
  • Ensure a high quality of service for our customers by leveraging observability and monitoring technologies
  • Manage the lifecycle of hypervisors in production and take part in fleet-wide migration plans
  • Empower your teammates to swiftly integrate and deploy software components of our virtualisation system
  • Help implement best practices in stability, resiliency, scalability, security and performance across our virtualisation system

What we expect from you

  • Experience in system programming with Python, Bash, …
  • Demonstrated ability to troubleshoot production system failures
  • A great attitude and desire to work with a team
  • Passion for incremental improvements to tooling and a love of all things automation
  • Experience with Linux systems: Ubuntu server
  • Experience with virtualisation: QEMU/KVM
  • Good understanding of computer networks: TCP/IP, DNS, load-balancing, IPv6, BGP and network virtualisation

Nice to have

  • Experience with infrastructure as code and continuous deployment
  • Experience dealing with physical hardware automation
  • Experience with monitoring & logging systems
  • Experience administering relational databases
  • Knowledge of one cloud platform and related use-cases
  • Experience as an OSS contributor or maintainer

Technical stack & tools we use

  • Python
  • RabbitMQ + Celery
  • PostgreSQL + SQLAlchemy
  • HAProxy, Nginx, REST APIs / Flask
  • S3 API
  • Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana
  • Ansible, AWX, Foreman
  • GitLab, Nexus
  • Ubuntu, Debian, CentOS
  • Jira, Confluence, Slack, G Suite

Do you recognise yourself in these lines, and do you want to join a young, innovative, growing company that is a good place to work?

Then don’t wait any longer and join us :)

This position can be based in Paris or Lille, or be fully remote.
