Site Reliability Engineer - SRE Lille

Job summary
Permanent contract
Lille
Salary: Not specified
A few days of remote work
Skills and knowledge
CentOS
Kibana
GitLab
Debian
Nginx

Scaleway

The position

Job description

About the job

Scaleway is looking for a Site Reliability Engineer to join our teams.

Reporting to a Lead SRE, you will be responsible for ensuring that we can reliably serve our products to users around the world. We expect you to have a strong background in development and system administration. Our systems evolve constantly, and the tools used to observe them and keep them resilient must evolve accordingly.

Minimum qualifications

  • Previous experience as a developer in Go, Python or Rust
  • Experience in system programming with common scripting languages (Bash, Python)
  • Demonstrated ability to troubleshoot production system failures
  • A great attitude and desire to work with a team
  • Passion for incremental improvements to tooling and a love of all things automation
  • Experience with Linux systems (Ubuntu/Debian)
  • Experience with cloud environment architecture (bare metal, virtual machines, containers, orchestrators)
  • Good understanding of computer networks: TCP/IP, DNS, load-balancing, IPv6, BGP and network virtualisation
  • Understanding of written and spoken English, with the ability to write technical documentation in English and to speak it when needed

Preferred qualifications

  • Experience with infrastructure as code and continuous deployment
  • Experience dealing with physical hardware automation
  • Experience with monitoring & logging systems
  • Experience administering relational databases
  • Knowledge of one cloud platform and related use-cases
  • Initiative to propose new solutions and defend them
  • Team player, willing to share knowledge, opinions, and participate in regular team rituals
  • Good communication and coaching skills

Responsibilities

  • Create new tools and documentation, or optimize existing ones, to help identify, diagnose and remediate production incidents, automating as much as possible
  • Troubleshoot high-impact issues working with multiple engineering teams
  • Take on-call responsibilities, mitigate issues encountered in production and provide the best real-time response to our customers
  • Ensure a high quality of service for our customers by leveraging observability and monitoring technologies
  • Manage the lifecycle of products in production
  • Help implement best practices in stability, resiliency, scalability, security and performance across our systems

Technical Stack

  • Python, Go, Rust
  • RabbitMQ
  • PostgreSQL
  • HAProxy, Nginx, REST APIs / Flask
  • S3 API
  • Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana
  • Ansible, AWX, Foreman, Salt
  • GitLab, Nexus
  • Ubuntu, Debian, CentOS
  • Jira, Confluence, Slack, GSuite

Location

This position is based in our offices in Paris or Lille (France).