This job offer is no longer available.

Lead SRE Engineer

Permanent contract
Paris
Salary: Not specified
Fully remote
Experience: > 5 years

Stuart

The position

Job description

We are looking for a Lead Site Reliability Engineer who will be a technical leader for our SRE team. You will guide the team technically and help us make our platform more robust, handle failures gracefully, and detect issues early by means of automation, proper alerting, and chaos engineering.

🚀 The SRE mission is to make the platform as reliable as possible by reducing the number and severity of incidents affecting it. We need to make sure that all services are efficiently monitored, with the right thresholds set so that alarms are meaningful, and that most remediation work is automated rather than manual. We further improve the platform’s reliability by introducing controlled errors into it (chaos engineering principles) and testing different disaster recovery scenarios. SREs are the stewards of reliability, and they provide the technical and documentation tools for other Engineering teams to build reliable software.

🤝 The SRE team is a new team at Stuart, and you will have the opportunity to see how the team grows further and have a say in how it does so. You will be part of the Infrastructure department under the Reliability area, together with the Engineering Support team. Other areas of the department are Cloud Engineering, Security, and IT.

What will I be doing? 🤓

  • Be a technical leader for the team and the go-to person for software reliability matters.
  • Take part in additional departmental efforts such as hiring, running community talks, defining team processes and other such ways to contribute to culture and growth on the team.
  • Help the other engineering teams to build reliable, observable, and performant products.
  • Drive and help other teams to set SLOs and SLAs and track them via SLIs.
  • Lead the design of the Stuart observability stack, implement it, and guide other teams to adopt it.
  • Contribute to the reliability and performance of Stuart’s systems.
  • Write playbooks for alarms, and then automate them so manual intervention is not required.
  • Document knowledge and practices in a clear way, so other departments can benefit from it.
  • Collaborate with the Engineering Support team on incident management.
  • Conduct and lead post-mortem meetings; follow up on the action items.
  • Lead the way in adopting chaos engineering.

What do we need from you? 😎

  • 5+ years of experience in a similar position (even if with a different title) in an always-up, always-available mission-critical service.
  • Whether you come from a Systems or a Software Engineering background, we will like you exactly the same!
  • Love for automation: you don’t want to repeat the same job twice.
  • Proven track record of leading complex projects from start to finish.
  • You are the go-to person in your team if there are difficult technical problems to solve.
  • You have written programs to automate tasks, reducing toil.
  • You feel comfortable doing low-level Linux and networking debugging.
  • Experience working with complex Terraform codebases. Bonus points if you have written a provider.
  • Very good knowledge of cloud environments and Kubernetes (we use AWS & EKS).
  • Working experience with chaos engineering practices.
  • You like teaching and passing on best practices to others, and you write thorough documentation.
  • Proactive mindset: if you see something is not working, you start the process to fix it.
  • Fluency in both written and spoken English.

Don’t worry, we don’t expect you to tick every single item here! But it should give you a feeling of what kind of experience we are looking for.

The stuff you wanna know 😉

  • Family-friendly work-life balance - work from home and flexible hours 🏡
  • Option to work remotely anywhere in Italy 🇮🇹
  • Permanent full-time contract
  • Meal Vouchers 🥗
  • Unlimited access to Udemy for all your learning and development needs 📚
  • Stuart Academy with regular workshops, Stu-Classes, and Stu-Talks 🎓
  • Stuart is putting Mental Health Awareness first! 40 EUR Wellness Allowance to use in any gym or sport class 🧘
  • Private healthcare insurance 🧑‍⚕️
  • 2 volunteer days per year to have a positive impact on our communities and the environment
  • Stability - Stuart is part of a solid and successful state-owned Group (Geopost) in France 🚀
  • Work in an international, dynamic and passionate environment with a company culture focused on learning and development 🎉

At Stuart, we believe that employees today want to evolve in collaborative, high-growth environments where they can demonstrate their abilities and thrive both professionally and personally. We are convinced that employees need to find alignment between their inner values and their company’s culture and mission to unlock their full potential. We work to create a culture of empowerment, continuous learning and growth where everyone can bring expertise, own projects and easily measure their impact 🙌

Stuart is proud to be an equal opportunity workplace dedicated to promoting diversity. We don’t discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status or disability status 💙

Please note: Our Talent Acquisition Team is international, with members from across the world 🌍 We kindly ask you to submit your CV and application in English so that it can be reviewed correctly (unless the job posting is in a language other than English). Thank you 🤗

Want to learn more about us? Visit https://stuart.com/about-us/
