SRE OpenStack

Job summary
Permanent contract
Wrocław
Salary: Not specified
Skills & expertise
Continuous improvement
Incident response
Collaboration and teamwork
OpenStack

OVHcloud

The position

Job description

As a Public Cloud SRE specializing in OpenStack, you will play a central role in ensuring the reliability, performance, and scalability of our cloud infrastructure. You will be responsible for maintaining high service availability, implementing rigorous monitoring, and driving continuous improvements in our production environment.

Main Missions
• System Resilience: Architect and maintain key components of our OpenStack environment, including compute, networking, and storage, to guarantee high availability.
• Monitoring & Alerting: Implement and refine comprehensive monitoring, logging, and alerting systems to rapidly detect and address production issues.
• Incident Response: Take charge during outages or service degradations by leading incident management processes and coordinating with cross-functional teams.
• Performance Tuning: Analyze system metrics and trends to optimize performance and ensure the infrastructure scales.
• Continuous Improvement: Identify opportunities for process automation and system enhancements, integrating best practices and innovative solutions into daily operations.
• Documentation & Standards: Maintain detailed documentation of processes, incident responses, and system architecture to uphold transparency and continuous learning.

After 6 Months You Will
• Understand the Landscape: Develop a deep understanding of our OpenStack environment, internal processes, and operational workflows.
• Establish Metrics: Contribute to defining and refining key reliability metrics (SLIs, SLOs, and error budgets) tailored to our services.
• Engage in Incident Management: Begin taking ownership of incident responses and participate in root cause analyses with cross-functional teams.

After 1 Year You Will
• Drive Strategic Initiatives: Play an instrumental role in defining the long-term reliability roadmap, integrating new tools and practices to further stabilize our services.
• Lead Operational Excellence: Own major reliability projects from conception to implementation, ensuring that our systems meet or exceed performance and uptime targets.

Required Skills
• OpenStack Expertise: In-depth knowledge of OpenStack architecture and hands-on experience managing its core components (Neutron, Nova, Glance, Cinder, Keystone…).
• Complex Infrastructure Management: Hands-on experience in managing and optimizing complex IT infrastructures.
• Collaborative Mindset: Strong communication skills and the ability to work effectively within cross-functional and remote teams.
• SRE Methodologies: Proven expertise in applying SRE practices, including service level objectives (SLOs), error budgets, and incident management.
• Advanced Monitoring & Automation: Experience with modern monitoring, logging, and alerting systems as well as proficiency in automating repetitive tasks.
• Performance Tuning: Strong analytical skills to interpret system metrics and optimize infrastructure performance.
• Language Proficiency: Fluent in English.

Does this offer not quite match your expectations? Submit a spontaneous application on our candidate portal to join one of our teams!
It’s a great opportunity to share your profile with our recruiters, get noticed, and potentially be contacted for a different opportunity.
