THE ROLE
Pure Storage Cloud is entering its next phase: building the foundation for global scale as we expand our cloud platform capabilities. We’re seeking a seasoned leader with deep domain expertise to advance automation, service reliability, and operational excellence in support of our vision of a multi-cloud experience. This is a highly visible role with broad influence and exposure to senior stakeholders across engineering and product organizations.
You’ll lead the evolution of our core systems and engineering practices, combining operational depth with strategic scope that spans SLIs/SLOs, incident and change management, internal developer experience, and resilience across the stack. If you’re ready to help shape how Pure engineers build and operate cloud services, with measurable impact and autonomy, join us in Prague to lead one of Pure’s most critical cloud initiatives.
This role is based in Prague, Czech Republic. Not local? We provide a competitive relocation package.
WHAT YOU’LL DO
- Lead and develop both the SRE and Platform teams, setting the strategy and execution for reliability, scalability, and operability across Pure Storage Cloud.
- Own reliability engineering—define and evolve SLIs/SLOs, error budgets, and operational excellence (on‑call, incident response, change management, runbooks).
- Build and run the internal platform: modern developer lifecycle tooling (CI/CD guardrails, observability/telemetry, automation, incident tooling) that accelerates feature teams.
- Operate and harden the service’s core cloud infrastructure (Kubernetes, IaC) across control and data planes; lead capacity planning, cost optimization, disaster recovery, and multi‑region readiness.
- Champion incident management and continuous improvement—calm responses, blameless postmortems with durable actions, and systematic toil/MTTR reduction.
WHAT YOU BRING
- Proven leadership running SRE/Production Engineering and Platform functions for SaaS or cloud services at scale, building high‑performance, inclusive teams.
- Hands-on software development experience and fluency in engineering fundamentals (design/reviews, automated testing, CI/CD, version control) with an ability to contribute to production‑grade code.
- Deep SRE foundations: SLI/SLO and error budgets, incident management, capacity planning, change/release management, and reliability reviews.
- Practical cloud expertise (Azure preferred) plus a modern SRE toolchain: containers/Kubernetes, IaC (Terraform/Bicep/CloudFormation), CI/CD, and observability (OpenTelemetry, Prometheus/Grafana, ELK, Azure Monitor).
- Strong systems thinking and architectural acumen (resilience reviews, failure‑mode analysis, chaos/DR testing) with crisp, data‑driven stakeholder communication.
We are primarily an in-office environment, so you will be expected to work from the Prague office in compliance with Pure’s policies unless you are on PTO, traveling for work, or on other approved leave.