THE ROLE
Join Pure Storage as a Site Reliability Engineer, where you will be instrumental in ensuring the performance and stability of our mission-critical engineering infrastructure and production services across a global environment. You will be a hybrid software and systems expert, owning the reliability, operations, and development of the core applications that power Pure’s product innovation. This role is a unique opportunity to drive efficiency and set the standard for operational excellence through automation, blameless postmortems, and ambitious SLOs defined alongside fellow engineers.
WHAT YOU'LL DO
- Establish and maintain high service reliability for core cloud platforms and infrastructure by implementing robust monitoring and proactive incident response, and by driving root cause analysis (RCA) and resolution of production issues in a 24x7 environment.
- Help transform operational practices by identifying, designing, and implementing automation and orchestration to replace manual cloud service operations and deployments, significantly improving efficiency and reducing human error.
- Partner cross-functionally with development teams to integrate SRE principles early in the development lifecycle, defining improvements to service architecture that bolster high availability, scalability, and adherence to established SLAs.
- Build and evolve the observability stack by setting up, configuring, and improving service health monitoring, collecting and reporting key metrics, and establishing effective alerting systems to maintain deep insight into system performance and health.
- Drive adoption of modern cloud operations technologies, exploring and integrating new tools for Infrastructure as Code (IaC), container orchestration, and high availability (HA) to continuously improve the reliability and scalability of our cloud offerings.
WHAT YOU BRING
- Demonstrated ability to write production-quality code using languages such as Python, Go, Java, C, or C++, including experience with software design, implementation, and maintenance.
- 3+ years of experience as an SRE or DevOps engineer supporting globally distributed SaaS services.
- A systematic, data-driven approach to problem solving, coupled with strong communication skills and a deep sense of ownership for critical production services.
- A solid understanding of enterprise systems performance analysis and debugging, with the ability to use metrics and data to drive system improvements.
- We are primarily an in-office environment, so you will be expected to work from the Prague office in accordance with Pure’s policies unless you are on PTO, traveling for work, or on other approved leave.