Join our Core Reliability & Observability team as a Staff Site Reliability Engineer. In this role, you will lead our observability strategy and ensure our platform remains reliable, debuggable, and scalable. You will also identify and lead large-scale reliability initiatives and serve as a mentor and technical coach to senior engineers. You should have extensive experience in SRE, platform engineering, or infrastructure roles within cloud-native environments, and deep expertise in observability tooling and architecture.
Lead the observability strategy across the platform, focusing on building scalable, developer-friendly logging and tracing capabilities.
Identify and lead large-scale cross-cutting reliability initiatives, including improvements to incident detection, response, and postmortem analysis capabilities.
Serve as a mentor and technical coach to senior engineers, helping elevate the craft of reliability engineering across the company.
As a Staff Site Reliability Engineer within the Core Reliability & Observability team, you will play a pivotal role in shaping the company’s observability strategy and ensuring our platform remains reliable, debuggable, and scalable. This role sits at the intersection of infrastructure, developer experience, and product engineering, with a particular focus on building and evolving the foundations of logging, metrics, tracing, and alerting across the organization.
You’ll act as a technical leader and strategic partner to SREs, software engineers, and product teams, guiding decisions, mentoring engineers, and driving cross-cutting initiatives that elevate our operational maturity.
If you don’t meet all the requirements below but believe this opportunity matches your expectations and experience, we still encourage you to apply!