Site Reliability Engineer (SRE)

Job summary
Contract: permanent (CDI)
Location: Paris
Salary: €45K to €55K
Start date: June 30, 2025
Fully remote
Experience: > 2 years
Education: Bac +3 (three years of higher education)
Skills & expertise
Automation tools
Performance monitoring
Problem-solving skills
Cloud infrastructure management
Communication skills

Popsink

Job description

About Popsink

Popsink is a cutting-edge data transfer solution revolutionizing how organizations handle and move their data. Our mission is to provide seamless, secure, and efficient data transfer capabilities for businesses of all sizes. As a fast-growing startup, we are seeking a passionate and experienced Site Reliability Engineer (SRE) to join our fully remote team and help us build a highly reliable, scalable, and efficient infrastructure.


Role Overview

As an SRE at Popsink, you will play a critical role in ensuring the reliability, scalability, and security of our infrastructure. You will collaborate with developers, product teams, and other engineers to design and implement robust systems and processes that power our stack, which includes Google Cloud Platform (GCP), Kubernetes, ArgoCD, and Terraform. Additionally, you will drive our monitoring and tracing strategies to ensure deep visibility into system health and performance.


Desired profile

Key Responsibilities

  • Infrastructure Management:

    • Design, build, and manage cloud infrastructure on Google Cloud Platform (GCP).

    • Automate infrastructure provisioning and deployments using Terraform.

  • Orchestration & Automation:

    • Manage and optimize Kubernetes clusters for containerized application deployment and scaling.

    • Implement GitOps workflows using ArgoCD to ensure seamless application updates.

  • Monitoring, Tracing, & Performance:

    • Develop and maintain comprehensive monitoring and tracing solutions to track system health and performance.

    • Configure and utilize tools like Prometheus, Grafana, Jaeger, or similar systems for observability.

    • Proactively identify bottlenecks and optimize system performance based on metrics and logs.

  • Reliability Engineering:

    • Define and maintain SLOs, SLAs, and SLIs to ensure system reliability.

    • Lead post-incident reviews and implement preventive measures to enhance system resilience.

  • Collaboration:

    • Partner with development teams to implement CI/CD pipelines and enforce best practices.

    • Foster a culture of operational excellence, automation, and continuous improvement across the team.


Required Qualifications

  • Technical Expertise:

    • Hands-on experience with Google Cloud Platform (GCP) and its services (e.g., Compute Engine, GKE, Cloud Storage).

    • Proficiency in managing Kubernetes clusters for orchestration and scaling.

    • Strong knowledge of Terraform for infrastructure as code.

    • Familiarity with GitOps tools like ArgoCD.

  • Monitoring & Observability:

    • Experience implementing and managing monitoring and tracing systems (e.g., Prometheus, Grafana, Jaeger, or OpenTelemetry).

    • Deep understanding of observability principles and best practices.

  • Problem Solving:

    • Proven ability to troubleshoot complex distributed systems in production environments.

    • Experience with incident management and root cause analysis processes.

  • Programming & Automation:

    • Proficiency in one or more programming/scripting languages (e.g., Python, Go, Bash).

  • Soft Skills:

    • Strong communication and collaboration skills, with a proactive mindset.

    • Comfort working in a fast-paced startup environment.


Preferred Qualifications

  • Certification in GCP or Kubernetes (e.g., Google Cloud Professional DevOps Engineer, CKA).

  • Experience with service meshes like Istio or Linkerd.

  • Familiarity with CI/CD tools like GitLab CI, Jenkins, or equivalent.

  • Knowledge of database systems and caching technologies (e.g., PostgreSQL, Redis).


Why Join Popsink?

  • Impact: Be part of a startup revolutionizing data transfer solutions.

  • Growth: Join a fast-paced environment with ample opportunities for career development.

  • Culture: Work with a collaborative, innovative, and supportive team.

  • Flexibility: Enjoy a fully remote work environment that supports work-life balance.


Interview process

  • 15-minute phone call

  • 1-hour technical interview
