Platform Engineer - Observability & Performance

Permanent contract (CDI)
Gland
Salary: Not specified
Frequent remote work

Swissquote

The position

Job description

You will join the IT Observability & Performance team at Swissquote, whose mission is to deliver situational awareness via telemetry, detection and forecasting.

This entails collaboration with cross-functional teams, as well as IT management, to collect actionable telemetry data, drive cost optimization through FinOps practices, and empower metric-driven decision-making. Your expertise will shape a proactive, agile, and high-performing IT environment, ensuring the reliability and efficiency of our financial systems.

As a Platform Engineer, you will design, implement and manage advanced telemetry solutions. Your expertise will help build towards our vision of a self-service platform.

You will also play a pivotal role in analyzing system performance, enabling root-cause analysis, and fostering continuous improvement in our IT infrastructure, all while aligning with Site Reliability Engineering (SRE) principles as outlined in the SRE Handbook.

  • Develop and deploy telemetry frameworks using tools like ELK Stack, Grafana, and Prometheus to monitor system performance, availability, and reliability.
  • Design and implement alerting mechanisms with tools like PagerDuty to enable rapid anomaly detection and response.
  • Analyze telemetry data to identify trends, performance bottlenecks, and potential issues, providing actionable insights.
  • Enable teams to perform root-cause analysis and proactively detect performance issues through dashboards that non-specialists can read, enhancing system resilience.
  • Support IT management in automating and tracking Service Level Objectives (SLOs), Key Performance Indicators (KPIs), and error budgets in alignment with SRE principles.
  • Drive FinOps initiatives by optimizing observability-related costs for our internal cloud and implementing self-service metrics, logs, and traces.
  • Generate comprehensive reports for IT management on system health, incident trends, compliance requirements and regulatory needs.
  • Contribute to continuous improvement by recommending and implementing telemetry-driven enhancements to IT infrastructure.


Desired profile

Minimum Qualifications

  • BS/MS in Computer Science, Engineering, or a related technical field involving programming (e.g., Physics, Mathematics), or equivalent experience.

  • Knowledge and hands-on experience with:

    • Infrastructure as Code and GitOps principles, with tools like GitHub Actions, Ansible or Terraform
    • Observability tools such as ELK, Grafana, Prometheus or OpenTelemetry
    • Alerting and on-call, with tools like Nagios, PagerDuty or incident.io
  • Strong knowledge of development, operations, networking, storage, or security.

  • Proficiency in at least one programming language such as Python, Go, Rust, Java, or Bash.

  • Systematic approach to problem-solving and a strong sense of ownership, accountability, and communication.

Preferred Qualifications

  • Experience deploying and managing observability solutions in Kubernetes, containerized environments, or standalone VMs.
  • Understanding of modern IT infrastructure (Kubernetes, containers, service mesh, standalone VMs).

  • Expertise in defining and implementing SLOs, KPIs, and error budgets following SRE principles.

  • Familiarity with FinOps practices and tools like OpenCost for cost optimization.

  • Proficiency with Infrastructure as Code (IaC) tools like Terraform or Ansible for maintaining observability infrastructure.

  • Ability to quickly learn and adopt emerging technologies, methodologies, and solutions.

  • Knowledge of distributed tracing tools (e.g., APM, OpenTelemetry, Jaeger, Zipkin) and their application in complex architectures.
