Platform Engineer - Observability & Performance

Permanent contract
Gland
Salary: Not specified
A few days at home

Swissquote

The position

Job description

You will join the IT Observability & Performance team at Swissquote, whose mission is to deliver situational awareness via telemetry, detection and forecasting.

This entails collaboration with cross-functional teams, as well as IT management, to collect actionable telemetry data, drive cost optimization through FinOps practices, and empower metric-driven decision-making. Your expertise will shape a proactive, agile, and high-performing IT environment, ensuring the reliability and efficiency of our financial systems.

As a Platform Engineer, you will design, implement and manage advanced telemetry solutions, helping us build towards our vision of a self-service platform.

You will also play a pivotal role in analyzing system performance, enabling root-cause analysis, and fostering continuous improvement in our IT infrastructure, all while aligning with Site Reliability Engineering (SRE) principles as outlined in the SRE Handbook.

  • Develop and deploy telemetry frameworks using tools like ELK Stack, Grafana, and Prometheus to monitor system performance, availability, and reliability.
  • Design and implement alerting mechanisms with tools like PagerDuty to enable rapid anomaly detection and response.
  • Analyze telemetry data to identify trends, performance bottlenecks, and potential issues, providing actionable insights.
  • Enable teams to perform root-cause analysis and proactively detect performance issues through dashboards that non-specialists can read, enhancing system resilience.
  • Support IT management in automating and tracking Service Level Objectives (SLOs), Key Performance Indicators (KPIs), and error budgets in alignment with SRE principles.
  • Drive FinOps initiatives by optimizing observability-related costs for our internal cloud and implementing self-service metrics, logs, and traces.
  • Generate comprehensive reports for IT management on system health, incident trends, compliance requirements and regulatory needs.
  • Contribute to continuous improvement by recommending and implementing telemetry-driven enhancements to IT infrastructure.


Preferred experience

Minimum Qualifications

  • BS/MS in Computer Science, Engineering, or a related technical field involving programming (e.g., Physics, Mathematics), or equivalent experience.

  • Knowledge and hands-on experience with:

    • Infrastructure as Code and GitOps principles, with tools like GitHub Actions, Ansible or Terraform.
    • Observability, with tools like ELK, Grafana, Prometheus or OpenTelemetry.
    • Alerting and on-call, with tools like Nagios, PagerDuty or incident.io.
  • Strong knowledge of development, operations, networking, storage, or security.

  • Proficiency in at least one programming language such as Python, Go, Rust, Java, or Bash.

  • Systematic approach to problem-solving and a strong sense of ownership, accountability, and communication.

Preferred Qualifications

  • Experience deploying and managing observability solutions in Kubernetes, containerized environments, or standalone VMs.
  • Understanding of modern IT infrastructure (Kubernetes, containers, service mesh, standalone VMs).

  • Expertise in defining and implementing SLOs, KPIs, and error budgets following SRE principles.

  • Familiarity with FinOps practices and tools like OpenCost for cost optimization.

  • Proficiency with Infrastructure as Code (IaC) tools like Terraform or Ansible for maintaining observability infrastructure.

  • Ability to quickly learn and adopt emerging technologies, methodologies, and solutions.

  • Knowledge of distributed tracing and APM tools (e.g., OpenTelemetry, Jaeger, Zipkin) and their application in complex architectures.
