Site Reliability Engineer in City of London

Energy Jobline is the largest and fastest-growing global Energy Job Board and Energy Hub. We have an audience reach of over 7 million energy professionals, 400,000+ monthly advertised global energy and engineering jobs, and work with the leading energy companies worldwide.

We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets as well as emerging technologies in EV, Battery, and Fusion. We are committed to ensuring that we offer the most exciting career opportunities from around the world for our jobseekers.

Job Description

Role: SRE

Skills: Deep Linux, Scripting - Python, DevOps, Kubernetes

Salary: £500k Plus

Location: London

The ideal candidate comes from a top-tier tech environment (FAANG, elite trading, hyperscale infra). They have experience building technology 0→1, owning systems end-to-end, and working close to the metal. They will operate across everything from bare-metal Linux to modern build and observability stacks.

Overview

Join a core engineering group as Lead Site Reliability Engineer, designing and scaling Linux platforms that underpin ML/AI-driven trading. You will architect and own reliability for massive simulation, HPC, and production workloads, ensuring ultra-reliable, ultra-fast trading systems. This is a hands-on leadership role focused equally on technical depth, strategic decision-making, and driving platform SRE excellence.

Key Responsibilities

Lead SRE practices for Linux platforms powering low-latency, high-throughput trading workloads.
Architect, optimize, and tune Linux for performance, resilience, and minimal latency.
Drive incident response, root cause analysis, and continuous reliability improvement across production systems.
Oversee system automation and reproducibility: build, deploy, and fleet-manage bare-metal Linux and containerized stacks.
Manage and enhance Kubernetes clusters, network configuration, and large-scale orchestration.
Set observability standards; expand monitoring, alerting, and performance metrics across platforms.
Analyze networking, kernel-level performance, and distributed systems, solving core challenges in a multi-petabyte, multi-cluster environment.
Build Python tools for automation, reliability engineering, and performance analysis.
Design highly distributed systems.

What You Will Work On

Ultra-reliable, high-performance trading infrastructure where every engineering optimization affects performance.
Next-generation simulation and HPC compute pipelines, supporting ML/AI workflows at scale.
Integration and continuous improvement of internal and open-source tools for automation and reliability.
Strategic platform direction: shaping foundational systems for critical infrastructure in an elite trading environment.

Team and Culture

Small, autonomous Linux SRE team with direct ownership and impact.
Collaborative engagement with quants, researchers, and trading experts to deliver robust platforms.
A culture built on deep technical ownership, learning, and high standards of performance engineering.

Apply now for an informal confidential chat!

If you are interested in applying for this job please press the Apply Button and follow the application process. Energy Jobline wishes you the very best of luck in your next career move.

Location: City of London
Job Type: Full-time
Category: Engineer, Reliability Engineer, Reliability, Engineering, Site
