Site Reliability Engineer in London

11 Days Old

Energy Jobline is the largest and fastest growing global Energy Job Board and Energy Hub. We have an audience reach of over 7 million energy professionals, 400,000+ monthly advertised global energy and engineering jobs, and work with the leading energy companies worldwide.
We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets as well as emerging technologies in EV, Battery, and Fusion. We are committed to ensuring that we offer the most exciting career opportunities from around the world for our jobseekers.
Job Description

Site Reliability Engineer
Contract - 12 months
Inside IR35
Hybrid working
£400-550 per day depending on experience

My client is looking for a skilled Senior Site Reliability Engineer to play a key role in improving the reliability, scalability, and operational performance of their production systems. This role works closely with product and engineering teams to enhance system reliability, architecture, deployment safety, and observability.

Role Summary
My client is seeking a Senior Site Reliability Engineer to join a centralized Technical Operations function, where you will lead reliability initiatives and support operations across a range of large-scale, customer-facing digital services.
Operating within a centralized SRE model, you will partner with product and engineering teams while maintaining shared responsibility for production reliability, resilience, and scalability. The role includes participation in an on-call rotation supporting critical services, with shared ownership of overall system health.
You will be responsible for defining reliability standards, influencing architectural improvements, managing complex incidents, and building automation to improve deployment safety and operational efficiency. Your work will directly support high-traffic systems used by a global audience.

Key Responsibilities

Reliability & Risk Engineering
My client is looking for someone who can:
Identify systemic reliability risks and drive long-term preventative improvements
Define and refine SLIs, SLOs, and error budgets aligned with business and customer outcomes
Lead complex incident management, post-incident reviews, and remediation planning
Demonstrate depth in networking fundamentals; troubleshooting network infrastructure is key
Bring experience working as a senior SRE, particularly with AWS

Architecture & Resilience
You will:
Review and influence system architecture to improve scalability, availability, and fault isolation
Design strategies for high availability, graceful degradation, and disaster recovery
Evaluate trade-offs between performance, cost, and operational risk

CI/CD & Deployment Safety
The successful candidate will:
Improve deployment pipelines and implement automation to reduce risk and accelerate delivery
Implement safe deployment strategies such as canary releases and blue/green deployments
Ensure strong rollback and recovery mechanisms

Observability & Performance
You will be expected to:
Build and enhance observability solutions including metrics, logging, and tracing
Work with teams to reduce alert fatigue and improve signal quality
Diagnose performance bottlenecks across infrastructure and applications

Infrastructure & Automation
My client is seeking someone who can:
Design and operate cloud-based, containerised workloads at scale
Use Infrastructure as Code to build and manage resilient platforms
Develop automation to reduce manual effort and operational risk

Cross-Functional Leadership
You will:
Mentor engineers and promote SRE best practices across teams
Collaborate with engineering, product, and security stakeholders to improve system reliability

Required Qualifications
My client is looking for candidates with:
A degree in Computer Science, Engineering, or equivalent practical experience
Strong experience designing and operating CI/CD systems with deployment safety practices
Excellent communication skills with the ability to influence cross-functional teams
7+ years of experience in SRE, production engineering, or systems engineering roles
Strong knowledge of distributed systems concepts, including consistency and failure handling
Hands-on experience with major cloud platforms (e.g., AWS, GCP, Azure), including multi-region environments
Strong experience with Kubernetes and container orchestration at scale
Proficiency in at least one programming language such as Go, Python, or Java
Proven experience managing high-severity incidents and leading remediation efforts

Qualifications
Ideally, candidates will also have:
Experience with multi-region or multi-cloud architectures
Familiarity with observability tools such as Prometheus, Grafana, or Datadog
Previous mentoring or technical leadership experience
Experience with Infrastructure as Code tools such as Terraform or CloudFormation
Exposure to AI-assisted tooling for incident analysis or operational efficiency

Sphere Digital Recruitment is acting as an Employment Business in relation to this vacancy.
If you are interested in applying for this job please press the Apply Button and follow the application process. Energy Jobline wishes you the very best of luck in your next career move.
Location: London
Job Type: Full-time
