Lead/Principal DevOps Engineer

Job Description

About the role

We are looking for a Lead/Principal DevOps Engineer to lead platform and infrastructure engineering. You will own and evolve the core infrastructure that underpins our cloud‑native, AI‑enabled SaaS platforms.

Mission

  • Build and operate secure, scalable, highly available cloud infrastructure
  • Enable product teams through automation, self‑service, and clear standards
  • Raise the bar on reliability, security, observability, and deployment quality
  • Act as a technical leader across platform and infrastructure initiatives

What success looks like

You will be accountable for outcomes such as:

Highly available, fault‑tolerant platforms

  • All containerised services are deployed with appropriate replication, resilience, and resource limits
  • Workloads are designed for multi‑zone availability and safe failure modes
  • Deployment‑related incidents are eliminated or rapidly mitigated

Empowered engineering teams

  • Engineers can diagnose and resolve the majority of platform‑related issues independently
  • Clear standards, tooling, and automation reduce cognitive load and friction

Strong security posture

  • Vulnerabilities are proactively identified, prioritised, and remediated
  • Platform security tooling is continuously maintained and improved

Comprehensive observability

  • All critical services are monitored with meaningful alerts and dashboards
  • Teams have access to self‑service monitoring and alerting capabilities

Key responsibilities

  • Design, build, and operate cloud infrastructure using Infrastructure as Code
  • Own and evolve Kubernetes platforms, including workload standards and deployment models
  • Develop and maintain CI/CD pipelines and GitOps workflows
  • Embed security best practices across infrastructure, pipelines, and runtime environments
  • Improve platform reliability, monitoring, and incident response workflows
  • Act as a technical leader and mentor for engineers using the platform
  • Partner with product and engineering teams to anticipate future platform needs
  • Own and shape a modern platform engineering capability

What we offer

  • A high‑trust, high‑autonomy engineering culture
  • The opportunity to influence platform strategy as the organisation scales

Qualifications

Essential skills and experience

  • Proven experience building and operating cloud‑native platforms at scale
  • Strong hands‑on experience with:
    • Kubernetes & containerised workloads
    • Infrastructure as Code (e.g. Terraform)
    • CI/CD pipelines and GitOps‑style delivery
  • Deep understanding of:
    • High availability, fault tolerance, and scaling strategies
    • Secure infrastructure design and operational security practices
  • Experience running production platforms on public cloud (GCP preferred; AWS acceptable)
  • Strong troubleshooting skills across distributed systems
  • Ability to explain complex technical concepts to non‑specialist audiences
  • Exposure to AI/ML or LLM‑based workloads in production environments

Technologies you’ll work with

  • Terraform
  • GitOps tooling (e.g. Argo CD)
  • Observability tooling (metrics, logging, alerting)
  • Modern AI‑enabled workloads and services

Nice to have

  • Experience with service mesh technologies (e.g. Istio)
  • Experience with Kubernetes Gateway API or modern ingress patterns
  • Familiarity with Redis, PostgreSQL, or managed cloud data services

Additional Information

We embrace flexibility and hybrid work opportunities to support diverse needs and lifestyles, while also valuing inclusive workplace experiences. By fostering a sense of community, we drive innovation, strengthen connections, and nurture belonging. Our commitment ensures you can work in a way that suits you best, while also engaging with colleagues to share ideas and build meaningful relationships.

Location:
Greater London, England, United Kingdom
Salary:
£150,000 - £200,000
Job Type:
Full‑time
Category:
IT & Technology
