Head of SRE and Production Engineering (London)

SS&C is a global provider of investment and financial software-enabled services and software for the global financial services and healthcare industries. The GIDS product suite powers mission-critical investor and distributor services across asset managers, insurance companies, retirement providers, and wealth management platforms.
As the Head of Production Engineering and Site Reliability Engineering (SRE) for the GIDS organisation, you will lead a team responsible for the scalability, resilience, performance, and reliability of the cloud and hybrid infrastructure powering some of the most critical client-facing applications in financial services.
Define and execute the vision and roadmap for Production Engineering and SRE within GIDS.
Build and lead globally distributed, high-performance teams with a focus on talent development, SRE culture, and operational excellence.
Collaborate cross-functionally with Engineering, Product, Compliance, and Infrastructure teams to improve system reliability and efficiency.
Production Operations & Incident Management
Own reliability, uptime, and performance KPIs for GIDS applications and services.
CI/CD and Platform Engineering
Integrate static and dynamic code analysis, vulnerability scanning, artifact promotion, and release gating into the SDLC.
Lead the implementation and usage of modern observability stacks.
Establish SLOs, SLIs, and error budgets with product and engineering teams.
Drive root cause identification using distributed tracing, advanced log analysis, and anomaly detection.
Partner with security and compliance teams to embed controls into infrastructure and software delivery.
Automate audit evidence collection, change tracking, and access management (e.g., HashiCorp Vault, OPA, AWS IAM).
Ensure all systems meet internal and regulatory audit requirements (SOC2, GDPR, etc.).
Champion infrastructure-as-code (IaC) using Terraform, Helm, and Kubernetes for scalable cloud and hybrid deployments.
Optimise infrastructure cost, elasticity, and resilience through autoscaling, canary deployments, and chaos testing.
Maintain high SLAs for critical services running on Kubernetes, AWS, and on-prem hybrid infrastructure.
Talent Management & Culture
Attract, retain, and mentor top engineering talent with a strong focus on diversity and continuous learning.
Drive career development through structured learning paths, performance reviews, and skills-based mentoring.
Talent Management & Global Operations
Establish and enforce engineering and operational standards for deployments, monitoring, and incident response across geographies.
Drive hiring, onboarding, and training initiatives that support both site reliability and continuous delivery.
Foster a strong engineering culture rooted in transparency, autonomy, learning, and operational excellence.
Develop strategies to prevent burnout in around-the-clock operations, including tooling, automation, and shift rotation planning.
10+ years of experience in engineering, with 5+ years in a leadership role in SRE, DevOps, or Production Engineering.
Proven track record managing reliable, scalable systems in a high-compliance environment.
Strong understanding of the modern software development lifecycle, CI/CD, IaC, and cloud-native technologies.
Expertise in Kubernetes, AWS (or Azure/GCP), GitOps workflows, observability tools, and automation frameworks.
We encourage applications from people of all backgrounds and particularly welcome applications from under-represented groups, to enable us to bring a diversity of perspectives to our thinking and conversation. SS&C Technologies is an Equal Employment Opportunity employer and does not discriminate against any applicant for employment or employee on the basis of race, color, religious creed, gender, age, marital status, sexual orientation, national origin, disability, veteran status or any other classification protected by applicable discrimination laws.
Employment type Full-time

Job function Engineering and Management
Industries Financial Services, Software Development, and Investment Management
Location: London
Job Type: Full-time