Site Reliability Engineer (SRE)

About Us

We are a leading gaming and gambling software solutions provider with a strong presence in the USA, UK, and Europe. Through partnerships with global gaming companies, we build cutting-edge technical platforms across sportsbooks, lottery, casino, virtual gaming, and financial trading. Our vision is to shape the future of gaming by transforming operations into intelligent, data-driven solutions that deliver exceptional customer experiences and create sustainable value for all stakeholders. We believe in teamwork, knowledge sharing, and transparency with accountability.

The Role

We're looking for a Site Reliability Engineer (SRE) to help shape and drive how we build and operate reliable, observable, and cost-efficient systems. You'll work closely with development, platform, and incident management teams to define what "reliable" means in measurable terms, and build the tooling and processes to achieve it. Your work will directly influence the speed, stability, and scalability of our platform.

Key Responsibilities

Partner with development teams to define and manage SLOs/SLIs, and use error budgets to guide engineering decisions.
Enhance observability, ensuring metrics, logs, and tracing are in place to detect and fix issues proactively.
Lead cost optimisation initiatives: monitor spend, rightsize workloads, tune autoscaling, and drive efficient infrastructure usage.
Strengthen production readiness with pre-deployment checks, post-release validation, and robust platform guardrails.
Introduce and run chaos engineering experiments to improve system resilience.
Automate operational processes to reduce manual intervention across the stack.
Contribute to major incident response, providing engineering expertise.
Collaborate cross-functionally to raise the bar on platform stability, security, and performance.

Required Skills & Experience

3+ years in SRE, Platform, or DevOps roles.
Strong operational experience with Kubernetes (on-prem and AWS EKS).
Proven track record defining and working with SLOs/SLIs in production environments.
Deep understanding of observability (metrics, logging, tracing, telemetry
Location:
United Kingdom
Job Type:
Full-time
Category:
IT