Site Reliability Engineer
An exciting opportunity for a Site Reliability Engineer to join an award-winning, Cambridge-based AI software company at the forefront of machine learning innovation. As a Site Reliability Engineer, you will play a key role in maintaining and enhancing cloud infrastructure, monitoring systems, and deployment processes, ensuring the reliability, scalability, and security of a sophisticated machine learning platform deployed across cloud environments.
Location:
Cambridge, hybrid working model
3 days in office
not easily reachable via public transport from London
Salary:
Negotiable
Requirements for Site Reliability Engineer:
Minimum 2:1 degree in Computer Science or a related field
2+ years' experience in a DevOps, SRE, Platform Engineering, or similar role
Experience configuring and using monitoring tools such as Grafana and Prometheus
Hands-on experience with cloud infrastructure, ideally GCP (Azure or AWS also considered)
Experience with Infrastructure-as-Code tools such as Terraform
Experience working with Docker, Kubernetes, and Helm
Strong understanding of cloud security and reliability best practices
Scripting experience using Python and/or Bash
Experience using Git within a professional software development environment
Strong problem-solving and analytical skills with a proactive mindset
Desirable:
Experience responding to and investigating security or reliability incidents in distributed cloud environments
Ability to communicate technical challenges to non-technical stakeholders
Familiarity with technologies such as NGINX, Flask (Python), React (TypeScript), PostgreSQL, OpenSearch, Valkey, or Keycloak
Experience administering Linux-based systems
Experience with CI tools such as CircleCI
Exposure to information security compliance standards (e.g. ISO 27001)
Experience working within Agile development environments
Responsibilities for Site Reliability Engineer:
Develop and enhance monitoring systems to proactively identify performance, reliability, security, and cost issues
Monitor platform performance and communicate insights to engineering teams
Support incident response and assist with remediation of platform vulnerabilities
Identify, plan, and implement improvements to cloud infrastructure and deployment processes
Work closely with engineering teams to support product development and platform scalability
Ensure infrastructure and deployments are secure, robust, and aligned with best practices
Advocate for effective monitoring and reliability considerations throughout the development lifecycle
Support ongoing compliance with information security standards including ISO 27001
What this offers:
Working for an award-winning AI software company at the forefront of machine learning innovation
A hands-on SRE role with exposure to modern cloud-native technologies and infrastructure
The opportunity to work on complex, real-world problems within industrial R&D environments
A collaborative, high-calibre engineering team within a growing Cambridge-based business
A competitive salary and benefits package
Applications:
If you are a Site Reliability Engineer looking to develop your career within a cutting-edge AI company, we would love to hear from you. Please send an up-to-date CV via the relevant link. We're committed to creating an inclusive and accessible recruitment process. If you require reasonable adjustments for your application or during the review process, please highlight this by emailing (if this email address has been removed by the job board, full details for contact are available on our website).
Keywords:
Site Reliability Engineer / SRE / DevOps Engineer / Platform Engineer / Cloud Engineer / Kubernetes / Docker / Terraform / GCP / AWS / Azure / Grafana / Prometheus / CI/CD / Python / Bash / Infrastructure Engineer
RedTech Recruitment Ltd focuses on finding roles for Engineers and Scientists. Even if the above role isn't of interest, please visit our website to see our other opportunities. We are an equal opportunity employer and value diversity at RedTech. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
- Location: United Kingdom
- Job Type: Full-time
- Category: IT