Backend Software Engineer


Application deadline: We accept submissions until 15 January 2026. We review applications on a rolling basis and encourage early submissions.

About The Opportunity

We're looking for Backend Software Engineers who are excited to build tools for frontier AGI safety research, for example building and maintaining evals libraries and tools for monitoring and controlling our own LLM traffic.

Representative Projects

Here is a list of example projects you might build and ship in your first 6 months.
- Internal tooling for efficiently running and analyzing evaluations.
- Automated evaluation pipelines to minimize the time from getting access to a new model for pre-deployment testing to analyzing the most important results.
- Orchestration tools that allow researchers to run thousands of agentic evaluations in parallel on remote machines with high security and reliability.
- LLM proxy service that enables real-time monitoring of coding agent traffic and automatic detection of undesired behavior (see the sketch after this list).
- LLM agents and MCP tools to automate internal software engineering and research tasks, with sandboxes to prevent major failures.
- CI pipeline optimisations to reduce execution time and eliminate flaky tests.
- Telemetry API and instrumentation of existing tools to monitor usage and improve reliability.
- Data warehousing pipeline and service to store thousands of eval transcripts for research use.
- Upstream improvements to the Inspect framework and ecosystem.
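For illustration only (this is not part of the role description): a minimal sketch of the kind of pass-through LLM proxy mentioned above, assuming a FastAPI service sitting in front of an OpenAI-compatible chat-completions endpoint. The upstream URL, route, and logging hook are hypothetical choices, not Apollo's actual stack.

```python
# Minimal sketch, assuming FastAPI + httpx and an OpenAI-compatible upstream.
# UPSTREAM_URL and the route are placeholders for illustration.
import json
import logging

import httpx
from fastapi import FastAPI, Request, Response

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm-proxy")

UPSTREAM_URL = "https://api.openai.com"  # hypothetical upstream endpoint

app = FastAPI()


@app.post("/v1/chat/completions")
async def proxy_chat(request: Request) -> Response:
    body = await request.body()
    payload = json.loads(body)

    # Real-time monitoring hook: log (or flag) the outgoing request before forwarding.
    logger.info(
        "model=%s messages=%d",
        payload.get("model"),
        len(payload.get("messages", [])),
    )

    # Forward the request unchanged and return the upstream response.
    async with httpx.AsyncClient(timeout=60) as client:
        upstream = await client.post(
            f"{UPSTREAM_URL}/v1/chat/completions",
            content=body,
            headers={
                "Authorization": request.headers.get("authorization", ""),
                "Content-Type": "application/json",
            },
        )
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type="application/json",
    )
```

A production version would also need streaming support, authentication, and the actual detection rules for undesired behavior; the sketch only shows where such a monitoring hook would sit.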
Key Responsibilities
- Rapidly prototype and iterate on internal tools and libraries for building and running frontier language model evaluations.
- Lead the development of major features from ideation to implementation.
- Collaboratively define and shape the software roadmap and priorities.
- Establish and advocate for good software design practices, codebase health, and coding agent practices.
- Work closely with researchers to understand their challenges.
- Assist researchers with implementation and debugging of research code.
- Communicate clearly about technical decisions and trade-offs.
Required Qualifications
- Experience writing production-quality Python code.
- 5+ years of professional software engineering experience.

Examples of strong candidate backgrounds include: leading successful software tools or products over an extended period; building the tech stack for a startup; holding progressively senior roles in a large organisation; authoring popular open-source tools or libraries; placing highly in prestigious programming competitions.
Bonus Skills
- Experience working with LLM agents or LLM evaluations.
- Information security / cybersecurity experience.
- Experience with AWS.
- Interest in AI safety.
We strongly encourage applications from candidates who may not meet every listed requirement but believe they are a good fit.

Logistics
- Start Date: Target 2–3 months after the first interview.
- Time Allocation: Full-time.
- Location: London office, adjacent to the London Initiative for Safe AI (LISA). In-person role, with rare partial remote options.
- Work Visas: Sponsored UK visas available.
Benefits
- Salary: £100k–£200k (~$135k–$270k USD).
- Flexible work hours and schedule.
- Unlimited vacation.
- Unlimited sick leave.
- Lunch, dinner, and snacks on workdays.
- Paid work trips, including staff retreats and relevant conferences.
- A yearly $1,000 professional development budget.
About Apollo Research

Apollo Research focuses on the risks of autonomous AI systems, particularly Loss of Control and deceptive alignment. We conduct research on detecting, understanding, and mitigating scheming behaviors, collaborating with leading frontier AI companies.

About The Team

The SWE team currently includes Rusheb Shah, Andrei Matveiakin, Alex Kedrik, and Glen Rodgers. You will closely interact with research scientists and engineers.

Equality Statement

Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.

Interview Process

The process includes a screening interview, a take-home test (~2 hours), three technical interviews, and a final interview with our CEO, Marius. Technical interviews are job-related and do not include generic coding challenges. To prepare, we suggest building small LLM evaluation projects using Inspect; a minimal sketch follows.
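As a rough starting point for that preparation, a "hello world"-style eval in the open-source Inspect framework (inspect_ai) might look like the sketch below. The task and model name are placeholders for illustration, not one of Apollo's actual evals.

```python
# Minimal sketch of an Inspect eval: one sample, one model turn, a string-match scorer.
from inspect_ai import Task, eval, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate


@task
def hello_world() -> Task:
    return Task(
        dataset=[Sample(input="Reply with exactly: Hello World", target="Hello World")],
        solver=generate(),   # a single generation step
        scorer=includes(),   # pass if the target string appears in the model output
    )


if __name__ == "__main__":
    # The model identifier is a placeholder; any provider Inspect supports would work.
    eval(hello_world(), model="openai/gpt-4o-mini")
```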