Site Reliability Engineer, GPUs in AI


Job Description

We are recruiting for a young AI firm that started in the US and is now growing in London. Its team of engineers and researchers comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and similar organisations.


They are looking for a Senior Systems Engineer to focus on cluster management, platform engineering for a large GPU fleet (currently in the range of 20,000 to 40,000 GPUs), monitoring and reliability, and infrastructure for next-generation GPU deployments.


Requirements:

6 years' experience in a high-performance field such as AI, big tech, or quantitative trading

Experience working on clusters of 1,000 or more GPUs

Experience driving key projects in your team or business

Location: City of London
Job Type: Full-time
Category: Technology
