AI Infrastructure Engineer / MLOps Engineer


Join Lenovo's AI Technology Center (LATC), a global AI Center of Excellence, to help shape AI at a truly global scale. We're building the next wave of AI core technologies and platforms, and we need a highly skilled AI Infrastructure Engineer / AI Operations Engineer to design, build, and maintain the infrastructure and tools necessary for efficient AI model development, deployment, and operation.

Responsibilities:
- AI Infrastructure Design and Implementation: Design, build, and maintain scalable and efficient AI infrastructure, including compute resources, storage solutions, and networking configurations.
- AI Model Deployment and Management: Develop and implement processes for deploying, monitoring, and managing AI models in production environments.
- Automation and Tooling: Create and maintain automation scripts and tools for AI model training, testing, evaluation, and deployment in a continuous integration / continuous delivery (CI/CD) pipeline.
- Collaboration and Support: Work closely with data scientists, engineers, and other stakeholders to ensure smooth operation of AI systems and provide support as needed.
- Performance Optimization: Continuously monitor and optimize AI infrastructure and models for performance, scalability, utilization, and reliability.
- Security and Compliance: Ensure AI infrastructure and models comply with relevant security and regulatory requirements.

Qualifications:
- Bachelor's or Master's degree in Computer Engineering, Electrical Engineering, Computer Science, or a related field.
- 8+ years of experience in software engineering, DevOps, or a related field.
- Strong background in computer systems, distributed systems, and cloud computing.
- Proficiency in Linux system administration, including package management, user/group management, file system navigation, shell scripting (bash), and system configuration (systemd, networking).
- Proficiency in programming languages such as Python, Java, or C++.
- Experience with AI-specific infrastructure and tools (e.g., NVIDIA GPUs and CUDA).
- Experience setting up multi-node distributed GPU clusters using Slurm, Kubernetes, or related software stacks.
- Experience managing high-performance computing (HPC) clusters, including job scheduling, resource allocation, and cluster maintenance.
- Familiarity with configuring job scheduling tools (e.g., Slurm).
- Experience with AI infrastructure, model deployment, and management.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Ability to work in a fast-paced, dynamic environment.

Bonus Points:
- Familiarity with AI and machine learning frameworks (PyTorch).
- Familiarity with cloud platforms (AWS, GCP, Azure).
- Experience with containerization (Docker) and orchestration (Kubernetes).
- Experience with monitoring and logging tools (Prometheus, Grafana).

What we offer:
- Opportunities for career advancement and personal development.
- Access to a diverse range of training programs.
- Performance-based rewards that celebrate your achievements.
- Flexibility with a hybrid work model (3:2) that blends home and office life.
- Electric car salary sacrifice scheme.
- Life insurance.

Location: Edinburgh, Scotland. Candidates must be based there, as the role requires working from the office at least three days per week (3:2 hybrid policy).
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industry: IT Services and IT Consulting
