Technical Architect

2 Days Old

As the Infrastructure Architect (Data Center & Network), you will design, blueprint, and guide the end-to-end implementation, from rack to control plane, of GPU clusters, storage tiers (e.g., PowerScale), high-performance fabrics, and the automation layer (PowerShell/Ansible/Terraform) that keeps the environment secure, resilient, and self-healing.

Key Responsibilities:
- Own the reference architecture for on-prem AI compute (GPU servers, accelerators, DPUs), storage (PowerScale, NVMe, object), network (NVIDIA, RoCEv2/InfiniBand, spine-leaf, 400/800G), and control planes (Kubernetes/OpenShift).
- Define hardware BOMs, rack elevations, power and cooling envelopes, structured cabling, and airflow/thermal design aligned to data center constraints (hot aisle/cold aisle, liquid-cooling readiness).
- Design multi-tenant isolation (air-gapped zones, RBAC, network segmentation, QoS) and GPU partitioning (MIG, MPS) for mixed training and inference workloads.
- Establish capacity models for GPU/CPU, memory, storage throughput (GB/s), and east-west/egress bandwidth, tied to model sizes, training steps, and data-ingest profiles.
- Create Day-0/1/2 patterns: golden images, firmware baselines, control-plane HA, and cluster expansion and evacuation procedures.
Location:
Greater London
Job Type:
Full-time