About the Role
AI Infrastructure Engineer - L3
The AI Infrastructure Engineer (L3) provides advanced engineering and architectural expertise for high‑performance AI and ML infrastructure. This role focuses on building, optimizing, and scaling GPU/accelerator environments and distributed systems for large‑scale training and inference workloads.
Competency Focus: High‑performance computing (HPC), distributed systems, Kubernetes, GPU orchestration, cloud optimization
Keywords: Nvidia GPU Infrastructure, Kubernetes, GPU Cluster Administrator, Infrastructure SME, RCA
Responsibilities:
- Deploy, configure, and manage GPU and AI accelerator platforms (NVIDIA A100/H100/L40, AMD Instinct, TPU).
- Troubleshoot GPU hardware and software issues, including device failures, thermal throttling, PCIe/NVLink topology problems, and driver conflicts.
- Install, upgrade, and maintain GPU software...
Ready to Apply?
Submit your application today and take the next step in your career journey with HCLTech.