Kubernetes Orchestration Engineer – GPU Hypercomputing & AI Workloads
Experience : 10+ years
Introduction

We are seeking a highly skilled Kubernetes Orchestration Engineer to lead the deployment and management of GPU-optimized Kubernetes environments that power AI/ML and hypercomputing workloads. This role is critical to ensuring scalable, reliable, and high-performance infrastructure across on-premises and hybrid cloud environments. As a core member of our infrastructure engineering team, you will work at the intersection of container orchestration, GPU resource management, and AI application scaling, enabling large-scale distributed training and inference across GPU clusters.

Job Description

Must Have

  • Strong experience with Kubernetes (K8s) and container orchestration in production environments.
  • Expertise in managing GPU workloads in Kubernetes using NVIDIA GPU Operator, vGPU, and device plugin configurations.
  • Proficiency with container runtimes such as Docker and CRI-O, and orchestration tools like Helm and Kubernetes Operators.
  • Solid understanding of networking within Kubernetes and service mesh integration (e.g., Istio, Linkerd).
  • Familiarity with hybrid/multi-cloud Kubernetes platforms (e.g., GKE, EKS, AKS).
  • Strong scripting and automation skills (e.g., YAML, Helm templating, Bash, Python). 
Responsibilities include:
  • Design and deploy AI infrastructure on multi-GPU clusters using NVIDIA or AMD platforms.
  • Configure GPU environments using CUDA, DGX Systems, and NVIDIA Kubernetes Device Plugin.
  • Deploy and manage containerized environments with Docker, Kubernetes, and Slurm.
  • Support and optimize training, fine-tuning, and inference pipelines for LLMs and deep learning models.
  • Enable distributed training using DDP, FSDP, and ZeRO, with support for mixed precision.
  • Tune infrastructure to optimize model performance, throughput, and GPU utilization.
  • Design and operate high-bandwidth, low-latency networks using InfiniBand and RoCE v2.
  • Integrate GPUDirect Storage and optimize data flow across Lustre, BeeGFS, and Ceph/S3.
  • Support fast data ingestion, ETL pipelines, and large-scale data staging.
  • Leverage NVIDIA’s AI stack including cuDNN, NCCL, TensorRT, and Triton Inference Server.
  • Conduct performance benchmarking with MLPerf and custom test suites.
Certifications :
  • Certified Kubernetes Administrator (CKA) – Must
  • Certified Kubernetes Application Developer (CKAD) 
  • NVIDIA Certified Kubernetes Specialist

 

Educational Qualifications

  • Bachelor's in Computer Science/Applications/BTech Computer Science/MCA
Primary Skills :
  • Kubernetes Cluster Management for AI/ML Workloads
  • NVIDIA GPU Operator & Device Plugin Configuration in K8s
  • Container Orchestration using Docker, CRI-O, and Helm
  • Kubernetes Operators for Lifecycle Automation & Scaling
  • Pod Networking with CNI Plugins – Calico, Flannel, Cilium
  • Monitoring & Observability with Prometheus, Grafana, Kibana
  • GPU Workload Scheduling & Optimization in Kubernetes
  • Deployment of Distributed AI Frameworks (PyTorch, TensorFlow, Hugging Face)
  • Service Mesh Integration – Istio or Linkerd
  • Hybrid/Multi-Cloud Kubernetes Deployments (EKS, GKE, AKS)
Secondary Skills :
  • Helm Templating & YAML Scripting for Deployment Automation
  • Infrastructure Scripting using Bash, Python, or Ansible
  • Kubernetes Custom Resource Definitions (CRDs) & API Extensions
  • GPU Virtualization (vGPU) and Multi-Tenant GPU Allocation
  • Kubeflow or MLflow Integration for MLOps Pipelines
  • K8s Security (RBAC, Network Policies, Pod Security Standards)
  • CI/CD Integration with GitOps Tools (ArgoCD, Flux)
  • GPU Monitoring via NVIDIA DCGM or NVIDIA Cloud Native Stack
  • Advanced Troubleshooting in Kubernetes (control plane, etcd, kubelet)
  • Cloud-Native Storage for AI – CSI Drivers, NFS, Ceph
Job Details
Role:
Kubernetes Orchestration Engineer – GPU Hypercomputing & AI Workloads
Location :
Dubai
Close Date :
18-07-2025
Interested candidates may forward their detailed resumes to careers@reflectionsinfos.com along with their notice period, current and expected CTC details.
This is to notify jobseekers that some fraudsters are promising jobs with Reflections Info Systems for a fee. Please note that no payment is ever sought for jobs in Reflections. We contact our candidates only through our official website or LinkedIn and all employment related mails are sent through the official HR email id. Please contact careers@reflectionsinfos.com for any clarification/alerts on this subject.