Infrastructure as Code (IaC) Engineer – AI Data Center Automation
Introduction

We are seeking a highly capable Infrastructure as Code (IaC) Engineer to lead the design, implementation, and management of automated infrastructure provisioning for high-performance AI data centers. This role is central to orchestrating compute, network, storage, and virtualization layers using modern IaC tools across on-premises and hybrid cloud environments.
The ideal candidate will play a strategic role in enabling scalable and repeatable deployment pipelines that support GPU clusters, AI model training environments, and containerized platforms such as Kubernetes and OpenShift. 

Job Description

Must have

  • 5+ years of experience in infrastructure automation or SRE roles with hands-on IaC deployment.
  • Proficiency in Terraform and Ansible, scripting languages such as Python and Bash, and YAML.
  • Experience automating infrastructure in GPU-intensive environments supporting AI/ML workloads.
  • Strong understanding of networking (VXLAN, EVPN, BGP, RoCE) and virtualization platforms (OpenShift, VMware, KVM).
  • Familiarity with Kubernetes, Helm, Operators, and container orchestration frameworks.
  • Exposure to storage automation for AI data lakes (e.g., Ceph, BeeGFS, Lustre, or S3-compatible storage).
  • Experience using CI/CD tools (GitLab CI/CD, Jenkins, ArgoCD, Flux) in IaC pipelines.
Responsibilities include:
  • Design and implement IaC frameworks to automate the provisioning and configuration of data center infrastructure for AI workloads.
  • Orchestrate and manage multi-layer automation across compute (GPU/CPU), networking (VXLAN, EVPN, BGP), storage (NVMe, object, parallel file systems), and virtualization platforms (KVM, VMware, OpenShift).
  • Develop reusable Terraform modules, Ansible playbooks, and YAML templates to define infrastructure in version-controlled environments.
  • Automate deployment of Kubernetes clusters and integrate with GPU operators for training and inference pipelines.
  • Build and maintain CI/CD pipelines to deploy, test, and manage infrastructure changes using tools like GitLab CI/CD, Jenkins, or ArgoCD.
  • Integrate with monitoring and observability stacks (Prometheus, Grafana, DCGM) for automated infrastructure validation and health monitoring.
  • Work closely with AI/ML platform teams to align infrastructure deployment with model training, data pipelines, and security policies.
  • Ensure compliance with security and operational standards through policy-as-code and drift detection mechanisms.
Certifications:
  • Certified Kubernetes Administrator (CKA): mandatory
  • Certified Kubernetes Application Developer (CKAD) 
  • NVIDIA Certified Kubernetes Specialist

Educational Qualifications

  • Bachelor's degree in Computer Science or Computer Applications, BTech in Computer Science, or MCA
Primary Skills:
  • Terraform Scripting & Module Development
  • Ansible Automation for OS, Network, and Storage Provisioning
  • Infrastructure Automation for GPU-Centric AI Workloads
  • IaC for Virtualization Platforms – OpenShift, KVM, VMware
  • Kubernetes Deployment Automation & GPU Operator Integration
  • Scripting and Configuration Languages – Python, Bash, and YAML
  • CI/CD for Infrastructure – GitLab CI/CD, Jenkins, ArgoCD
  • Network Configuration Automation – VXLAN, EVPN, BGP, RoCE
  • Storage Provisioning for AI – Ceph, Lustre, BeeGFS, or S3-compatible
  • Monitoring Stack Integration – Prometheus, Grafana, DCGM
Secondary Skills:
  • Policy-as-Code & Security Automation (OPA/Gatekeeper, Sentinel)
  • Drift Detection & Compliance Enforcement in IaC Environments
  • Helm Charts & Kubernetes Operators Automation
  • Version Control Practices with Git for Infrastructure Code
  • Integration with AI/ML Pipelines and Data Ingestion Frameworks
  • Hybrid/Multicloud Orchestration Knowledge (AWS, Azure, GCP)
  • OpenShift GitOps or Flux for Continuous Delivery of Infrastructure
  • Template Reusability and Modular Code Design in IaC
  • GPU Resource Pooling and Node Provisioning with Ansible + Terraform
  • Observability Automation (DCGM Exporter, Node Exporter, Custom Metrics)
Job Details
Role: Infrastructure as Code (IaC) Engineer – AI Data Center Automation
Location: Dubai
Close Date: 18-07-2025
Interested candidates may forward their detailed resumes to Careers@reflectionsinfos.com, along with their notice period and current and expected CTC details. Please be aware that fraudsters are promising jobs with Reflections Info Systems in exchange for a fee. No payment is ever sought for jobs at Reflections. We contact candidates only through our official website or LinkedIn, and all employment-related emails are sent from the official HR email ID. Please contact careers@reflectionsinfos.com for any clarifications or alerts on this subject.