Our client operates large-scale GPU cloud platforms across Asia-Pacific. As part of their expansion, they are looking for experienced platform engineers to build and scale their next-generation data center operations. This role offers direct impact in a well-funded technology company working at the forefront of sustainable AI infrastructure.
Role
You'll drive the technical foundation for MLOps capabilities and platform infrastructure supporting cutting-edge NVIDIA GPU clusters. This position demands expertise in designing and operating Kubernetes environments for high-performance computing, implementing Infrastructure-as-Code frameworks, and building world-class observability platforms. You'll collaborate directly with founders and engineering leadership to establish DevOps standards, enhance CI/CD pipelines, and integrate enterprise-grade monitoring across distributed systems. The role requires ownership of incident response, active participation in the on-call rotation, and leadership of root cause analysis to elevate operational maturity. You'll work with technologies including Terraform, Ansible, Prometheus, Grafana, Loki, and OpenTelemetry while managing infrastructure supporting thousands of servers across multiple data centers.
Requirements
We seek candidates with 7+ years of platform engineering, SRE, or DevOps experience who have built observability and infrastructure platforms from first principles. Deep proficiency with containerization, Kubernetes cluster management, Infrastructure-as-Code tools, and the LGTM observability stack (Loki, Grafana, Tempo, Prometheus/Thanos) is essential. You must demonstrate hands-on expertise with Linux internals, networking stacks, distributed storage, and scripting languages such as Python, Go, or Bash. Experience with telemetry solutions (Redfish, gNMI, SNMP, eBPF) and compliance frameworks (SOC 2, ISO 27001) is highly valued. A bachelor's degree in Computer Science or a related field is required.
To Apply
To apply, please submit your resume to Yien Quek at yq@kerryconsulting.com. We regret that only shortlisted candidates will be notified. Licence No: 16S8060 | Registration No: R1109830