Job Title: Site Reliability Engineer (SRE)
Location: Singapore
Experience: 8+ years (including 3+ years in Java)
About the Role
We’re looking for a skilled Site Reliability Engineer with strong Java and cloud-native development experience to design, build, and maintain reliable, scalable systems on Kubernetes and AWS. You’ll work closely with development and platform teams to drive automation, observability, and operational excellence.
Key Responsibilities
Develop and deploy Java/Spring Boot microservices on Kubernetes (Amazon EKS, Azure AKS, Red Hat OpenShift Container Platform).
Build observability and monitoring solutions using the ELK Stack, Prometheus, Grafana, CloudWatch, Jaeger, and OpenTelemetry.
Improve reliability, scalability, and performance across distributed systems.
Support production systems and participate in on-call rotations.
Automate infrastructure provisioning and CI/CD pipelines using tools such as Terraform and Ansible.
Required Skills
8+ years of software development experience, including 3+ years with Java (Spring Boot).
Experience with Kubernetes, Linux, networking, and distributed systems.
Hands-on experience with observability and monitoring tools.
Desirable Skills
Cloud-native development on AWS.
Experience with ArgoCD, programming or scripting in Go, Python, or Groovy, and Java performance tuning.
Exposure to Terraform, Crossplane, API gateways and service meshes (Apigee, Kong, Istio), and configuration management tools (Ansible, Puppet).
Familiarity with APM tools (Dynatrace, AppDynamics) and databases (PostgreSQL, MongoDB).
Relevant certifications (AWS, Kubernetes, Java, Linux) preferred.