Job Title:

Site Reliability Engineer (SRE)

Location:

Singapore

Experience:

8+ years (including 3+ years in Java)

About the Role

We’re looking for a skilled Site Reliability Engineer with strong Java and cloud-native development experience to design, build, and maintain reliable, scalable systems on Kubernetes and AWS. You’ll work closely with development and platform teams to drive automation, observability, and operational excellence.

Key Responsibilities

Develop and deploy Java/Spring Boot microservices on Kubernetes (EKS, AKS, OCP).

Build observability and monitoring using ELK, Prometheus, Grafana, CloudWatch, Jaeger, and OpenTelemetry.

Improve reliability, scalability, and performance across distributed systems.

Support production systems and participate in on-call rotations.

Automate infrastructure and CI/CD pipelines (Terraform, Ansible, etc.).

Required Skills

8+ years in software development, including 3+ years in Java (Spring Boot).

Experience with Kubernetes, Linux, networking, and distributed systems.

Hands-on experience with observability and monitoring tools.

Desirable Skills

Cloud-native development on AWS.

Experience with ArgoCD, Go/Python/Groovy, and Java performance tuning.

Exposure to Terraform, Crossplane, API gateways (Apigee, Kong, Istio), and configuration management tools (Ansible, Puppet).

Familiarity with APM tools (Dynatrace, AppDynamics) and databases (PostgreSQL, MongoDB).

Relevant certifications (AWS, Kubernetes, Java, Linux) preferred.
