Senior Site Reliability Engineer (SRE) | Full-Time | 100% Remote (USA)

Dive into a high-impact SRE role where you'll safeguard and supercharge a cutting-edge data platform. We're talking rock-solid reliability for apps that deliver real-time business wins. Join us to architect the invisible magic that keeps everything humming!

What You'll Rock

Own Reliability End-to-End: Jump into on-call rotations, nail incident response, and lead postmortems to make systems bulletproof.

Build Epic Infra: Design, deploy, and scale cloud setups with Terraform and IaC across AWS (must), GCP, and Azure.

Master Kubernetes: Run clusters on EKS with Bottlerocket OS and Cilium/eBPF for next-level networking and security.

Streamline Deploys: Roll out apps via Helm and FluxCD, while plotting upgrades to fully autonomous operators.

Amp Up Observability: Set up monitoring stacks with OpenTelemetry, Prometheus, and Grafana's LGTM stack (Loki for logs, Tempo for traces, Mimir for metrics) to spot issues before they bite.

Team Up for Wins: Partner with product and eng crews to bake reliability into every feature from day one.

Must-Have Superpowers

Battle-tested in high-stakes prod environments: On-call heroics, swift incident handling under tight SLAs, and crystal-clear comms for escalations.

Hands-on AWS wizardry with Terraform; bonus for GCP or Azure.

Deep Kubernetes know-how: Cluster ops, Helm charts, community operators (like CNPG), and GitOps tools like Flux.

Linux and networking ninja: TCP/IP mastery, plus security, compliance, and hot tech like eBPF/Cilium.

Comfort with OpenTelemetry and Prometheus for observability awesomeness.

If you're a reliability rockstar ready to tame chaos and build unbreakable systems, apply now—let's make downtime a myth!
