Site Reliability Engineer (SRE) | Spain - Remote | AWS, Kubernetes, Observability (Prometheus & Grafana), OpenTelemetry, Terraform, Python / Go, CI/CD | €70-94K
Overview
A high-growth global payments platform is looking for a Site Reliability Engineer to help design, build, and maintain their centralised observability capabilities. You’ll work across mission-critical, large-scale systems used by major global brands, ensuring reliability, performance, and effective telemetry across the organisation.
Key Responsibilities
Design, implement, and maintain observability pipelines (logs, metrics, traces) using OpenTelemetry.
Build self-service tooling and automation enabling engineering teams to instrument and monitor their services.
Contribute to incident management, owning processes, runbooks, and response automation.
Partner with product and engineering teams to define monitoring, alerting, and SLO/SLA requirements.
Use IaC to provision and manage observability infrastructure and alerting configurations.
Establish baseline observability standards for new and existing services.
Continuously improve alert quality, signal-to-noise ratio, and operational reliability.
Core Tech & Skills
Strong cloud experience with AWS
Kubernetes (deployments, operations, monitoring internals)
OpenTelemetry (collectors, pipelines, instrumentation)
Observability tooling: Grafana, Prometheus, Loki, Datadog, New Relic
Terraform + GitOps CI/CD (ArgoCD, GitHub Actions, or similar)
Incident tooling: PagerDuty, Jira
Scripting: Python, Go (or similar)
Strong experience in SRE, DevOps, or observability-focused engineering roles (4+ years)