Role: Site Reliability Engineer
Location: Germany - Remote
Contract: Permanent
EU Citizens Only
Job Description:
Key responsibilities:
Platform Engineering & DevOps: Manage Kubernetes and container orchestration, including Helm chart configurations and CI/CD pipelines (Jenkins, ArgoCD). Develop automation scripts (Python, Bash, Go) and deploy Infrastructure-as-Code (IaC) solutions.
Observability, Monitoring & Visualisation: Maintain Prometheus solutions (scrape configurations, alert rules, PromQL queries), administer Thanos and Grafana.
Elastic Stack Operations & Log Management: Configure and optimise Elasticsearch clusters, Logstash pipelines, and Kibana dashboards for secure, scalable log processing.
Incident Response, Troubleshooting & Collaboration: Participate in 24x7 on-call rotations for rapid incident response, troubleshoot platform, data and performance issues, and engage in Major Incident Management (MIM).
Secure Operations & Compliance: Ensure system operations meet security and data protection requirements, maintain secure documentation, and manage access control policies.
Requirements:
Strong grasp of Linux concepts, preferably in Kubernetes environments.
Solid understanding of networking fundamentals and REST APIs.
Proficiency in Python, Go, or Bash.
Proficiency in Git-based configuration management workflows.
Familiarity with CI/CD and deployment tools such as Helm, Jenkins, or ArgoCD.
Experience with Elasticsearch and/or OpenSearch.
Fluent English communication skills.
Willingness to provide shift-based 24x7 on-call support, including weekends and holidays.
Must hold Ü2 security clearance or be willing to undergo the clearance process.
Citizenship required: must be a citizen of a country that is a member of both the EU and NATO. No dual citizenship with a country outside these.
Must reside in Germany and hold a German labor contract, or be ready to relocate to Germany.
Preferred Certifications: Elastic Certified Engineer, LPIC Level 2, Kubernetes Administrator.