Senior Site Reliability Engineer

Ericsson • In Person • $57,000–$72,000

Job Description

Position Summary

The Site Reliability Engineer (SRE) plays a pivotal role in ensuring the reliability, stability, and performance of Ericsson’s critical software platforms. This position combines software engineering expertise with a focus on operational efficiency to minimize downtime and drive reliability across large-scale developer platforms. The ideal candidate will leverage their experience to enhance monitoring systems, streamline incident management processes, and improve resilience, enabling Ericsson’s global engineering teams to innovate without disruption. A proven ability to implement enterprise-level reliability solutions and best practices is essential for this high-impact role.

Manage the reliability and scalability of Ericsson’s global multi-node GitLab infrastructure, supporting thousands of repositories. Responsibilities include implementing automated failover and redundancy, deploying robust 24/7 monitoring with industry-standard tools, and ensuring seamless CI/CD workflows through proactive performance optimization and rapid incident response.

Drive operational excellence for developer platforms such as GitLab, Backstage, and other enterprise-scale systems by advancing platform health and facilitating uninterrupted software delivery.

Design and implement tools, processes, and frameworks for effective incident management, performance optimization, and fault tolerance, ensuring consistent reliability across platforms.

Lead automation and monitoring initiatives to reinforce the resilience of Ericsson’s platforms in diverse operational settings, supporting continuous improvement and scalability.

Core Responsibilities

1. Enhance Platform Reliability and Stability

a. Identify and mitigate reliability risks across Ericsson’s developer platforms, with a focus on proactively improving platform resilience.

b. Architect solutions for fault tolerance and failure recovery within hybrid, cloud-native, or distributed infrastructures.

c. Monitor and optimize system performance metrics such as error rates, latency, and uptime to ensure seamless operations.

2. Develop Comprehensive Monitoring and Observability Frameworks

a. Implement real-time monitoring systems (e.g., Prometheus, Grafana, ELK, OpenTelemetry) to track system performance and uncover reliability patterns.

b. Create dashboards and telemetry tools to provide deep insights into platform health, facilitating proactive issue resolution and scaling decisions.

3. Incident Response and Post-Mortem Management

a. Establish scalable incident management protocols, including real-time response, root cause analysis (RCA), and structured post-mortems.

b. Develop automation pipelines for detection, diagnosis, and recovery from system disruptions or outages.

c. Minimize mean time to resolution (MTTR) for key platforms while maintaining a high standard for post-incident reporting and improvement planning.

4. Optimize Performance at Scale

a. Implement capacity planning and system hardening strategies that support increasing developer activity and workload demand.

b. Design resilient architectures that can scale while adhering to Ericsson’s stringent performance benchmarks.

5. Promote Collaboration to Achieve Operational Excellence

a. Partner with platform engineers to integrate reliability principles into internal developer platform (IDP) workflows, ensuring CI/CD pipelines are fault-tolerant and scalable.

b. Mentor engineering teams and contribute to shaping a culture of continuous improvement in reliability practices.

c. Collaborate across functional teams (Engineering, DevOps, Infrastructure) to align platform reliability efforts with Ericsson’s strategic goals.

Preferred Qualifications

Direct Experience:

• 5–8 years in SRE or platform engineering roles, demonstrating success in scaling reliability solutions across large, global platforms.

• Proven contributions to reducing downtime, improving platform performance metrics, and establishing operational governance practices.

• Documented achievements in leading reliability-focused initiatives within complex organizational environments.

Technical Expertise:

• Advanced expertise in monitoring and observability tooling, such as Prometheus, Grafana, Splunk, the ELK Stack, or OpenTelemetry.

• Proficiency in automation and infrastructure tools such as Kubernetes, Terraform, or Ansible.

• Strong programming skills (e.g., Python, Go, or Bash) for developing system optimizations and observability tools.

• Understanding of CI/CD frameworks and platform governance principles, particularly in enterprise solutions such as GitLab.

Interpersonal Skills:

• Proven ability to lead cross-functional initiatives and mentor teams on reliability best practices.

• Excellent problem-solving skills with a proactive approach to turning reliability needs into actionable solutions.

• Results-driven focus on measurable impact and continuous improvement.

Impact of the Role

Strategic Enhancement:

• By ensuring the reliability of core developer platforms such as GitLab and Backstage, the SRE contributes to Ericsson’s ability to deliver high-quality software faster and with consistent performance.

• Reliability efforts empower Ericsson’s global engineering teams to innovate freely while maintaining operational stability.

High-Impact Leadership:

• This role demands systems-level leadership to shape a robust site reliability culture and drive architectural innovations that prioritize resilience at scale.

• Implementing effective reliability frameworks will directly benefit Ericsson’s competitiveness in the telecom industry by enabling reliable delivery across high-demand development ecosystems.

Why join Ericsson?

At Ericsson, you’ll have an outstanding opportunity: the chance to use your skills and imagination to push the boundaries of what’s possible and to build never-before-seen solutions to some of the world’s toughest problems. You’ll be challenged, but you won’t be alone. You’ll be joining a team of diverse innovators, all driven to go beyond the status quo to craft what comes next.

What happens once you apply?

Ericsson provides an overview of what its typical hiring process looks like.

Encouraging a diverse and inclusive organization is core to our values at Ericsson; that’s why we champion it in everything we do. We truly believe that collaborating with people who have different experiences drives innovation, which is essential for our future growth. We encourage people from all backgrounds to apply and realize their full potential as part of our Ericsson team. Ericsson is proud to be an Equal Opportunity Employer.

Primary country and city: Ireland (IE), Athlone

Req ID: 780375
