Remote
£45,000 - £55,000
Are you an engineer who thrives on solving complex reliability challenges across cloud platforms?
We’re looking for a Site Reliability Engineer who can combine strong technical capability with a pragmatic approach to automation, monitoring, and service delivery. You’ll help keep Tribal’s education-driven SaaS products highly available, scalable, and performant.
At Tribal
Tribal is a leading EdTech business providing market-leading software solutions to the global education market. We research, develop, and deliver the products, services, and solutions that education institutions worldwide rely on to support their core mission: educating students, delivering exceptional learning experiences, and achieving successful outcomes.
Our Platform Engineering function is at the heart of this, ensuring our systems are designed and maintained to the highest standards of reliability and security. As part of the SRE & Operations team, you’ll play a key role in delivering Tribal’s products through the public cloud as SaaS services across AWS and Azure.
The Role
As a Site Reliability Engineer, you’ll design, build, and operate large-scale systems with an emphasis on reliability, efficiency, and automation. You’ll work across deployment, monitoring, and incident response to ensure our platforms stay healthy and our customers experience uninterrupted service.
You’ll be involved in:
Maintaining and improving production systems for availability, latency, and scalability
Supporting application deployment and configuration to production environments
Building or enhancing automation tools (Ansible, scripts, utilities)
Implementing and managing observability tools such as DataDog or New Relic
Analyzing logs and metrics to identify trends and improve reliability
Supporting incident response and performing root-cause analysis
Collaborating closely with engineering and customer teams to deliver proactive, preventative support
Participating in on-call and out-of-hours rotations in line with Tribal’s On-Call Policy
This is a full-time, fully remote UK-based role, with occasional national travel for team collaboration or customer engagements.
What you’ll bring
Strong experience with AWS (or Azure) environments
Solid knowledge of Linux, Apache, and PHP in a production context
Familiarity with automation/configuration tools such as Ansible
Experience with monitoring and logging platforms (e.g. DataDog, New Relic, Azure Monitor)
Good understanding of database fundamentals (SQL Server / Oracle)
Hands-on troubleshooting and problem-solving skills
Customer-facing experience with incident or service management tools (RemedyForce, ServiceNow)
Strong written and verbal communication skills, able to translate technical details clearly
Nice-to-have:
Experience coding or scripting (PowerShell, C#, .NET Core)
Understanding of CI/CD pipelines (Azure DevOps or similar)
ITIL Foundation or cloud certifications (AWS SysOps Administrator, AWS Solutions Architect)
Note to applicants:
We welcome applications from individuals who already have the right to work in the UK.
As an equal opportunity employer, Tribal celebrate diversity and are committed to creating an inclusive environment for all employees. We make sure that our recruitment and selection processes never discriminate based upon any protected characteristics and actively welcome applications from all groups, not least those underrepresented in the tech sector.
Note to all applicants: Tribal reserve the right to close an advertisement to applications ahead of the advertised closing date, and shortlisting may sometimes take place before that date. Please do not hesitate to apply early.