Senior SRE
Location: Galway, Ireland
About Our Company – OneTouch Health
OneTouch Health is a fast-growing, ambitious, and award-winning healthcare software company transforming how care is delivered across the UK, Ireland, and beyond. Our platform supports thousands of residential, homecare, and community care providers, enabling safer care, stronger compliance, and better outcomes for tens of thousands of service users every day. With over a decade of deep sector expertise, OneTouch Health is trusted by hundreds of care organisations to streamline operations, improve quality of care, and empower frontline staff.
Now is an exciting time to join OneTouch Health. Following strong organic growth and significant private-equity investment, we are entering the next phase of our journey with ambitious expansion plans, major platform enhancements, and an accelerated roadmap in AI-powered care management.
At OneTouch Health we build systems that make care safer, smarter, and more connected. Our engineering and SRE teams are modernising our platform to deliver greater reliability, scalability, and visibility—while simplifying and accelerating the development lifecycle across a rapidly evolving product suite.
We are unifying our acquired platforms, enhancing our residential care capabilities, and laying the foundations for a next-generation system of intelligence that supports carers through automation, insights, and seamless workflows.
This is an exceptional moment to join a collaborative, mission-driven team that values innovation, ownership, and continuous improvement. You’ll work in an environment that supports professional growth while making a real impact on the future of health and social care.
Responsibilities – What will I be working on?
Develop tools that automate fault recovery, extend monitoring, and provide recommendations to avoid future outages.
Provide design input into our next-generation platform.
Build automation tools for development and QA.
Take responsibility for modernising monitoring and observability across our platforms with Prometheus and Grafana.
Instrument existing code bases to expand our white-box monitoring.
Measure reliability and help set goals with development, support and sales to improve those metrics.
Skills – What we need you to bring to the table
Degree in Computer Science/IT, or equivalent.
Demonstrable experience with monitoring systems.
Development experience, preferably in PHP and Python.
Experience with Terraform.
Passion for automation.
Previous experience with tools for build pipelines.
Excellent attention to detail and strong communication and organisational skills.
Ability to work well within a team in addition to working remotely.
Responsibilities – What will I be working on?
Build and maintain automation to improve platform reliability, reduce manual operational work, and accelerate fault detection and recovery across our multi-product environment.
Contribute to the design and delivery of our next-generation unified care platform, including infrastructure, observability, deployment architecture, and systems consolidation.
Develop automation tools to support Engineering, QA, and DevOps workflows across our PHP/Laravel and C#/.NET product suite.
Lead the modernisation of monitoring and observability across all platforms using Prometheus, Grafana, and a future logging solution (Loki, ELK, or OpenSearch—under evaluation).
Instrument existing PHP/Laravel and C#/.NET services to expand white-box monitoring and improve telemetry coverage and service-level insights.
Enhance and evolve our CI/CD pipelines using GitHub and Jenkins, improving automation, deployment reliability, and testing workflows.
Support our transition from a mixed cloud environment to a consolidated AWS-first architecture, contributing to infrastructure modernisation, security hardening, and cost optimisation.
Play a key role in the early stages of our containerisation journey, shaping how we leverage Docker and future orchestration options.
Partner with Engineering, Support, and Product to define, measure, and improve reliability metrics (SLIs/SLOs/SLAs) across the organisation.
Contribute to stronger incident response processes, root cause analysis, and long-term reliability improvements.
Skills – What we need you to bring to the table
Degree in Computer Science, Engineering, or equivalent practical experience.
Demonstrable experience working with monitoring and observability systems at scale.
Strong development or scripting skills, ideally in PHP (Laravel) and Python, with bonus points for experience in C#/.NET.
Experience with Terraform and cloud-native infrastructure, ideally with AWS.
A passion for automation, reliability engineering, and reducing operational toil.
Experience building and improving CI/CD pipelines using GitHub and/or Jenkins.
Understanding of Docker/containerisation principles, or strong interest in contributing to our future container strategy.
Excellent attention to detail, strong communication, and solid organisational skills.
Ability to collaborate effectively within a distributed engineering team.
Strong leadership aptitude for future team-lead roles.
Benefits – What will you get in return?
Opportunity to have an impact within a private-equity-funded, high-growth company in the healthcare domain, delivering a next-generation software-as-a-service platform.
We encourage contribution and decision-making: ideas are listened to, and experts are trusted to make decisions.
A great emphasis on teamwork, communication and keeping everyone up to date on our progress as a company.
Join a growing team of strong players with the opportunity to learn and grow your knowledge and career.
Hybrid and remote working opportunities.
Competitive compensation.
Health insurance contribution.
Company pension scheme and vacation allowance.
Company Sick Pay.
Death-in-service benefit.
Income Protection scheme.
Great sports and social club with lots of events for the team throughout the year.