Introduction
IBM Trusteer has an opportunity for a Senior SRE. We are seeking an experienced and talented individual who is passionate about infrastructure and interested in working with cutting-edge technology in a global, large-scale environment.
Your Role And Responsibilities
This role is responsible for designing, deploying, and maintaining our infrastructure and CI/CD pipelines. The work includes designing, building, and deploying highly available, robust, resilient, and supportable products while streamlining and automating our software delivery and infrastructure operations in a large-scale SaaS environment. With a focus on the infrastructure and operational elements of designing and deploying large-scale solutions, the SRE must ensure that the infrastructure is highly available, has sufficient capacity in place, and is fully resilient across multiple data centers and cloud architectures.
Responsibilities
This role works as part of an operations team to design, deploy and support 24x7x365 operations with day-to-day responsibilities that include:
Develop and maintain automation scripts and tools using Python, Groovy, and Bash.
Create and manage our pipelines, CI/CD infrastructure, and automation jobs in Jenkins.
Manage Development/QA/Production environments with Terraform.
Integrate, create, and maintain monitoring for the various flows and components of the system to ensure its reliability and observability.
Be available off-shift occasionally to resolve production issues.
Work closely with other members of the SRE and R&D teams.
Take responsibility for system performance and reliability.
Engage proactively in the incident management process, working with operational teams to minimize the impact of database outages.
Required Technical And Professional Expertise
Several years of experience as an SRE
Several years of experience in scripting and automation using Python or a similar language
Experience in cloud environments (AWS preferred)
Experience with CI/CD tools like Jenkins
Experience with IaC and configuration management tools like Terraform and Ansible
Experience with Docker-based environments and Kubernetes orchestration (GitOps and ArgoCD are an advantage)
Experience working in production environments requiring 99.99% availability
Excellent communication skills, including the ability to communicate effectively with technical and non-technical employees and with vendors.
Strong problem-solving, testing, and network troubleshooting skills.
Bachelor’s degree in Computer Science, Information Technology, or a related field.