Role: Azure SRE
Location: Toronto (Hybrid)
Hire Type: Contract
Job Description: We are looking for a Site Reliability Engineer to work for the Digital Line of Business for our account. The ideal candidate has experience with scripting and the Microsoft Azure cloud platform.
The detailed job description is below.
Monitoring and Alerting
Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users
Incident Response
Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service
Automation
Automate repetitive tasks and processes to improve efficiency and reduce manual effort
Performance Optimization
Identify and address performance bottlenecks to ensure systems run efficiently and effectively
Infrastructure Management
Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources
Capacity Planning
Plan for future capacity needs to ensure systems can handle anticipated workloads
Release Engineering
Develop and maintain processes for deploying software updates and releases
Collaboration
Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability
Documentation
Maintain clear and concise documentation of systems, processes, and procedures
Continuous Improvement
Identify areas for improvement and implement changes to enhance system reliability and performance
Skills and Qualifications
Cloud Platform: Microsoft Azure
Excellent knowledge of AKS
Monitoring Tools: Dynatrace, Splunk, Grafana
Operating System: Windows, Linux
Scripting: Shell scripting, Python, PowerShell
Database: MySQL, Oracle, SQL database management
Container Services: Kubernetes, Docker, Helm
Understanding of Camunda is preferred