
MLOps Tech Lead

Healwell AI Inc • In Person


Job Description

There are over 7,000 identified rare diseases, affecting over 300 million patients worldwide and 1 in 12 people in Canada. Many of these patients remain undiagnosed and unaware of their condition, resulting in a poor quality of life and potentially serious consequences. Healwell AI (HWAI) (TSX:AIDX) is a leader in AI-enabled clinical intelligence for rare diseases and specialty conditions. Through our proprietary clinical intelligence platform and deep analytical tools, HWAI allows physicians to quickly understand complex, high-risk patients and place them on the right care pathways, leading to better outcomes for patients, their families, and the healthcare system.

HWAI is seeking an experienced MLOps Tech Lead to architect our next-generation AI infrastructure and lead a talented team of engineers. In this pivotal role, you will bridge the gap between Data Science, Cloud Engineering, and DevOps. You will not only be hands-on with our Azure/Databricks stack but will also set the technical vision, establish engineering standards, and ensure our AI platforms are secure, scalable, and cost-efficient. You will own the roadmap for our MLOps maturity, moving us from manual execution to fully automated, observable, and resilient AI systems. You will have the opportunity to enhance your technical leadership skills while contributing to impactful projects in the healthcare space.

Responsibilities

The successful candidate will work in a multifaceted role encompassing Cloud Architecture, Cloud Security, and DevOps/MLOps responsibilities:

Lead, mentor, and grow a team of MLOps and Cloud Engineers; conduct code reviews, facilitate technical design sessions, and foster a culture of engineering excellence.

Define the high-level architecture for our end-to-end ML platform on Azure, making critical decisions on "build vs. buy" for tooling and infrastructure.

Oversee the Terraform codebase; implement modular, reusable infrastructure patterns and enforce state management policies to prevent drift.

Own the reliability (SRE) of machine learning systems. Define SLAs/SLOs for model inference and data pipelines, and lead root cause analysis (RCA) for critical incidents.

Manage cloud budgets (FinOps) for compute and Databricks usage, and enforce rigorous security postures (IAM, network isolation, private endpoints) that ensure compliance with industry standards.

Evolve our CI/CD pipelines from simple automation to advanced deployment strategies (Blue/Green, Canary releases, Shadow deployment) for ML models.

Deploy and maintain cloud-based ML models in production, ensuring performance and scalability.

Design, deploy, and manage scalable, secure, and highly available cloud infrastructure on Azure, utilizing infrastructure as code (IaC) principles.

Build monitoring systems for data quality, model performance, and pipeline health.

Collaborate with cross-functional teams to define problems and develop solutions.

Develop and maintain documentation for cloud architecture, processes, and systems.

Diagnose and resolve issues related to application and model performance, pipeline failures, and infrastructure problems.

Required Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related field.

7+ years of total experience in DevOps, Cloud Engineering, or Software Engineering.

3+ years specifically focused on MLOps or Data Engineering at a production scale.

2+ years in a technical leadership or mentoring role (Team Lead, Principal Engineer, etc.).

Deep proficiency with Azure cloud and cloud-native services.

Proficiency in Python and shell scripting.

Hands-on experience with containerization technologies (Docker) and orchestration platforms (Kubernetes).

Advanced mastery of Terraform.

Deep hands-on experience with Databricks (MLflow, Spark, Unity Catalog).

Proven experience with orchestration tools (Dagster preferred).

Knowledge of PostgreSQL or an equivalent database management system.

Experience with containerization, infrastructure as code, and DevOps/MLOps practices.

Strong problem-solving skills and the ability to work both independently and collaboratively.

Preferred Qualifications

Certifications such as Azure Solutions Architect Expert or DevOps Engineer Expert are desirable.

Relevant certifications in security domains.

What You'll Work With

Data Platform: Databricks (Spark, Delta Lake) + Weaviate vector store

Orchestration: Dagster for pipeline management and scheduling

Cloud: Azure services for compute, storage, and machine learning

Languages: Python, shell

Tools: Docker, Kubernetes, Terraform, Git, CI/CD pipelines

Monitoring: Custom dashboards, alerting systems, and model performance tracking

Culture & Work Environment

Communication: We value open and honest communication. Regular check-ins and team meetings ensure everyone is aligned and informed.

Transparency: Our decision-making processes are transparent, encouraging input from all team members. Your ideas and feedback will be valued.

Promptness: We maintain a fast-paced work environment and expect team members to be prompt in delivering work and meeting deadlines.

Guidance: You will be supported and guided by our VP of Technology, who will provide mentorship and direction throughout your time at HWAI.

What We Offer

Hands-on experience with real-world data challenges in the medical field.

Opportunities to expand your technical skill set and work with advanced AI tools.

A collaborative team environment that fosters learning and innovation.

We look forward to receiving your application and hope to welcome you to the HWAI team!

HWAI is an equal opportunity employer that welcomes all applicants, including persons with disabilities, visible minorities, women, and Aboriginal peoples. HWAI will provide reasonable accommodation to qualified job applicants with a disability, on request, and will notify successful applicants of policies relating to the accommodation of employees with disabilities. We thank all applicants for their interest in HWAI, but please note that only successful candidates will be contacted.

You can learn more about HWAI at https://healwell.ai
