Job Description – Senior Cloud Native OPS / Cloud (ML)Ops Engineer
Role Description
As a senior Cloud Native Ops Engineer, you bring more than 5 years of technical systems expertise to deliver cloud engineering services:
You configure AWS services and work with Terraform scripting (infrastructure as code), AWS networking/gateways, AWS Landing Zone setup, and Lambda and container services (a minimal Lambda sketch follows this list);
You evaluate and translate requirements into design;
You evaluate design benefits and trade-offs;
You validate design compliance and support deployment of the design to ensure the requirements are met;
You use development tools to efficiently solve technical or business challenges, including technology evolution, capacity management, and performance optimization;
You innovate by presenting new ideas that improve existing systems, processes, or services;
You maintain documentation of existing technology through technical writing;
You perform (complex) incident resolution and root cause analyses;
On-call duty for the systems you are responsible for may be required.
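For illustration, a minimal sketch of the kind of Lambda work referenced above, written in Python; the bucket name, environment variable, and event shape are hypothetical assumptions, and the surrounding infrastructure would typically be provisioned with Terraform rather than by hand:

    # Minimal AWS Lambda handler sketch -- illustrative only.
    # Bucket name, environment variable, and event shape are hypothetical assumptions.
    import json
    import os

    import boto3  # available in the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    BUCKET = os.environ.get("TARGET_BUCKET", "example-landing-bucket")  # hypothetical

    def handler(event, context):
        # Persist the incoming event payload to S3 and return its location.
        key = f"events/{context.aws_request_id}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))
        return {"statusCode": 200, "body": json.dumps({"stored_at": f"s3://{BUCKET}/{key}"})}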
Core Competences
In addition to proven experience in system software and cloud infrastructure, you have the following core competences:
Adaptive
Analytical thinking
Collaborating
Flexible
IT Infrastructure
Result-driven
Software development
Qualification Assessment
Must Have
ICT knowledge
Application programming interface / API gateway
Generative Artificial Intelligence (GenAI)
Large Language Models (LLM)
Language knowledge
English
Soft skills
Collaborating and Team player
Technical skills
Apache Airflow
Apache Spark
AWS CI/CD tooling
AWS Kinesis Stream
AWS Lambda
AWS S3
AWS SageMaker
Docker
GitHub / Bitbucket
Python
Terraform
Nice to Have
Language knowledge
Dutch
Technical skills
MLflow
Detailed Job Description
As a Cloud (ML)Ops Engineer, you’ll work at the intersection of cloud infrastructure, DevOps, and machine learning operations. Together with your team, you’ll help build a reliable, scalable, and secure platform that supports data scientists and analysts throughout their entire workflow.
This includes:
hosting a multi-user Jupyter environment and a cloud IDE;
providing frameworks for training, storing, serving, and monitoring custom models, primarily for high-throughput batch processing;
exposing models via APIs for low-latency request-response use cases (see the serving sketch after this list);
enabling Generative AI initiatives.
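As a sketch of the low-latency serving pattern above, assuming FastAPI as the web framework and a placeholder scoring function (the actual serving stack may differ, for example a SageMaker endpoint):

    # Minimal model-serving API sketch -- illustrative only.
    # FastAPI and the scoring logic are assumptions, not the team's actual stack.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class ScoringRequest(BaseModel):
        features: list[float]  # hypothetical input schema

    def predict(features: list[float]) -> float:
        # Placeholder for a real model call (e.g. a loaded artifact or SageMaker endpoint).
        return sum(features) / max(len(features), 1)

    @app.post("/score")
    def score(request: ScoringRequest) -> dict:
        # Low-latency request-response: score a single record per call.
        return {"prediction": predict(request.features)}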
Your Responsibilities
Designing and building cloud-native platform services for AI models and data pipelines.
Collaborating with colleagues and stakeholders across countries to develop technical solutions.
Managing infrastructure using tools like Terraform, Docker, and Kubernetes on AWS.
Automating workflows for data processing and model lifecycle management with Airflow, Spark, and Python (see the DAG sketch after this list).
Ensuring platform reliability, performance, and cost-efficiency.
Supporting colleagues in using the platform, including onboarding and troubleshooting.
Contributing to the evolution of our MLOps practices.
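As an example of the workflow automation mentioned above, a minimal Airflow DAG sketch in Python; the DAG id, schedule, and task callables are hypothetical assumptions:

    # Minimal Airflow DAG sketch -- illustrative only.
    # DAG id, schedule, and task bodies are hypothetical assumptions.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Placeholder: pull raw data, e.g. from S3 or a Kinesis-backed landing area.
        print("extracting data")

    def train():
        # Placeholder: trigger a training step, e.g. a Spark job or SageMaker training job.
        print("training model")

    with DAG(
        dag_id="model_lifecycle_example",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        train_task = PythonOperator(task_id="train", python_callable=train)
        extract_task >> train_task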
What do we expect from you?
You have a strong interest in cloud, data, and AI, and are eager to learn about new developments in the field.
Education or experience
Master’s degree in ICT, Engineering Sciences or Business Engineering with a focus on Informatics, or equivalent experience.
Technical skills
Proficient in Python and the broader data science ecosystem.
Experience with cloud infrastructure (preferably AWS).
Familiar with Docker and Kubernetes.
Skilled in infrastructure as code (Terraform).
Experience with CI/CD tools like Jenkins or GitHub Actions.
Knowledge of big data tools such as Spark.