Senior Cloud OPS Engineer

Sparagus • 🌐 In Person

Posted 3 days, 14 hours ago

Job Description

Job Description – Senior Cloud Native OPS / Cloud (ML)Ops Engineer

Role Description

As a Senior Cloud Native OPS Engineer, you bring more than 5 years of technical systems expertise to deliver cloud engineering services:

You configure AWS services and work with Terraform scripting (infrastructure as code), AWS networking/gateways, AWS Landing Zone setup, Lambda, and container services;

You evaluate and translate requirements into design;

You evaluate design benefits and trade-offs;

You validate design compliance and support deployment of the design to ensure the requirements are met;

You use development tools to efficiently solve technical or business challenges, incl. technology evolution, capacity management, and performance optimization;

You propose new ideas that improve an existing system, process, or service;

You keep existing technical documentation up to date through technical writing;

You perform (complex) incident resolution and root cause analyses;

On-call duty for the systems you are responsible for may be required.

Core Competences

In addition to proven experience in system software and cloud infrastructure, you have the following core competences:

Adaptive

Analytical thinking

Collaborating

Flexible

IT Infrastructure

Result driven

Software development

Qualification

Assessment

Must Have

ICT knowledge

Application programming interface / API gateway

Generative Artificial Intelligence (GenAI)

Large Language Models (LLM)

Language knowledge

English

Soft skills

Collaborating and Team player

Technical skills

Apache Airflow

Apache Spark

AWS CI/CD tooling

AWS Kinesis Data Streams

AWS Lambda

AWS S3

AWS SageMaker

Docker

GitHub/Bitbucket

Python

Terraform

Nice to Have

Language knowledge

Dutch

Technical skills

MLflow

Detailed Job Description

As a Cloud (ML)Ops Engineer, you’ll work at the intersection of cloud infrastructure, DevOps, and machine learning operations. Together with your team, you’ll help build a reliable, scalable, and secure platform that supports data scientists and analysts throughout their entire workflow.

This includes:

hosting a multi-user Jupyter environment and a cloud IDE;

providing frameworks for training, storing, serving, and monitoring custom models, primarily for high throughput batch processing;

exposing models via APIs for low latency request-response use cases;

enabling Generative AI initiatives.

Your Responsibilities

Designing and building cloud-native platform services for AI models and data pipelines.

Collaborating with colleagues and stakeholders across countries to develop technical solutions.

Managing infrastructure using tools like Terraform, Docker, and Kubernetes on AWS.

Automating workflows for data processing and model lifecycle management (Airflow, Spark, and Python).

Ensuring platform reliability, performance, and cost-efficiency.

Supporting colleagues in using the platform, including onboarding and troubleshooting.

Contributing to the evolution of our MLOps practices.

What do we expect from you?

You have a strong interest in cloud, data, and AI, and are eager to learn about new developments in the field.

Education or experience

Master’s degree in ICT, Engineering Sciences or Business Engineering with a focus on Informatics, or equivalent experience.

Technical skills

Proficient in Python and the broader data science ecosystem.

Experience with cloud infrastructure (preferably AWS).

Familiar with Docker and Kubernetes.

Skilled in infrastructure as code (Terraform).

Experience with CI/CD tools like Jenkins or GitHub Actions.

Knowledge of big data tools such as Spark.
