We are looking for a highly skilled DevOps Engineer with strong experience in DevSecOps and MLOps / LLMOps to design, automate, and secure our development and deployment pipelines. You will play a critical role in building scalable, secure, production-ready infrastructure that supports both traditional applications and machine learning / LLM workloads. This role demands a strong understanding of Kubernetes, CI/CD pipelines, infrastructure as code, model lifecycle management, and cloud-native security practices.
DevOps & Infrastructure
Design, implement, and manage scalable, fault-tolerant infrastructure in cloud or hybrid environments (AWS / GCP / Azure / Hetzner / bare metal).
Develop and maintain CI/CD pipelines using tools like GitHub Actions, GitLab CI, Jenkins, or ArgoCD.
Manage containerized workloads using Kubernetes, Helm, and Docker.
Implement infrastructure as code (IaC) with Terraform / OpenTofu / Terragrunt.
Monitor system performance, availability, and cost efficiency using Prometheus, Grafana, ELK, or Loki.
DevSecOps
Integrate security automation into CI/CD pipelines (SAST, DAST, SCA, dependency scanning).
Implement policy as code using OPA / Conftest and enforce RBAC / IAM best practices.
Manage secrets and credentials using tools like Vault, Sealed Secrets, or External Secrets Operator.
Set up vulnerability scanning and runtime protection (e.g., Trivy, Falco, Aqua Security).
Define security baselines for infrastructure, network, and containers.
MLOps / LLMOps
Collaborate with ML and data teams to operationalize model training, evaluation, and deployment.
Build automated pipelines for data preprocessing, model training, and inference deployment using tools like Kubeflow, MLflow, or Airflow.
Manage feature stores and model registries, and monitor models for drift, latency, and accuracy.
Support LLM pipelines, including prompt orchestration, fine-tuning, vector DB integrations, and retrieval-augmented generation (RAG).
Optimize GPU-based workloads and manage distributed training and inference infrastructure.
Required Skills & Qualifications
Languages: Python, Bash, Go (preferred)
IaC Tools: Terraform / OpenTofu / Terragrunt
CI/CD: GitHub Actions, GitLab CI, Jenkins, ArgoCD
Containers: Docker, Kubernetes, Helm
Monitoring: Prometheus, Grafana, Loki, ELK
Security: Trivy, Falco, Vault, OPA, Snyk
MLOps Tools: MLflow, Kubeflow, Airflow, Weights & Biases
Cloud Platforms: AWS / GCP / Azure / Hetzner
Databases: PostgreSQL, Redis, Vector DBs (Milvus, Pinecone, Weaviate, Qdrant)
Nice to Have
Experience with GPU orchestration on Kubernetes (NVIDIA GPU Operator, KServe).
Exposure to LLM frameworks (LangChain, LlamaIndex, vLLM, Ollama).
Knowledge of data governance and compliance (GDPR, SOC 2).
Experience with self-hosted runners, GitOps, or multi-cluster management.
Familiarity with event-driven systems (Kafka, NATS, or Redis Streams).
What We Offer
Opportunity to work on challenging, large-scale systems with real-world impact.
Collaborative team culture with a focus on learning and innovation.
Competitive compensation and growth opportunities.