Keyrus is an international consulting firm specializing in the integration of data intelligence and digital solutions. With over 3,500 employees spread across 25 countries, Keyrus delivers data and digital projects to clients across a wide range of industries, including but not limited to Banking/Finance, Healthcare/Pharmaceuticals, FMCG, and Oil & Gas.
As part of Keyrus' solution delivery, we also recruit and place technical consultants whose expertise complements existing client projects. As such, we seek innovative and agile people to support ambitious and forthcoming technological challenges.
We are seeking a skilled and forward-thinking Data Engineer with strong experience in Databricks and exposure to Generative AI (GenAI) projects. In this role, you will build and optimize data pipelines that serve as the foundation for our GenAI-powered solutions.
This is a unique opportunity to sit at the intersection of modern data engineering and cutting-edge AI innovation, helping to bring scalable GenAI solutions to life.
Key Responsibilities
Design, build, and maintain scalable data pipelines and ETL processes using Databricks, Apache Spark, and Delta Lake.
Develop and optimize data ingestion workflows for structured and unstructured sources (e.g., text, PDFs, APIs, logs).
Collaborate with AI/ML engineers to supply clean, ready-to-use data for LLM fine-tuning, prompt engineering, and embedding generation.
Build and monitor data workflows using MLflow, Unity Catalog, and Databricks Workflows.
Work with GenAI frameworks such as LangChain, LlamaIndex, OpenAI, or Azure OpenAI to integrate and operationalize LLM use cases.
Partner with stakeholders (Data Scientists, Product Managers, ML Engineers) to ensure data readiness for GenAI product development.
Required Skills & Experience
2+ years of experience as a Data Engineer or in a similar role.
Strong hands-on experience with Databricks (SQL, Spark, Delta Lake, MLflow).
Proven ability to build production-grade data pipelines and manage large-scale data processing.
Exposure to Generative AI projects, such as: working with LLMs (OpenAI, Hugging Face, Cohere); building pipelines for RAG, embeddings, or chatbots; and managing vector databases and LLM-related infrastructure.
Strong proficiency in Python and SQL.