At Cyber Insight, we are building the next generation of AI-driven platforms for IT security and risk management. Our mission is to empower companies to gain deep insights into their IT landscapes and proactively mitigate risks in an increasingly complex digital world.
As a fast-growing startup, we combine expertise in cybersecurity, data engineering, and artificial intelligence to deliver solutions that automate risk assessments, predict potential threats, and help organizations stay ahead of evolving cyber risks. Our team thrives on innovation, collaboration, and a shared passion for making a real impact in the cybersecurity space.
We are looking for a hands-on Data Engineer who is passionate about building reliable, scalable, and secure data systems. You'll help shape our data architecture and pipelines that feed our AI models and risk assessment engines — including the crucial task of mapping vulnerabilities (CVEs) to specific software and system components.
Tasks
Design, build, and maintain data pipelines and ETL/ELT workflows across GCP and on-prem environments.
Ingest and process cybersecurity-relevant data sources such as CVE feeds, software inventories, vulnerability databases, and event logs.
Develop and maintain transformation logic and data models linking vulnerabilities (CVEs) to affected software and assets.
Implement and automate data validation, consistency checks, and quality assurance using tools like Great Expectations or Deequ.
Collaborate with AI and graph modeling teams to structure and prepare data for threat intelligence and risk quantification models.
Manage and optimize data storage using BigQuery, PostgreSQL, and Cloud Storage, ensuring scalability and performance.
Automate data workflows and testing through CI/CD pipelines (GitHub Actions, GCP Cloud Build, Jenkins).
Implement monitoring and observability for pipelines using Prometheus, Grafana, and OpenTelemetry.
Apply a security-focused mindset in data handling, ensuring safe ingestion, processing, and access control for sensitive datasets.
Requirements
3+ years of experience in data engineering, backend data systems, or cybersecurity data processing.
Strong Python skills and experience with pandas, PySpark, or Dask for large-scale data manipulation.
Proven experience with data orchestration and transformation frameworks (Airflow, dbt, or Dagster).
Solid understanding of data modeling, data warehousing, SQL optimization, and streaming/ETL pipelines (e.g. Kafka).
Familiarity with CVE data structures, vulnerability databases and standards (e.g. NVD, CPE, CWE), or security telemetry.
Experience integrating heterogeneous data sources (APIs, CSV, JSON, XML, or event streams).
Knowledge of GCP data tools (BigQuery, Pub/Sub, Dataflow, Cloud Functions) or their equivalents on Azure/AWS.
Experience with containerized environments (Docker, Kubernetes) and infrastructure automation (Terraform or Pulumi).
Understanding of data testing, validation, and observability practices in production pipelines.
A structured and security-aware approach to building data products that support AI-driven risk analysis.
Nice to Have
Experience working with graph databases (Neo4j, ArangoDB) or ontology-based data modeling.
Familiarity with ML pipelines (Vertex AI Pipelines, MLflow, or Kubeflow).
Understanding of software composition analysis (SCA) or vulnerability scanning outputs (e.g. Trivy, Syft).
Background in threat intelligence, risk scoring, or cyber risk quantification.
Experience in multi-cloud or hybrid setups (GCP, Azure, on-prem).
Benefits
Freedom to design and shape a modern, secure data platform from the ground up.
A collaborative startup environment where your work directly supports AI and cybersecurity products.
Flexible working hours and remote-friendly setup.
Exposure to cutting-edge technologies in AI, data engineering, and cyber risk analytics.
Competitive salary and benefits tailored to your experience.
We look forward to meeting you!