Position Overview
Job Title: Data Engineer (6-Month Contract)
Department: Services
Location: Singapore
Reporting To: Contract
Duration: 6 months
Tookitaki is seeking a Data Engineer (Contract) with strong expertise in Apache Spark and Cloudera (CDP) to support high-priority data initiatives for our AI-driven financial crime prevention platforms—FinCense and the AFC Ecosystem. This role will contribute to building and maintaining robust data pipelines that ensure accurate, scalable, and production-grade data processing across real-time and batch workflows.
Position Purpose
This role is designed to support data engineering efforts during a critical delivery phase. The engineer will work closely with platform, product, and services teams to enable high-quality data ingestion, transformation, and availability across Tookitaki’s compliance modules. The work done in this role directly contributes to risk scoring, transaction monitoring, and fraud detection systems for global banks and fintech clients.
Key Responsibilities
1. Spark-Based Data Development
Design and optimize batch and streaming pipelines using Apache Spark.
Debug performance and memory issues in Spark-based ETL processes.
2. Cloudera Data Platform (CDP) Handling
Leverage HDFS, Hive, Impala/Trino, and HBase within Cloudera to support data workflows.
Collaborate with infra teams to ensure CDP cluster reliability and schema alignment.
3. Pipeline Development & Monitoring
Build ingestion pipelines using Kafka, Hive, and Spark for large-scale financial datasets.
Support Airflow-based orchestration and ensure production SLAs are met.
4. Data Validation & Debugging
Write and optimize SQL queries to validate data accuracy and ingestion success.
Assist in tracing pipeline issues and executing backfills if necessary.
5. Cross-Functional Collaboration
Coordinate with data scientists, DevOps, and service teams to support platform releases.
Deliver on strict project timelines tied to active client deployments.
Qualifications and Skills
Education
Bachelor’s/Master’s in Computer Science, Engineering, or related discipline.
Experience
5–8 years as a Data Engineer, with at least 2 years in Spark-heavy environments.
Prior experience working with Cloudera Data Platform (CDP) in production.
Technical Expertise
Apache Spark (Core, SQL, Tuning)
Cloudera CDP: Hive, HDFS, HBase, Impala/Trino
Kafka, Airflow, SQL
Python and Bash scripting
Familiarity with Linux-based environments
Exposure to AWS is a plus
Soft Skills
Strong problem-solving mindset
Ability to thrive in contractual, delivery-driven settings
Clear communication and documentation habits
Focus on execution, quality, and speed
Key Competencies
Data Pipeline Ownership
Big Data Architecture
Execution Agility in Project Timelines
Collaborative Implementation Mindset
Operational Readiness
Success Metrics
On-time delivery of assigned pipeline components
Stability and performance of Spark workflows in UAT and production
Accuracy of data validation and transformation logic
Cross-team satisfaction with deliverables in rollout sprints