Data Engineer – Real-Time Stream Processing (Onsite, Hong Kong)
Mainland China- or Hong Kong-based candidates only
We’re looking for a hands-on Data Engineer with 5–7 years of experience to join a top-tier crypto HFT prop trading firm as a sole contributor, working with the software engineering team to design and build high-performance, real-time data pipelines in a modern cloud-native environment. If you're passionate about stream processing, scalable architecture, and turning complex business needs into robust data solutions, this role is for you.
What You’ll Do
- Build and maintain real-time data pipelines using Apache Flink (PyFlink) across an end-to-end stack: EC2 → Vector → AWS MSK → PyFlink → S3 → ClickHouse
- Design stream processing systems with advanced features such as watermarking, windowing, exactly-once semantics, and state management (see the PyFlink sketch after this list)
- Deploy and manage AWS infrastructure, including Managed Flink, MSK, S3, IAM, and CloudWatch, with a focus on performance, partitioning, and observability
- Optimize SQL queries (Flink/ClickHouse), troubleshoot production issues (e.g., data skew, latency), and build Grafana dashboards for pipeline monitoring (see the metrics sketch after this list)
- Implement data quality frameworks, support schema evolution, and translate business requirements into scalable, maintainable data architectures (see the data-quality sketch after this list)
- Own DevOps practices: manage JAR dependencies, containerize services with Docker, deploy on Kubernetes, and maintain clear technical documentation
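To make the stream-processing work concrete, here is a minimal PyFlink sketch, assuming a JSON "trades" topic on MSK and a locally available Kafka connector JAR (the topic, broker address, schema, and file paths are illustrative, not from this role): an event-time watermark, a one-minute tumbling window, and exactly-once checkpointing.

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # Streaming Table API environment; exactly-once is driven by checkpointing
    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.get_config().set("execution.checkpointing.interval", "60s")
    t_env.get_config().set("execution.checkpointing.mode", "EXACTLY_ONCE")
    # Kafka connector JAR on the classpath (hypothetical path; ties into the
    # JAR-dependency management mentioned above)
    t_env.get_config().set(
        "pipeline.jars", "file:///opt/flink/lib/flink-sql-connector-kafka.jar"
    )

    # Hypothetical trades topic on MSK, with a 5-second event-time watermark
    t_env.execute_sql("""
        CREATE TABLE trades (
            symbol STRING,
            price  DOUBLE,
            ts     TIMESTAMP(3),
            WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'trades',
            'properties.bootstrap.servers' = 'msk-broker-1:9092',
            'properties.group.id' = 'pyflink-trades',
            'scan.startup.mode' = 'latest-offset',
            'format' = 'json'
        )
    """)

    # One-minute tumbling-window average price per symbol
    t_env.execute_sql("""
        SELECT symbol,
               TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
               AVG(price) AS avg_price
        FROM trades
        GROUP BY symbol, TUMBLE(ts, INTERVAL '1' MINUTE)
    """).print()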
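For the observability side, a small boto3 sketch, assuming a custom consumer-lag gauge (the namespace and metric name are made up); Grafana's CloudWatch data source can then chart the same metric on a pipeline dashboard.

    import boto3

    # ap-east-1 is the AWS Hong Kong region
    cloudwatch = boto3.client("cloudwatch", region_name="ap-east-1")

    def report_consumer_lag(topic: str, lag_messages: int) -> None:
        """Publish a hypothetical consumer-lag gauge for one Kafka topic."""
        cloudwatch.put_metric_data(
            Namespace="StreamingPipeline",  # made-up namespace
            MetricData=[{
                "MetricName": "ConsumerLag",
                "Dimensions": [{"Name": "Topic", "Value": topic}],
                "Value": float(lag_messages),
                "Unit": "Count",
            }],
        )

    report_consumer_lag("trades", 1250)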
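And a rough sketch of what a data-quality gate can look like, using polars (the column names and thresholds are hypothetical): a null-rate check and a freshness check on one micro-batch.

    from datetime import datetime, timedelta

    import polars as pl

    def check_batch(df: pl.DataFrame) -> list[str]:
        """Return a list of data-quality violations for one micro-batch."""
        problems = []
        # Null-rate check on a hypothetical price column (1% threshold is made up)
        null_rate = df["price"].null_count() / max(df.height, 1)
        if null_rate > 0.01:
            problems.append(f"price null rate {null_rate:.2%} exceeds 1%")
        # Freshness check: the newest event should be under 5 minutes old
        newest = df["ts"].max()
        if newest is not None and datetime.now() - newest > timedelta(minutes=5):
            problems.append(f"stale batch: newest event at {newest}")
        return problems

    batch = pl.DataFrame({
        "price": [101.5, None, 102.0],
        "ts": [datetime.now()] * 3,
    })
    print(check_batch(batch))  # -> ['price null rate 33.33% exceeds 1%']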
What We’re Looking For
- Bachelor’s degree in Computer Science, Software Engineering, or a related field
- Strong experience with Apache Flink / PyFlink (watermarks, state, windows) and/or Spark Streaming
- Proficiency in Python (pandas, polars, boto3, pyarrow) and SQL (Flink SQL, ClickHouse)
- Deep understanding of Kafka / AWS MSK and message streaming concepts such as topics, partitions, and consumer groups (see the consumer-group sketch after this list)
- Hands-on use of AWS services: Managed Flink, S3, MSK, CloudWatch, IAM
- Experience with containerization (Docker, Kubernetes) and monitoring via Grafana
- Solid troubleshooting skills for production data systems (latency, consistency, resource tuning)
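To illustrate the consumer-group concept, a minimal kafka-python sketch (the broker, topic, and group names are illustrative): starting a second process with the same group_id makes Kafka rebalance the topic's partitions across both consumers.

    from kafka import KafkaConsumer

    # Consumers sharing a group_id split the topic's partitions between them
    consumer = KafkaConsumer(
        "trades",                          # illustrative topic
        bootstrap_servers=["msk-broker-1:9092"],
        group_id="pipeline-consumers",     # the consumer group
        auto_offset_reset="earliest",
        enable_auto_commit=False,          # commit offsets explicitly
    )

    for message in consumer:
        print(message.partition, message.offset, message.value)
        consumer.commit()  # commit after processing for at-least-once delivery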
Nice-to-Have
- Workflow orchestration with Airflow
- Infrastructure-as-Code (Terraform, Ansible)
- CI/CD pipelines, testing frameworks, and build tools (Maven/Gradle)
- Knowledge of ClickHouse table engines and performance tuning (see the table-engine sketch after this list)
- Exposure to data governance: lineage, metadata, data modeling
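As a taste of table-engine work, a sketch using the clickhouse-connect client (the table and columns are hypothetical): a MergeTree table with daily partitions, where the ORDER BY clause drives the sparse primary index.

    import clickhouse_connect

    client = clickhouse_connect.get_client(host="localhost")

    # MergeTree with daily partitions; ORDER BY drives the primary index,
    # so symbol + time-range scans stay cheap as the table grows
    client.command("""
        CREATE TABLE IF NOT EXISTS trades_1m (
            symbol       LowCardinality(String),
            window_start DateTime,
            avg_price    Float64
        )
        ENGINE = MergeTree
        PARTITION BY toDate(window_start)
        ORDER BY (symbol, window_start)
    """)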
Logistics
- Location: Onsite in Hong Kong (no relocation support at this time)
- Experience level: 5–7 years (we’re not considering candidates with significantly more seniority)