Hồ Chí Minh
Full-time
Zalo is looking for a Lead Data Engineer with 5+ years of experience, specializing in Big Data, AutoML, Feature Store, and Kubernetes. Proficiency in optimizing HDFS, building high-performance APIs, and ensuring data privacy, security, and point-in-time correctness is essential. The candidate must be able to lead a team, provide technical mentorship, coordinate cross-team efforts, and collaborate with major partners (Fiza, Adtima, VAS).
What you will do
1. Professional skills
Big Data & Distributed Systems:
Proficient in Hadoop ecosystem (HDFS, YARN, Hive, Spark, Flink).
Storage & processing optimization: data compression (Snappy, Zstandard), partitioning, bucketing, file formats (ORC, Parquet).
HDFS administration: backup, cleanup, archiving, capacity planning.
AutoML & MLOps:
Design and operate AutoEDA systems, auto-training, evaluation, and prediction at scale.
Deep understanding of end-to-end ML pipelines, automated feature engineering, model registry, and serving.
Feature Store:
Build and operate a Feature Store with >3,000 features, ensuring point-in-time correctness and low-latency serving.
Support batch and real-time ingestion, and consistency between online/offline stores.
API & Middleware Development:
Develop high-throughput APIs (gRPC, REST) on Kubernetes (K8s); optimize latency & scalability.
CI/CD, observability (Prometheus, Grafana, OpenTelemetry), canary/blue-green deployment.
Cloud & Infra:
Proficient in at least one cloud platform (GCP/AWS/Azure): GCS/S3, BigQuery, Dataflow, Cloud Composer.
IaC (Terraform), container orchestration (K8s, Helm), service mesh (Istio – bonus).
2. Architecture & design skills
Design scalable, fault-tolerant, observable systems.
Trade-off analysis: batch vs streaming, consistency vs availability, cost vs performance.
Data modeling: star schema, slowly changing dimensions, data vault (if needed).
Security & Governance:
Data encryption at rest/in transit, access control and metadata governance (Apache Ranger, Apache Atlas).
Comply with data privacy regulations (GDPR, PDPA), including anonymization and consent management.
3. Leadership & Management Skills
Mentoring & Knowledge Sharing:
1:1 coaching, code reviews, tech talks, writing internal documentation.
Building tech culture: best practices, engineering excellence.
Team management:
Recruitment, competency assessment, member development planning.
Assign tasks according to each person's strengths.
Cross-functional Collaboration:
Work closely with DS, DE, Safety, Product, Partner teams.
Translate business requirements into technical solutions.
4. Soft Skills
Ownership & Proactiveness: proactively detect bottlenecks, propose improvements.
Problem-Solving: handle production incidents, root cause analysis (RCA).
Business Acumen: clearly understand partner use-cases (Fiza, Adtima, VAS) to prioritize development.
Communication: present complex ideas in an easy-to-understand way to non-tech stakeholders.
5. Tools & languages
Languages: Python (expert), Scala/Java (bonus), SQL (complex queries).
Frameworks: Airflow, dbt, Feast/KFP/TFX.
Monitoring: ELK stack, Jaeger, Prometheus + Grafana.
Versioning: Git, trunk-based development, semantic versioning.
What you will need
5+ years of experience in Data Engineering; priority is given to candidates who have held a Lead/Tech Lead position.
Have built a system processing >1 TB/day or an API serving >1K QPS.
Have experience leading a team of 5+ members.
Priority is given to candidates who have worked with AutoML, Feature Store, DMP/CDP.