
Senior Data Engineer (AWS) | multiple headcounts

Gravitas Recruitment Group (Global) Ltd • In Person


Job Description

Responsibilities:

Pipeline Engineering (PySpark on AWS):

Design, implement, and optimize batch/near-real-time ETL/ELT pipelines using PySpark on services such as AWS Glue; ensure code quality, reusability, and performance at scale.
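
For illustration only, a minimal PySpark sketch of the kind of batch transform this role covers; the bucket names, paths, and columns are hypothetical, and on AWS Glue the same DataFrame logic would typically sit inside a Glue job script.

```python
# Minimal PySpark batch ETL sketch (hypothetical paths and columns).
# On AWS Glue, this logic would live inside a Glue job script.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_etl").getOrCreate()

# Extract: read raw JSON landed in S3 (hypothetical bucket/prefix).
raw = spark.read.json("s3://example-raw-bucket/orders/2024-01-01/")

# Transform: basic cleansing and derivation.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Load: write partitioned Parquet to the curated zone.
(cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated-bucket/orders/"))
```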

Workflow Orchestration:

Build and maintain DAGs with Apache Airflow (e.g., AWS MWAA) to schedule, monitor, and recover workflows; implement alerting, retries, and SLA handling.
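
A minimal Airflow DAG sketch showing the retry/SLA/alerting pattern described above; the DAG id, schedule, and task callable are hypothetical.

```python
# Minimal Airflow DAG sketch: retries, SLA, and an alerting callback.
# DAG id, schedule, and task callable are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_failure(context):
    # Placeholder alert hook; in practice this might post to Slack or SNS.
    print(f"Task failed: {context['task_instance'].task_id}")


def run_orders_etl():
    # Placeholder for triggering the PySpark/Glue job.
    print("triggering orders ETL")


default_args = {
    "owner": "data-eng",
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),
    "on_failure_callback": notify_failure,
}

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    PythonOperator(task_id="run_orders_etl", python_callable=run_orders_etl)
```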

Data Storage & Modeling:

Design efficient schemas (dimensional and/or data-vault/lakehouse), manage RDS PostgreSQL performance (indexes, partitioning, VACUUM/ANALYZE), and integrate with S3/Athena where appropriate.
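
As a sketch of the RDS PostgreSQL side, the snippet below (hypothetical table and column names) adds an index and refreshes planner statistics; VACUUM cannot run inside a transaction block, hence autocommit.

```python
# Sketch of routine RDS PostgreSQL maintenance (hypothetical table/columns).
import psycopg2

conn = psycopg2.connect(host="example-rds-host", dbname="analytics",
                        user="etl_user", password="...")
conn.autocommit = True  # VACUUM cannot run inside a transaction block
with conn.cursor() as cur:
    # Support common lookups on the fact table.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_orders_order_date "
        "ON orders (order_date)"
    )
    # Reclaim dead tuples and refresh planner statistics.
    cur.execute("VACUUM (ANALYZE) orders")
conn.close()
```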

Reliability & Observability:

Instrument pipelines with metrics/logs; tune PySpark jobs (partitions, shuffle strategies, broadcast joins, caching) and optimize Glue job DPUs/cost.
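
A hedged sketch of the tuning levers mentioned above (shuffle partitions, broadcast joins, caching, output file sizing); the DataFrames, paths, and sizes are hypothetical.

```python
# Sketch of common PySpark tuning levers (hypothetical data and sizes).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

# Right-size shuffle parallelism for the job's data volume.
spark.conf.set("spark.sql.shuffle.partitions", "200")

orders = spark.read.parquet("s3://example-curated-bucket/orders/")
customers = spark.read.parquet("s3://example-curated-bucket/customers/")

# Broadcast the small dimension table to avoid a shuffle join.
enriched = orders.join(broadcast(customers), "customer_id")

# Cache only when the result is reused by multiple downstream actions.
enriched.cache()

daily = enriched.groupBy("order_date").agg(F.sum("amount").alias("revenue"))

# Coalesce before writing to avoid many small output files.
daily.coalesce(1).write.mode("overwrite").parquet(
    "s3://example-curated-bucket/daily_revenue/"
)
```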

Security & Governance:

Apply IAM least-privilege, encryption (KMS), tagging, and data masking/pseudonymization; collaborate with data governance on lineage, metadata, and quality controls.
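
For the masking/pseudonymization piece, a minimal PySpark sketch; the column names are hypothetical, and a production setup would source the salt from AWS Secrets Manager or KMS rather than hard-coding it.

```python
# Sketch of column-level pseudonymization in PySpark (hypothetical columns).
# A real pipeline would pull the salt from Secrets Manager/KMS, not code.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("masking_sketch").getOrCreate()
customers = spark.read.parquet("s3://example-curated-bucket/customers/")

SALT = "example-salt"  # hypothetical; never hard-code in production

masked = (
    customers
    # Pseudonymize the email with a salted SHA-256 hash.
    .withColumn("email_hash", F.sha2(F.concat(F.lit(SALT), F.col("email")), 256))
    .drop("email")
    # Mask all but the last four digits of the phone number.
    .withColumn("phone_masked", F.regexp_replace("phone", r"\d(?=\d{4})", "*"))
    .drop("phone")
)
```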

CI/CD & Automation:

Use Git-based workflows; automate build/test/deploy of data jobs and infrastructure changes (IaC where applicable).
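
On the CI/CD side, a small pytest-style unit test for a transform function is the kind of check a Git-based build stage would run before deploy; the transform and test data here are hypothetical.

```python
# Sketch of a pytest unit test for a PySpark transform (hypothetical logic),
# the kind of check a Git-based CI pipeline would run before deploy.
import pytest
from pyspark.sql import SparkSession, functions as F


def clean_orders(df):
    """Hypothetical transform under test: dedupe and drop non-positive amounts."""
    return df.dropDuplicates(["order_id"]).filter(F.col("amount") > 0)


@pytest.fixture(scope="module")
def spark():
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_clean_orders_removes_duplicates_and_bad_rows(spark):
    df = spark.createDataFrame(
        [(1, 10.0), (1, 10.0), (2, -5.0)], ["order_id", "amount"]
    )
    result = clean_orders(df)
    assert result.count() == 1
    assert result.first()["order_id"] == 1
```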

Stakeholder Collaboration:

Engage product owners, analysts, and downstream consumers; translate requirements into robust data solutions; document runbooks and provide clear status updates.

Mentorship \& Standards:

Review code, coach engineers, and contribute to engineering standards and best practices.

Required Qualifications:

Hands-on PySpark expertise (DataFrame API, performance tuning, job debugging).

Strong knowledge of AWS data services, particularly Glue, Airflow (operator/DAG design), RDS PostgreSQL, S3, and CloudWatch.

Proficiency in SQL and Python for data engineering (testing, packaging, dependency management).

Experience operating production pipelines: monitoring, incident response, and root-cause analysis (RCA).

Communication: Excellent written and verbal English; able to explain complex topics clearly to technical and non-technical audiences.

Preferred / Nice to Have:

Financial Services domain experience (e.g., regulatory data controls, privacy, compliance).

Cantonese speaking skills.

Lakehouse patterns (e.g., Apache Iceberg), query engines (Athena/Presto), and data cataloging.

Performance tuning in PostgreSQL (EXPLAIN/ANALYZE, indexing strategies).

Experience with AWS services such as DMS, Lake Formation, and Glue Data Catalog.

DevOps tooling (GitLab/Jenkins).

Key Success Metrics:

Pipeline delivery on schedule and within budget (SLA adherence, MTTR/MTTA).

Data quality and reliability (validation coverage, defect escape rate).

Efficiency and cost optimization (DPU hours, storage/query costs).

Stakeholder satisfaction and adoption of delivered datasets.

Contribution to standards, documentation quality, and team mentorship.
