Job Title: GCP Data Engineer
Location: Toronto, ON – Hybrid
Contract Role
Job Description
We are seeking a highly skilled Senior Data Engineer with deep expertise in Google Cloud Platform (GCP), distributed data processing, and cloud-native data architectures. This role involves designing, building, optimizing, and maintaining scalable data pipelines and analytical platforms that support enterprise‑grade workloads. The ideal candidate brings strong hands‑on experience with BigQuery, Dataflow, Dataproc, Dataform, Cloud Composer (Airflow), PySpark, and end‑to‑end ELT/ETL frameworks, along with robust knowledge of metadata management, lineage, data quality, and CI/CD automation.
Key Responsibilities
1. Data Engineering & Architecture
Design and implement end‑to‑end data architectures on GCP, including data lakes, data marts, and warehouse models.
Build scalable batch and streaming pipelines using Dataflow, Dataproc (Spark), Dataform, and Pub/Sub.
Architect low‑latency, high‑throughput processing solutions supporting advanced analytics and ML workloads.
Develop pre‑aggregated models, materialized views, and optimized analytical structures in BigQuery.
2. ETL/ELT Pipeline Development
Design, develop, test, and optimize ELT/ETL pipelines for structured and unstructured data.
Use Dataform and Cloud Composer (Airflow) for orchestration, dependency management, and metadata logging.
Implement best practices for ingestion, transformation, storage, and data access patterns.
3. Data Quality, Metadata & Governance
Implement enterprise‑grade data quality checks using Great Expectations or custom Python frameworks.
Manage metadata, lineage tracking, data cataloging, and compliance with governance standards.
Ensure data integrity, schema enforcement, and security‑by‑design principles across all data pipelines.
4. Cloud Infrastructure & DevOps
Provision and automate cloud infrastructure with Terraform, Jenkins, and GitLab CI, following IaC best practices.
Develop CI/CD workflows for pipeline deployments, testing gates, and operational automation.
Monitor pipelines using Cloud Monitoring & Logging, optimizing for performance and cost.
5. Cross‑Functional Collaboration
Work closely with data scientists, analysts, platform engineering, and product owners to translate complex business needs into scalable data solutions.
Support legacy-to-GCP migration initiatives, including Hadoop and on‑premises workloads.
Enable advanced analytics and ML workloads through ML‑ready data pipelines.
6. Advanced Analytics & ML Support
Support feature engineering and ML data preparation for Vertex AI, Gemini, Hugging Face, or other ML platforms.
Enable vector database workflows and generative AI data pipelines.
Required Technical Skills
Cloud & Big Data
Google Cloud Platform: BigQuery, Dataproc, Dataflow, Cloud Composer (Airflow), GCS, Cloud Run, Eventarc
Distributed Computing: Apache Spark, PySpark, Kafka
Data Lake & Lakehouse Architectures
Programming \& Tools
Python, SQL, Java
Git, Bitbucket, Jenkins, GitLab CI
Terraform (IaC)
REST APIs, FastAPI
Airflow DAG development
Data Engineering Competencies
Data modeling (OLTP/OLAP)
Data Warehousing
ELT/ETL pipelines
Streaming & real‑time processing
Data profiling and validation
Metadata, lineage, quality management