**Please note: Digital Good is not affiliated with any government or official public sector entity. It is a privately held social enterprise.**
Company Description
We are a Canadian civic tech start-up building a public-grade data lake to host and structure legislative transcripts from across Canada, starting with Hansard from federal, provincial, and territorial governments. You’ll be at the center of a bold effort to make legislative language machine-readable, searchable, and useful to researchers, journalists, and the public.
Role Description
This is a 6-month contract for a part-time* remote Data Engineer located anywhere in Canada.
*Amended to a part-time commitment because some data is awaiting a provincial data use agreement.
The Data Engineer will be responsible for the design, build, and maintenance of modern data lake infrastructure (cloud-native or hybrid).
Design and deploy a scalable data lake architecture to ingest Hansard transcripts from 13+ jurisdictions (web, PDF, HTML, RSS, etc.)
Write clean, resilient ETL pipelines to parse and normalize PDFs, HTML pages, XML feeds, OCR layers, or scraped content (where APIs aren’t available)
Apply document classification or tagging logic (e.g., date, bill, speaker, topic)
Maintain metadata consistency across jurisdictions
Optimize storage (e.g., Parquet, Delta Lake) for future querying by AI and policy tools
Build toward public API endpoints or civic dashboards
Ensure version tracking (for evolving bills) and reproducibility
Qualifications
Entry-level proficiency in data engineering and data modeling
Experience with AWS (S3, Glue, Athena), GCP (BigQuery, Cloud Functions), or Azure
Knowledge of the JSON, CSV, and Markdown data formats
Ontario College Diploma in Computer Science, Engineering, or a related field
Experience in the technology or digital sector is a bonus
Please apply by October 20, 2025, at 5:00 p.m. EST.
Accommodations are available upon request, in accordance with applicable accessibility laws. Please forward questions or requests to digitalgood@icloud.com. We are not affiliated with the Government of Canada.