About the job
The National Jobs-Skills Data Office (NJSDO) serves three core functions:
- Data and Algorithms Innovation and R&D: Undertake development of new data models
and algorithms to serve whole-of-government jobs-skills needs;
- Jobs-Skills Product Management, Development and Delivery: Manage and enhance JS
product design & delivery, including UX/UI design and end-to-end product life-cycle
management;
- Data Management and Operations: Centrally manage data quality, data models, and data
infrastructure to support internal and external users.
As a Data Engineer, you will be a key member of the Data Management and Operations
team, ensuring robust data infrastructure, upholding data quality and data governance
rules, and keeping data pipelines running efficiently. You will work closely with
stakeholders across the division and with vendors to ingest data and translate data needs
into the relevant tables. You will also work closely with AI Engineers and Machine
Learning Engineers to deploy data science models into production.
What you will be working on
You will be involved in a range of tasks including the following:
Infrastructure Architecture & Cloud Management: Design and manage robust,
scalable data infrastructure utilising AWS and other cloud platforms based on
project requirements. Proactively explore and evaluate innovative data
engineering tools to enhance infrastructure capabilities. Continuously
recommend and implement improvements to data infrastructure based on
emerging technologies.
Data Pipeline Development & Implementation: Design and implement efficient
data models and pipelines for ingestion, processing, and distribution of large-scale
datasets. Ensure high data quality and availability across all data processing
workflows. Align data flows across various systems with consistent schemas and
governed access models. Collaborate with external vendors to enable secure data
exchange through APIs, including the development of API Swagger specifications
required for vendors to build and integrate their API endpoints.
Data Quality & Standards Management: Lead comprehensive data quality
initiatives, establishing standards for accuracy and reliability across all systems.
Diagnose and resolve data pipeline issues while contributing to incident response
and post-mortem reviews. Maintain rigorous quality control processes throughout
the data lifecycle.
AI & Machine Learning Collaboration: Collaborate closely with AI engineers to
provide optimised data solutions for machine learning projects. Emphasise
seamless data flow and accessibility for AI model development and deployment.
Ensure data infrastructure supports advanced analytics and machine learning
requirements.
Documentation & Knowledge Management: Develop and maintain
comprehensive documentation on data architecture, procedures, and
management practices. Ensure clarity and consistency of documentation across
all teams and projects. Create accessible resources that support effective data
management practices.
Automation & Integration Solutions: Leverage innovative tools and architectures
to automate common, repeatable, and error-prone data preparation tasks.
Minimise manual processes whilst improving overall productivity and efficiency.
Implement automated data integration solutions that reduce operational
overhead.
Governance & Stakeholder Collaboration: Work closely with data governance
teams to vet and promote high-quality content for governed reuse. Engage
proactively with cross-functional teams and business stakeholders to refine
requirements and co-design solutions. Support creation and maintenance of
curated data catalogues for organisational use.
Training & Continuous Improvement: Facilitate knowledge sharing and
technical training sessions on data management best practices and tools.
Enhance data competency across staff through structured learning programmes.
Establish regular feedback loops with data consumers to refine and optimise
pipelines for seamless production deployment.
What we are looking for
Proficiency in data engineering practices (including versioning, release
management, deployment of datasets, and agile & related software tools) and in
building scalable data pipelines.
Proficiency in Python and SQL; experience in AI/ML model development and
deployment is a plus.
Experience working with multiple large datasets and data warehouses.
Independent contributor with the ability to collaborate and work effectively within a
team.
Strong analytical, conceptualisation and problem-solving skills.
Excellent written and verbal communication skills.