Summary
We are seeking a skilled Data Engineer to design, develop, and maintain our data infrastructure, with a specific focus on supporting business intelligence and analytics initiatives. The ideal candidate will build robust data pipelines, create scalable data models, and ensure the accuracy and accessibility of data for reporting and strategic decision-making. This role requires a strong blend of technical expertise and the ability to collaborate with stakeholders to meet business requirements.
Responsibilities
Design, develop, and maintain scalable ETL/ELT pipelines to ingest, transform, and load data from various sources into a central repository or data warehouse (an illustrative sketch follows this list).
Build and optimize dimensional data models and multi-dimensional databases (OLAP) to support business intelligence and analytics platforms.
Ensure data quality, consistency, and accuracy through robust validation, cleaning, and monitoring processes.
Collaborate with data analysts, data scientists, and other stakeholders to understand data requirements and provide them with reliable datasets.
Develop and implement data governance, security, and compliance best practices.
Optimize data systems for performance, scalability, and cost-efficiency.
Partner with stakeholders to troubleshoot data-related technical issues and support their data infrastructure needs.
Automate manual processes to improve efficiency and scalability.
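For illustration only, not an additional requirement: a minimal sketch, in Python, of the kind of ETL pipeline described above, assuming a hypothetical CSV drop in S3 and a SQL warehouse reachable via SQLAlchemy. Every name in it (bucket, key, columns, connection URL, table) is a placeholder, not a reference to our actual systems.

    import io

    import boto3
    import pandas as pd
    from sqlalchemy import create_engine


    def extract(bucket: str, key: str) -> pd.DataFrame:
        # Pull one raw CSV object from S3 into a DataFrame.
        body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
        return pd.read_csv(io.BytesIO(body))


    def transform(df: pd.DataFrame) -> pd.DataFrame:
        # Minimal validation and cleaning: normalize headers, drop exact
        # duplicates, and coerce the (assumed) order_date column to a timestamp.
        df.columns = [c.strip().lower() for c in df.columns]
        df = df.drop_duplicates()
        df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
        return df.dropna(subset=["order_date"])


    def load(df: pd.DataFrame, conn_url: str, table: str) -> None:
        # Append the cleaned rows to a warehouse staging table.
        df.to_sql(table, create_engine(conn_url), if_exists="append", index=False)


    if __name__ == "__main__":
        frame = transform(extract("example-raw-bucket", "orders/2024-01-01.csv"))
        load(frame, "postgresql://user:pass@warehouse-host:5439/analytics", "stg_orders")

In practice this logic would run under an orchestrator such as Airflow with retries, alerting, and data-quality checks; the sketch only shows the extract/transform/load shape of the work.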
Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field.
Proven experience as a Data Engineer or in a similar role, with a focus on BI.
Strong proficiency in SQL for complex queries and data manipulation.
Experience with ETL/ELT tools and processes.
Experience building and maintaining data warehouses and data lakes.
Familiarity with cloud platforms such as AWS and their data services.
Experience with programming languages such as Python for data pipeline development.
Excellent problem-solving, communication, and collaboration skills.
Technical Skills
Highly proficient in SQL and dbt.
Strong experience with Python (pandas, scikit-learn) and working knowledge of Data Vault 2.0 and data warehousing patterns such as slowly changing dimensions (an illustrative sketch follows this list).
Hands-on experience with AWS data services such as S3, Glue, Athena, and Redshift, plus workflow orchestration with Airflow.
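For candidates less familiar with the jargon above, here is a minimal sketch of a Type 2 slowly changing dimension update in pandas. The schema (the business key, the list of tracked attributes, and the valid_from / valid_to / is_current columns) is an illustrative assumption, not our actual model.

    import pandas as pd


    def scd2_merge(dim, incoming, key, tracked, as_of):
        # dim: the dimension table, with valid_from / valid_to / is_current columns.
        # incoming: today's source snapshot, one row per business key.
        current = dim[dim["is_current"]]
        merged = incoming.merge(
            current[[key] + tracked], on=key, how="left",
            suffixes=("", "_old"), indicator=True,
        )
        # A row needs a new version if its key is brand new, or if any
        # tracked attribute differs from the current version.
        changed = merged["_merge"].eq("left_only")
        for col in tracked:
            changed |= merged[col].ne(merged[col + "_old"])
        new_rows = merged.loc[changed, [key] + tracked].copy()
        # Close out the superseded versions of keys that changed...
        out = dim.copy()
        hit = out[key].isin(new_rows[key]) & out["is_current"]
        out.loc[hit, "valid_to"] = as_of
        out.loc[hit, "is_current"] = False
        # ...and append the new, open-ended versions.
        new_rows["valid_from"] = as_of
        new_rows["valid_to"] = pd.NaT
        new_rows["is_current"] = True
        return pd.concat([out, new_rows], ignore_index=True)

In production this logic usually lives inside the warehouse (for example as a dbt snapshot or a Redshift MERGE) rather than in pandas; the pandas version is simply compact enough to read here.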