Technical Expertise:
Solid experience in Python programming, particularly with data manipulation and processing libraries such as Pandas, NumPy, and PySpark (Apache Spark); an illustrative snippet of this kind of work follows this list.
Practical experience in designing, developing, and maintaining robust data ingestion pipelines.
Demonstrated ability to optimize code, database queries, and overall system performance.
Hands-on experience with open-source data frameworks like Apache Spark, Apache Kafka, and Apache Airflow.
Strong proficiency in SQL, including advanced query development and performance tuning.
Good understanding of distributed computing principles and big data ecosystems.
Familiarity with version control tools (Git) and CI/CD automation pipelines.
Experience working with relational databases such as PostgreSQL, MySQL, or equivalent platforms.
Skilled in using containerization technologies including Docker and Kubernetes.
Experience with workflow orchestration tools like Apache Airflow or Dagster (see the Airflow sketch after this list).
Strong grasp of data warehousing methodologies, including dimensional modelling and schema design.
Understanding of cloud infrastructure management, preferably using Infrastructure-as-Code (IaC) tools and approaches.
Familiarity with streaming data pipelines and real-time analytics solutions (see the streaming sketch after this list).
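
As a rough illustration of the Pandas-style data manipulation referenced in the first item, the sketch below cleans a hypothetical orders extract and aggregates revenue per customer per month. The file name and column names are assumptions for illustration only, not part of this role's actual data.

import pandas as pd

# Load a hypothetical orders extract (file name and columns are illustrative only).
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

# Clean and enrich: drop rows missing a customer, derive a month column,
# then aggregate revenue per customer per month.
monthly_revenue = (
    orders
    .dropna(subset=["customer_id"])
    .assign(order_month=lambda df: df["order_date"].dt.to_period("M"))
    .groupby(["customer_id", "order_month"], as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "monthly_revenue"})
)

print(monthly_revenue.head())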
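The next sketch is a minimal, hypothetical Apache Airflow DAG of the kind used in the orchestration work described above. The DAG name and tasks are illustrative, and the schedule parameter name varies slightly between Airflow versions (older releases use schedule_interval).

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw data from a source system (illustrative only).
    print("extracting")


def load():
    # Placeholder: load transformed data into the warehouse (illustrative only).
    print("loading")


with DAG(
    dag_id="example_ingestion_dag",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # run extract before load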
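For the streaming item above, here is a minimal consumer sketch assuming the third-party kafka-python package. The topic name, broker address, and message schema are illustrative assumptions, and the running count per event type stands in for a real-time analytics aggregation.

import json

from kafka import KafkaConsumer  # requires the kafka-python package

# Topic name, broker address, and message schema are illustrative assumptions.
consumer = KafkaConsumer(
    "clickstream-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

# Consume events and maintain a simple running count per event type.
counts = {}
for message in consumer:
    event = message.value
    event_type = event.get("type", "unknown")
    counts[event_type] = counts.get(event_type, 0) + 1
    print(counts)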