We are looking for an experienced and highly skilled Hadoop Data Engineer to join our dynamic team. The ideal candidate will have hands-on expertise in developing optimized data pipelines using Python, PySpark, Scala, Spark-SQL, Hive, and other big data technologies. You will be responsible for translating complex business and technical requirements into efficient data pipelines and ensuring high-quality code delivery through collaboration and code reviews.
Roles & Responsibilities:
Data Transformation & Pipeline Development:
Design and implement optimized data pipelines using PySpark, Python, Scala, and Spark-SQL.
Build complex data transformation logic and ensure data ingestion from source systems to Data Lakes (Hive, HBase, Parquet).
Produce unit tests for Spark transformations and helper methods.
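To give a concrete flavor of this responsibility, below is a minimal, hypothetical PySpark transformation with a unit-test-style check; the column names, dedup rule, and output path are illustrative assumptions rather than part of any actual project.

```python
# Minimal sketch (hypothetical): a reusable Spark transformation plus a unit-test-style check.
# Column names, the dedup rule, and the output location are illustrative assumptions.
from pyspark.sql import DataFrame, SparkSession, functions as F, Window


def latest_per_key(df: DataFrame, key_col: str, ts_col: str) -> DataFrame:
    """Keep only the most recent record per business key (a common ingestion step)."""
    w = Window.partitionBy(key_col).orderBy(F.col(ts_col).desc())
    return (
        df.withColumn("_rn", F.row_number().over(w))
          .filter(F.col("_rn") == 1)
          .drop("_rn")
    )


if __name__ == "__main__":
    spark = SparkSession.builder.master("local[1]").appName("pipeline-sketch").getOrCreate()

    # Tiny in-memory sample standing in for a source-system extract.
    src = spark.createDataFrame(
        [(1, "2024-01-01", "A"), (1, "2024-02-01", "B"), (2, "2024-01-15", "C")],
        ["customer_id", "updated_at", "status"],
    )

    out = latest_per_key(src, "customer_id", "updated_at")

    # Unit-test-style assertions on the transformation.
    assert out.count() == 2
    assert out.filter("customer_id = 1").first()["status"] == "B"

    # In a real pipeline the result would land in the Data Lake, e.g. as Parquet or a Hive table:
    # out.write.mode("overwrite").parquet("/data/lake/customers_latest")
    spark.stop()
```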
Collaboration & Communication:
Work closely with Business Analysts to review test results and obtain sign-offs.
Prepare comprehensive design and operational documentation for future reference.
Code Quality & Review:
Conduct peer code reviews and act as a gatekeeper for quality checks.
Ensure quality and efficiency in the delivery of code through pair programming and collaboration.
Production Deployment:
Ensure smooth production deployments and perform post-deployment verification.
Technical Expertise:
Provide hands-on coding and support in a highly collaborative environment.
Contribute to development, automation, and continuous improvement practices.
System Knowledge:
Strong understanding of data structures, data manipulation, distributed processing, and application development.
Exposure to technologies such as Kafka, Spark Streaming, and ML is a plus.
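As an illustration of the Kafka and Spark Streaming exposure mentioned above, the sketch below shows a minimal Structured Streaming read from a Kafka topic into Parquet; the broker address, topic name, and paths are placeholder assumptions, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# Minimal sketch (hypothetical) of consuming a Kafka topic with Spark Structured Streaming.
# Broker, topic, and checkpoint/output paths are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "transactions")                # placeholder topic
    .load()
    .select(F.col("value").cast("string").alias("payload"),
            F.col("timestamp"))
)

# Write the raw payloads to Parquet in micro-batches; a real job would parse and enrich first.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/lake/raw/transactions")            # placeholder output path
    .option("checkpointLocation", "/data/checkpoints/transactions")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```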
RDBMS & Database Management:
Hands-on experience with RDBMS technologies (MariaDB, SQL Server, MySQL, Oracle).
Knowledge of PL/SQL and stored procedures is an added advantage.
Other Responsibilities:
Exposure to TWS (Tivoli Workload Scheduler) for job scheduling.
Knowledge of and experience with the Hadoop tech stack, Cloudera Distribution, and CI/CD pipelines using Git and Jenkins.
Experience with Agile Methodologies and DevOps practices.
Technical Requirements:
Experience: 6-9.5 years working with Hadoop, Spark, PySpark, Scala, Hive, Spark-SQL, Python, Impala, CI/CD, and Git.
Strong understanding of Data Warehousing Methodology and Change Data Capture (CDC); a CDC upsert sketch follows this list.
In-depth knowledge of the Hadoop & Spark ecosystems with hands-on experience in PySpark and Hadoop technologies.
Proficiency in working with RDBMS such as MariaDB, SQL Server, MySQL, or Oracle.
Experience with stored procedures and TWS job scheduling.
Solid experience with Enterprise Data Architectures and Data Models.
Background in Core Banking or Finance domains is preferred; experience in the AML (Anti-Money Laundering) domain is a plus.
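To make the CDC requirement concrete, here is a minimal, hypothetical upsert pattern in PySpark: the current snapshot and incoming change records are unioned and only the newest version per business key is kept. The column names and the 'op' flag convention are illustrative assumptions; a production CDC flow would more likely use a MERGE-capable table format (Delta, Hudi, Iceberg) or HBase for point updates.

```python
# Minimal sketch (hypothetical) of a CDC-style upsert in PySpark.
# Assumptions: both DataFrames share the same data columns, and `changes`
# carries an `op` flag ('I'/'U'/'D') indicating insert, update, or delete.
from pyspark.sql import DataFrame, functions as F, Window


def apply_cdc(current: DataFrame, changes: DataFrame, key: str, ts: str) -> DataFrame:
    """Return the new snapshot after applying inserts, updates, and deletes."""
    w = Window.partitionBy(key).orderBy(F.col(ts).desc())
    merged = (
        current.withColumn("op", F.lit("U"))        # existing rows count as prior state
        .unionByName(changes)
        .withColumn("_rn", F.row_number().over(w))  # rank versions, newest first
        .filter(F.col("_rn") == 1)                  # keep the latest version per key
        .drop("_rn")
    )
    # Rows whose latest change is a delete are removed from the snapshot.
    return merged.filter(F.col("op") != "D").drop("op")
```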
Skills & Qualifications:
Strong hands-on coding skills in Python, PySpark, Scala, and Spark-SQL.
Proficient in Hadoop ecosystem (Hive, HBase, etc.).
Knowledge of CI/CD , Agile , and DevOps methodologies.
Good understanding of data integration, data pipelines, and distributed data systems.
Experience with Oracle, PL/SQL, and large-scale databases.
Strong analytical and problem-solving skills, with an ability to troubleshoot complex data issues.