We are looking for an experienced and highly skilled Hadoop Data Engineer to join our dynamic team. The ideal candidate will have hands-on expertise in developing optimized data pipelines using Python, PySpark, Scala, Spark-SQL, Hive, and other big data technologies. You will be responsible for translating complex business and technical requirements into efficient data pipelines and ensuring high-quality code delivery through collaboration and code reviews.
Roles & Responsibilities:
Data Transformation & Pipeline Development:
Design and implement optimized data pipelines using PySpark, Python, Scala, and Spark-SQL.
Build complex data transformation logic and ensure data ingestion from source systems to Data Lakes (Hive, HBase, Parquet).
Produce unit tests for Spark transformations and helper methods.
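To give a concrete flavor of this responsibility, below is a minimal, hypothetical PySpark transformation with a unit-test-style check; the column names, dedup rule, and output path are illustrative assumptions rather than part of any actual project.

```python
# Minimal sketch (hypothetical): a reusable Spark transformation plus a unit-test-style check.
# Column names, the dedup rule, and the output location are illustrative assumptions.
from pyspark.sql import DataFrame, SparkSession, functions as F, Window


def latest_per_key(df: DataFrame, key_col: str, ts_col: str) -> DataFrame:
    """Keep only the most recent record per business key (a common ingestion step)."""
    w = Window.partitionBy(key_col).orderBy(F.col(ts_col).desc())
    return (
        df.withColumn("_rn", F.row_number().over(w))
          .filter(F.col("_rn") == 1)
          .drop("_rn")
    )


if __name__ == "__main__":
    spark = SparkSession.builder.master("local[1]").appName("pipeline-sketch").getOrCreate()

    # Tiny in-memory sample standing in for a source-system extract.
    src = spark.createDataFrame(
        [(1, "2024-01-01", "A"), (1, "2024-02-01", "B"), (2, "2024-01-15", "C")],
        ["customer_id", "updated_at", "status"],
    )

    out = latest_per_key(src, "customer_id", "updated_at")

    # Unit-test-style assertions on the transformation.
    assert out.count() == 2
    assert out.filter("customer_id = 1").first()["status"] == "B"

    # In a real pipeline the result would land in the Data Lake, e.g. as Parquet or a Hive table:
    # out.write.mode("overwrite").parquet("/data/lake/customers_latest")
    spark.stop()
```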
Collaboration & Communication:
Work closely with Business Analysts to review test results and obtain sign-offs.
Prepare comprehensive design and operational documentation for future reference.
Code Quality & Review:
Conduct peer code reviews and act as a gatekeeper for quality checks.
Ensure quality and efficiency in the delivery of code through pair programming and collaboration.
Production Deployment:
Ensure smooth production deployments and perform post-deployment verification.
Technical Expertise:
Provide hands-on coding and support in a highly collaborative environment.
Contribute to development, automation, and continuous improvement practices.
System Knowledge:
Strong understanding of data structures, data manipulation, distributed processing, and application development.
Exposure to technologies such as Kafka, Spark Streaming, and ML is a plus.
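As an illustration of the Kafka and Spark Streaming exposure mentioned above, the sketch below shows a minimal Structured Streaming read from a Kafka topic into Parquet; the broker address, topic name, and paths are placeholder assumptions, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# Minimal sketch (hypothetical) of consuming a Kafka topic with Spark Structured Streaming.
# Broker, topic, and checkpoint/output paths are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "transactions")                # placeholder topic
    .load()
    .select(F.col("value").cast("string").alias("payload"),
            F.col("timestamp"))
)

# Write the raw payloads to Parquet in micro-batches; a real job would parse and enrich first.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/lake/raw/transactions")            # placeholder output path
    .option("checkpointLocation", "/data/checkpoints/transactions")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```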
RDBMS & Database Management:
Hands-on experience with RDBMS technologies (MariaDB, SQL Server, MySQL, Oracle).
Knowledge of PL/SQL and stored procedures is an added advantage.
Other Responsibilities:
Exposure to TWS (Tivoli Workload Scheduler) for job scheduling.
Knowledge of and experience with the Hadoop tech stack, Cloudera Distribution, and CI/CD pipelines using Git and Jenkins.
Experience with Agile Methodologies and DevOps practices.
Technical Requirements:
Experience: 6-9.5 years working with Hadoop, Spark, PySpark, Scala, Hive, Spark-SQL, Python, Impala, CI/CD, and Git.
Strong understanding of Data Warehousing Methodology and Change Data Capture (CDC); a CDC upsert sketch follows this list.
In-depth knowledge of the Hadoop & Spark ecosystems with hands-on experience in PySpark and Hadoop technologies.
Proficiency in working with RDBMS such as MariaDB, SQL Server, MySQL, or Oracle.
Experience with stored procedures and TWS job scheduling.
Solid experience with Enterprise Data Architectures and Data Models.
Background in Core Banking or Finance domains is preferred; experience in the AML (Anti-Money Laundering) domain is a plus.
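To make the CDC requirement concrete, here is a minimal, hypothetical upsert pattern in PySpark: the current snapshot and incoming change records are unioned and only the newest version per business key is kept. The column names and the 'op' flag convention are illustrative assumptions; a production CDC flow would more likely use a MERGE-capable table format (Delta, Hudi, Iceberg) or HBase for point updates.

```python
# Minimal sketch (hypothetical) of a CDC-style upsert in PySpark.
# Assumptions: both DataFrames share the same data columns, and `changes`
# carries an `op` flag ('I'/'U'/'D') indicating insert, update, or delete.
from pyspark.sql import DataFrame, functions as F, Window


def apply_cdc(current: DataFrame, changes: DataFrame, key: str, ts: str) -> DataFrame:
    """Return the new snapshot after applying inserts, updates, and deletes."""
    w = Window.partitionBy(key).orderBy(F.col(ts).desc())
    merged = (
        current.withColumn("op", F.lit("U"))        # existing rows count as prior state
        .unionByName(changes)
        .withColumn("_rn", F.row_number().over(w))  # rank versions, newest first
        .filter(F.col("_rn") == 1)                  # keep the latest version per key
        .drop("_rn")
    )
    # Rows whose latest change is a delete are removed from the snapshot.
    return merged.filter(F.col("op") != "D").drop("op")
```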
Skills & Qualifications:
Strong hands-on coding skills in Python, PySpark, Scala, and Spark-SQL.
Proficient in Hadoop ecosystem (Hive, HBase, etc.).
Knowledge of CI/CD , Agile , and DevOps methodologies.
Good understanding of data integration, data pipelines, and distributed data systems.
Experience with Oracle, PL/SQL, and large-scale databases.
Strong analytical and problem-solving skills, with an ability to troubleshoot complex data issues.