We are looking for a Python Developer (Data Engineering) to design, build, and support scalable data pipelines and analytics platforms within a large enterprise banking environment. The role involves end-to-end data engineering across batch and near-real-time processing, strong collaboration with business and IT stakeholders, and adherence to high standards of quality, governance, and security.
Key Responsibilities
Develop robust data ingestion, transformation, and loading (ETL/ELT) processes across batch and near-real-time workflows
Implement distributed data processing using Apache Spark (PySpark/Scala) for large-scale transformations and analytics
Design and maintain logical and physical data models (dimensional/star schemas, data vault, wide tables) optimized for reporting and analytics
Write and optimize SQL and HiveQL queries; manage tables, partitions, and storage formats
Schedule, monitor, and support data pipelines using Control-M, ensuring SLA adherence and timely delivery
Tune Spark jobs, SQL/Hive queries, and storage strategies for scalability and cost efficiency
Implement data validation, reconciliation, and lineage using checks, unit tests, and metadata frameworks
Build operational dashboards and alerts, investigate failures, and drive root-cause analysis and remediation
Maintain comprehensive runbooks, architecture diagrams, data dictionaries, and coding standards
Apply data privacy, access control, and security best practices in line with enterprise and regulatory requirements
Drive continuous service and process improvements
Prepare unit test cases and work closely with testing teams during SIT and UAT
Package and migrate code across environments (DEV → QA → PROD) with proper audit trails and governance