
Data Engineer

Morgan McKinley • In Person

Posted 2 days, 13 hours ago

Job Description

Roles and Responsibilities:

Implementation:

Build and optimize ETL/ELT processes leveraging Databricks' native capabilities to handle large volumes of structured and unstructured data from various sources (a minimal sketch follows this list)

Implement data quality frameworks and monitoring solutions using Databricks data quality features to ensure data accuracy and reliability across all data products

Establish best practices for data governance, security, and compliance within the Databricks ecosystem and integrate with enterprise systems
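For illustration, a minimal PySpark ETL sketch, assuming a Databricks notebook or job where spark is predefined; the source path, column names, and table name are hypothetical:

# Minimal ETL sketch: ingest raw JSON, apply basic cleaning, write a Delta table.
# Source path, column names, and table name are hypothetical placeholders.
from pyspark.sql import functions as F

raw = spark.read.format("json").load("s3://example-bucket/raw/orders/")

cleaned = (
    raw.dropDuplicates(["order_id"])                        # deduplicate on key
       .withColumn("order_ts", F.to_timestamp("order_ts"))  # normalize timestamps
       .filter(F.col("amount") > 0)                         # simple quality rule
)

cleaned.write.format("delta").mode("overwrite").saveAsTable("main.sales.orders")

In a production pipeline, the inline quality rule would typically sit inside a broader quality framework (for example, Delta Lake table constraints), in line with the data quality item above.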

Operational Responsibilities:

Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance across all Databricks workloads and clusters

Implement comprehensive logging, alerting, and monitoring systems using Databricks monitoring capabilities and integration with enterprise monitoring tools

Perform regular health checks on Databricks cluster performance, job execution times, and resource utilization to identify and resolve bottlenecks proactively (see the monitoring sketch after this list)

Manage incident response procedures for Databricks pipeline failures, including root cause analysis, resolution, and post-incident reviews

Establish and maintain disaster recovery procedures and backup strategies for critical data assets within the Databricks environment

Conduct regular performance tuning of Spark jobs and Databricks cluster configurations to optimize cost and execution efficiency

Implement automated testing frameworks for Databricks-based data pipelines, including unit tests, integration tests, and data validation checks (see the validation sketch after this list)

Maintain comprehensive documentation for all Databricks operational procedures, runbooks, and troubleshooting guides

Coordinate scheduled maintenance windows and Databricks system upgrades with minimal business impact

Manage user access controls, workspace configurations, and security policies within Databricks environments

Monitor data lineage using Databricks Unity Catalog and maintain metadata management systems to support operational transparency and compliance requirements

Establish capacity planning processes to forecast Databricks infrastructure needs and manage cloud costs effectively
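As a rough illustration of the job health checks described above, this sketch polls recent runs through the Databricks Jobs REST API (version 2.1); the workspace host, token placeholder, and 60-minute threshold are hypothetical:

# Flag slow completed job runs via the Databricks Jobs API 2.1.
# Host, token, and the 60-minute threshold are hypothetical placeholders.
import requests

resp = requests.get(
    "https://<workspace-host>/api/2.1/jobs/runs/list",
    headers={"Authorization": "Bearer <personal-access-token>"},
    params={"completed_only": "true", "limit": 25},
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    duration_min = (run["end_time"] - run["start_time"]) / 60_000  # ms to minutes
    if duration_min > 60:
        print(f"Slow run {run['run_id']}: {duration_min:.1f} min")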
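And a sketch of the data validation checks mentioned in the automated-testing item, with hypothetical table and column names; in practice this would run under pytest or as a pipeline task:

# Rule-based validation for a Delta table; assumes spark is predefined.
# Table and column names are hypothetical.
from pyspark.sql import functions as F

def validate_orders(df):
    # Each rule maps to a count of violating rows; all should be zero.
    return {
        "null_order_id": df.filter(F.col("order_id").isNull()).count(),
        "negative_amount": df.filter(F.col("amount") < 0).count(),
        "duplicate_order_id": df.count() - df.dropDuplicates(["order_id"]).count(),
    }

violations = validate_orders(spark.table("main.sales.orders"))
assert all(v == 0 for v in violations.values()), f"Quality failures: {violations}"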

Collaboration & Leadership:

Provide technical guidance and mentorship to junior team members on Databricks best practices and data engineering principles

Participate in on-call rotation for critical production systems with focus on Databricks platform stability

Lead operational reviews and contribute to continuous improvement initiatives for Databricks platform reliability and efficiency

Coordinate with infrastructure teams on Databricks cluster provisioning, network configurations, and security implementations

Requirements / Qualifications:

Education & Experience:

Degree in Computer Science or Computer Engineering

Minimum 8-10 years of working experience in system operations, compliance, and management

Hands-on project experience with the Databricks platform (primary requirement)

Project experience in cloud operations or cloud architecture

Must hold a cloud certification (AWS)

Core Technical Skills:

Expert-level proficiency in Databricks platform, including workspace management, cluster configuration, and job orchestration

Strong expertise in Apache Spark within Databricks environment, including Spark SQL, DataFrames, and RDDs

Extensive experience with Delta Lake, including data versioning, time travel, and ACID transactions (see the sketch after this list)

Proficiency in Databricks Unity Catalog for data governance and metadata management

In-depth understanding of data warehouse concepts, data profiling, data verification, and advanced analytics techniques

Strong knowledge of monitoring, incident management, and cloud cost control
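To illustrate the Delta Lake versioning and time travel noted above, a short sketch; the table name and version number are hypothetical:

# Inspect a Delta table's write history, then read an earlier version.
# Assumes a Databricks environment where spark is predefined.
spark.sql("DESCRIBE HISTORY main.sales.orders").show(truncate=False)

previous = spark.sql("SELECT * FROM main.sales.orders VERSION AS OF 3")
current = spark.table("main.sales.orders")
print("rows added since version 3:", current.count() - previous.count())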

Technology Stack Experience:

Databricks (primary and most critical skill)

AWS cloud services and architecture

IDMC (Informatica Data Management Cloud)

Tableau for data visualization

Oracle Database management

MLOps practices within the Databricks environment (Good to have; see the sketch after this list)

STATA for statistical analysis (Good to have)

Amazon SageMaker integration with Databricks (Good to have)

DataRobot platform integration (Good to have)
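As a sketch of basic MLOps practice within Databricks, experiment tracking with MLflow, which comes pre-installed on Databricks ML runtimes; the run name, parameter, and metric values are illustrative:

# Track a training run with MLflow; values are illustrative placeholders.
import mlflow

with mlflow.start_run(run_name="example-training-run"):
    mlflow.log_param("max_depth", 5)
    mlflow.log_metric("rmse", 0.42)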

Soft Skills & Stakeholder Management:

Good interpersonal skills with the ability to work with different groups of stakeholders

Strong problem-solving skills and ability to work independently in a fast-paced environment with minimal supervision

Excellent communication skills for technical documentation and cross-team collaboration

Desirable Requirements:

Databricks certification (Associate or Professional level) - highly preferred

Exposure to hospital information/clinical systems is an added advantage

Understanding of DevOps practices and CI/CD pipelines for Databricks-based data engineering projects

Knowledge of ITIL frameworks and operational best practices

Renee Feng

EA License No: 11C5502

EAP No. R23111942
