Note: This is a remote position open to candidates in the USA. Vaniam Group is seeking an Independent Contractor, Data Engineer to strengthen its Business Operations by building a reliable and scalable data foundation. The role involves centralizing operational data, developing automated pipelines, and transforming manual reporting into scheduled, high-trust data products that support analytics and decision-making.
Responsibilities
Stand up a scalable Databricks lakehouse to ingest, model, and serve business operations data (finance, resourcing, project delivery, CRM, marketing, and time tracking).
Design and maintain automated ELT/ETL pipelines that move data from SaaS tools, databases, and files into bronze/silver/gold layers.
Build the core semantic layer (cleaned, conformed, documented tables) that powers self-serve BI and executive dashboards.
Replace legacy/manual engagement and utilization reports with scheduled, monitored jobs and SLAs.
Partner with Business Operations, Finance, and People Operations leaders to define source-of-truth metrics (e.g., revenue, margin, utilization, velocity, pipeline, engagement health).
Lay groundwork for AI use cases (RAG over operational data, agentic processes, querying company data) by implementing robust lineage, metadata, and access controls.
Architecture & Modeling: Design lakehouse architecture, dimensional/medallion models, and data contracts across systems.
Pipeline Automation: Implement CI/CD for data (branching, PRs, jobs, environments), with observability and reproducibility.
Data Governance: Enforce PII/PHI handling, role-based access, auditability, and retention aligned to healthcare-adjacent standards.
Enablement: Document datasets, publish a data catalog, and enable self-serve usage via BI and SQL.
Reporting Modernization: Decommission manual spreadsheets and one-off extracts; consolidate to certified, scheduled outputs.
AI Readiness: Capture lineage/metadata and vector-friendly document stores to support future ML and RAG initiatives.
Skills
2+ years in data engineering or analytics engineering, including building production data pipelines at scale.
Expert with Databricks (Delta Lake, SQL, PySpark) and cloud data platforms (AWS or Azure).
Proficient with dbt and/or Delta Live Tables; strong SQL and data modeling fundamentals.
Experience orchestrating jobs (Airflow, Databricks Workflows, or equivalent).
Comfortable with Power BI and semantic modeling for self-serve analytics.
Strong stakeholder skills, with the ability to translate business needs into reliable data products and clear SLAs.
Familiarity with data governance (RBAC/ABAC, secrets management, token-based auth) and healthcare-adjacent compliance (e.g., HIPAA concepts) is a plus.
Databricks, Delta Lake, PySpark, SQL, dbt, REST/GraphQL APIs, Git/GitHub, Power BI/Tableau/Looker.
Company Overview
Vaniam Group offers consulting, scientific communication, expert engagement, and insights-gathering services within the oncology field. It was founded in 2007 and is headquartered in Chicago, Illinois, USA, with a workforce of 201-500 employees. Its website is https://vaniamgroup.com/.