Databricks

This destination loads data into Databricks Delta tables using a Databricks SQL warehouse. Extract stages Parquet files in a Unity Catalog volume, then loads them into Delta tables using COPY INTO.

Setup Guide

What this destination expects

This destination writes Parquet files to a Unity Catalog volume and then loads them into Delta tables with COPY INTO through a Databricks SQL warehouse.

You need:

  1. A Databricks workspace URL
  2. A Databricks SQL warehouse ID
  3. A Databricks personal access token (PAT)
  4. A target Unity Catalog catalog and schema
  5. A Unity Catalog staging volume name (for example, staging)

Step 1 - Create a SQL warehouse

Open SQL Warehouses in your Databricks workspace. Create a warehouse if you do not have one yet, then copy the warehouse ID of a running warehouse.

Step 2 - Choose the target catalog and schema

Choose the catalog and schema where Extract should create Delta tables.

If they do not exist yet, Extract will try to create them automatically during the first load, as long as your Databricks user has permission to create catalogs and schemas.

Example SQL:

CREATE CATALOG IF NOT EXISTS workspace;
CREATE SCHEMA IF NOT EXISTS workspace.extract;

Step 3 - Choose a staging volume

Extract stages Parquet files in a Unity Catalog volume before loading them into Delta tables.

If the configured volume does not exist yet, Extract will try to create it automatically during the first load, as long as your Databricks user has permission to create volumes.

Example SQL:

CREATE VOLUME IF NOT EXISTS workspace.extract.staging;

Extract derives the staging root automatically from the connector settings:

/Volumes/{catalog}/{schema}/{staging_volume_name}

For example, with:

  1. catalog = workspace
  2. schema = extract
  3. staging_volume_name = staging

Extract stages files under:

/Volumes/workspace/extract/staging
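
The derivation above is plain string formatting. A minimal Python sketch, assuming the three settings are used verbatim as path components (the function name staging_root and its validation are illustrative, not part of Extract):

```python
def staging_root(catalog: str, schema: str, staging_volume_name: str) -> str:
    """Return the Unity Catalog volume path used as the staging root."""
    parts = (catalog, schema, staging_volume_name)
    for part in parts:
        # Reject empty components and embedded slashes, which would
        # silently change the shape of the resulting path.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "/Volumes/" + "/".join(parts)


print(staging_root("workspace", "extract", "staging"))
# prints: /Volumes/workspace/extract/staging
```

This can be handy for checking that the volume path you expect matches the one you see in Catalog Explorer.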

Step 4 - Create a personal access token

In the Databricks workspace, create a PAT and use it as the connector access token.

A PAT carries the permissions of the user or service principal that created it, so that identity needs permission to:

  1. Use the selected SQL warehouse
  2. Create catalogs, schemas, and volumes if they do not already exist
  3. Create and update tables in the target catalog and schema
  4. Read and write files under the derived staging volume path
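
If you prefer to pre-create the catalog, schema, and volume and grant only what is needed afterwards, the Unity Catalog grants might look like the output of the sketch below. The helper name grant_statements and the principal etl@example.com are hypothetical. The privilege names are standard Unity Catalog privileges, but the exact set Extract needs is an assumption to verify in your workspace. Access to the SQL warehouse itself (CAN USE) is set in the warehouse permissions, not through SQL grants.

```python
def grant_statements(principal: str, catalog: str, schema: str, volume: str) -> list[str]:
    """Build GRANT statements for an identity that loads into existing objects.

    Assumes the catalog, schema, and volume already exist. To let the
    identity create them instead, it would also need CREATE SCHEMA on the
    catalog and CREATE VOLUME on the schema.
    """
    # Principals such as e-mail addresses must be backtick-quoted.
    p = "`" + principal.replace("`", "``") + "`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {p};",
        f"GRANT USE SCHEMA, CREATE TABLE ON SCHEMA {catalog}.{schema} TO {p};",
        f"GRANT READ VOLUME, WRITE VOLUME ON VOLUME {catalog}.{schema}.{volume} TO {p};",
    ]


for statement in grant_statements("etl@example.com", "workspace", "extract", "staging"):
    print(statement)
```

Run the printed statements in the SQL editor as a user who owns the objects or has MANAGE on them.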

Step 5 - Configure the destination in Extract

Fill in:

  1. Workspace URL
  2. Access Token
  3. SQL Warehouse ID
  4. Catalog
  5. Schema
  6. Staging Volume Name
  7. Table Prefix (optional)

Authentication

This destination authenticates to Databricks using a personal access token (PAT).

  • Header used: Authorization: Bearer <token>
  • Keep the token scoped to the minimum permissions required for the target catalog/schema and staging volume.
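
To check a token and warehouse ID before configuring the destination, you could send a trivial statement with the same Bearer header. The sketch below builds such a request against the Databricks SQL Statement Execution API (POST /api/2.0/sql/statements). That Extract itself uses this endpoint is an assumption; the function name and the placeholder values are illustrative.

```python
import json
import urllib.request


def build_statement_request(workspace_url: str, token: str, warehouse_id: str,
                            statement: str = "SELECT 1") -> urllib.request.Request:
    """Build (but do not send) a SQL Statement Execution API request."""
    url = workspace_url.rstrip("/") + "/api/2.0/sql/statements"
    body = json.dumps({
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",  # wait synchronously for up to 30 seconds
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_statement_request(
    "https://dbc-xxxxxxxx-xxxx.cloud.databricks.com", "<token>", "<warehouse-id>"
)
print(req.get_method(), req.full_url)

# To actually send it, replace the placeholders above and uncomment:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["status"]["state"])
```

An HTTP 401 or 403 response at this stage usually points to the token or its permissions rather than to the destination configuration.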

Configuration reference

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Workspace URL | string | required | Your Databricks workspace URL (for example, https://dbc-xxxxxxxx-xxxx.cloud.databricks.com). |
| Access Token | string | required | Databricks personal access token (PAT) used to authenticate API and SQL warehouse requests. |
| SQL Warehouse ID | string | required | The ID of the Databricks SQL warehouse used to run DDL/DML and COPY INTO. |
| Catalog | string | required | Unity Catalog catalog where Extract will create and load tables. |
| Schema | string | required | Unity Catalog schema where Extract will create and load tables. |
| Staging Volume Name | string | required | Unity Catalog volume name used for staging Parquet files. The staging root is derived as /Volumes/{catalog}/{schema}/{staging_volume_name}. |
| Table Prefix | string | optional | Prefix applied to all destination table names (useful for namespacing multiple syncs into the same schema). |

Data model and loading behavior

  • File format: Parquet (staged in the configured Unity Catalog volume)
  • Table format: Delta
  • Load mechanism: COPY INTO loads the staged Parquet files into a temporary table. The data is then merged or inserted into the final table, depending on the selected load mode.

Extract may create the following automatically (if permissions allow):

  • Catalog and schema (if missing)
  • Staging volume (if missing)
  • Destination tables (if missing)
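
The load flow described in this section can be pictured as a short sequence of SQL statements. The sketch below generates one plausible sequence. It is not the SQL Extract actually issues: the mode names "append" and "merge", the key columns, and the table names are all hypothetical, and the schemaless temporary table with mergeSchema is just one documented way to let COPY INTO infer the schema.

```python
def load_statements(target: str, temp: str, staged_path: str,
                    mode: str, key_columns: tuple[str, ...] = ()) -> list[str]:
    """Return the SQL for one load: stage into temp, write to target, clean up."""
    stmts = [
        # An empty placeholder Delta table; COPY INTO fills in the schema.
        f"CREATE TABLE IF NOT EXISTS {temp};",
        f"COPY INTO {temp} FROM '{staged_path}' FILEFORMAT = PARQUET "
        "COPY_OPTIONS ('mergeSchema' = 'true');",
    ]
    if mode == "merge":
        if not key_columns:
            raise ValueError("merge mode needs at least one key column")
        on = " AND ".join(f"t.`{c}` = s.`{c}`" for c in key_columns)
        stmts.append(
            f"MERGE INTO {target} AS t USING {temp} AS s ON {on} "
            "WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *;"
        )
    elif mode == "append":
        stmts.append(f"INSERT INTO {target} SELECT * FROM {temp};")
    else:
        raise ValueError(f"unknown load mode: {mode!r}")
    stmts.append(f"DROP TABLE IF EXISTS {temp};")
    return stmts


for statement in load_statements(
    "workspace.extract.orders",
    "workspace.extract.orders_tmp",
    "/Volumes/workspace/extract/staging/orders",
    mode="merge",
    key_columns=("id",),
):
    print(statement)
```

Staging into a temporary table first means a failed COPY INTO leaves the destination table untouched.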

Streams

Each stream is written to a Delta table in:

{catalog}.{schema}.{table_prefix}{stream_table_name}

Notes:

  • Extract may create a per-run temporary table during loading (used to stage the COPY INTO results before applying the final write to the destination table).
  • Table and column names are sanitized/quoted as needed to be compatible with Databricks SQL and Unity Catalog.
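
As an illustration of the naming pattern above, the following sketch builds a fully qualified, backtick-quoted table name. The exact sanitization rules Extract applies are not documented here, so this shows quoting only. The stream name orders and the prefix raw_ are made-up examples.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def destination_table(catalog: str, schema: str, stream_table_name: str,
                      table_prefix: str = "") -> str:
    """Return {catalog}.{schema}.{table_prefix}{stream_table_name}, quoted."""
    table = table_prefix + stream_table_name
    return ".".join(quote_ident(part) for part in (catalog, schema, table))


print(destination_table("workspace", "extract", "orders", table_prefix="raw_"))
# prints: `workspace`.`extract`.`raw_orders`
```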