ClickHouse

The ClickHouse connector can extract data from and load data into a ClickHouse database over HTTP(S).

As a destination, it creates tables as needed and supports different extraction modes that determine how tables are created and how rows are written.
As a source, it extracts data from one or more configured SQL datasets and supports full refresh and incremental extraction.

Prerequisites

A reachable ClickHouse server (self-hosted or managed).
Network access from the platform to your ClickHouse HTTP(S) endpoint.
A ClickHouse user with permissions appropriate to your use case:
- Destination: create databases/tables (if you want the connector to create them) and insert/select.
- Source: select permissions on the referenced database(s)/table(s).

Authentication

The connector authenticates with:

Host and Port for the ClickHouse HTTP interface
Username and Password (optional; defaults to empty)
Optional TLS (HTTPS)
Optional Custom root certificate (PEM) for private CAs / self-signed certs
Optional SSH tunnel (to reach a private ClickHouse endpoint)

Setup

Identify your ClickHouse HTTP endpoint:
- Default HTTP port is commonly 8123 (varies by deployment).
Ensure the user has appropriate permissions on the target database (destination) or source objects (source).
If using TLS with a private CA, obtain the CA certificate in PEM format.
If your ClickHouse instance is only reachable via a bastion/private network, configure an SSH tunnel.

Configuration

Common connection fields (source and destination)

Field	Type	Required	Description
`host`	string	✅	ClickHouse host name or IP address.
`port`	number	✅	ClickHouse HTTP(S) port.
`tls`	boolean	✅	If `true`, the connector uses `https://`. If `false`, it uses `http://`.
`username`	string	❌	ClickHouse username. Defaults to empty.
`password`	string	❌	ClickHouse password. Defaults to empty.
`database`	string	❌	Database name. For destination, used as the target database to create/use tables in (if set). For source, used as the default database context (if set).
`custom_root_cert`	string (PEM)	❌	PEM-encoded root certificate to trust when connecting via TLS (useful for private CAs).

SSH tunnel (optional)

When enabled, the connector opens an SSH tunnel to the ClickHouse host/port via an SSH server (bastion).

Field	Type	Required	Description
`use_ssh_tunnel`	boolean	✅	If `true`, connect to ClickHouse through an SSH tunnel.
`ssh_host`	string	✅*	SSH server host. Required when `use_ssh_tunnel=true`.
`ssh_port`	number	✅*	SSH server port (typically `22`). Required when `use_ssh_tunnel=true`.
`ssh_user`	string	✅*	SSH username. Required when `use_ssh_tunnel=true`.
`ssh_key.private_key`	string	✅*	Private key contents used for SSH authentication. Required when `use_ssh_tunnel=true`.

* Required only when use_ssh_tunnel=true.

Destination-only fields

The destination writes records by first staging them to S3 and then loading into ClickHouse.

Field	Type	Required	Description
`s3_bucket`	string	✅	S3 bucket used for staging files before loading into ClickHouse.
`access_key_id`	string	✅	AWS access key ID for the staging bucket.
`secret_access_key`	string	✅	AWS secret access key for the staging bucket.
`table_prefix`	string	❌	Prefix applied to all created destination tables. Defaults to empty.

Source-only fields

The source extracts from one or more configured datasets. Each dataset is defined by a name and a SQL query.

Field	Type	Required	Description
`datasets`	array	✅	List of datasets to extract. Each dataset becomes a stream.

Each dataset supports:

Field	Type	Required	Description
`name`	string	✅	Dataset name. Used as the stream name.
`sql`	string	✅	SQL query to execute for this dataset.
`primary_key`	string	❌	Comma-separated list of primary key column(s) (for example: `"id"` or `"account_id, id"`). Required for incremental extraction.
`incremental_timestamp_field`	string	❌	Timestamp column used as the incremental cursor. Required for incremental extraction.

Destination behavior

Table creation and schema behavior

The connector creates destination tables with ClickHouse MergeTree-family engines depending on the extraction mode.

Nullability rules

When creating tables, columns are Nullable by default. However, some columns must be non-nullable because ClickHouse requires certain expressions/keys to be non-nullable:

Columns used in PARTITION BY
Columns used in ORDER BY
The versioning column used by ReplacingMergeTree(...) (when applicable)

The connector automatically marks the following columns as non-nullable:

Partition extraction: all partition keys
Incremental changes extraction:
- all primary key columns, and
- the configured timestamp_field (if it is not already part of the primary key)
Full refresh: no special non-nullable columns are enforced (all columns default to Nullable)

Engine selection by extraction mode

Full refresh
- Engine: MergeTree
- Ordering: ORDER BY tuple()
Partition
- Engine: MergeTree
- Partitioning: PARTITION BY (...)
  - If a partition key is typed/formatted as date or date-time, the connector partitions by toYYYYMMDD(<key>).
- Ordering: ORDER BY (<partition keys>)
Incremental changes
- Requires a primary key to be defined for the stream.
- Engine (Upsert load mode): ReplacingMergeTree(<timestamp_field>)
- Ordering: ORDER BY (<primary key columns>)

Load modes

Upsert
- Supported for Incremental changes extraction.
- Implemented using ReplacingMergeTree(timestamp_field) with ORDER BY (primary_key...).
History
- Not supported for ClickHouse in this connector.

Source behavior

Dataset-based extraction

Each configured dataset is extracted by executing its sql query. The result set is emitted as records for the dataset’s stream.

Incremental extraction

For incremental extraction, configure both:

primary_key (one or more columns), and
incremental_timestamp_field (a timestamp column)

The connector uses the timestamp field as the cursor to fetch new/updated rows over time. Ensure your dataset SQL includes the primary key and timestamp field in the selected columns.

Streams

Destination streams

Streams correspond to the entities/tables you configure in your pipeline. For each stream, the connector:

Creates a table if it does not exist.
Applies the engine and key strategy based on the stream’s extraction mode and primary key configuration.
Loads records into the destination table.

Primary key requirements (destination)

Incremental changes extraction requires a primary key. If a stream is configured for incremental changes without a primary key, the connector will fail the run.

Partition key typing (destination)

For partition extraction, partition keys that are typed/formatted as date or date-time are partitioned using toYYYYMMDD(key) to produce stable daily partitions. Other types are used as-is.

Source streams

Each entry in datasets becomes a stream:

Stream name: datasets[i].name
Stream schema: derived from the dataset query result columns
Incremental behavior: enabled when

Prerequisites​

Authentication​

Setup​

Configuration​

Common connection fields (source and destination)​

SSH tunnel (optional)​

Destination-only fields​

Source-only fields​

Destination behavior​

Table creation and schema behavior​

Nullability rules​

Engine selection by extraction mode​

Load modes​

Source behavior​

Dataset-based extraction​

Incremental extraction​

Streams​

Destination streams​

Primary key requirements (destination)​

Partition key typing (destination)​

Source streams​