ClickHouse
Destination Documentation: ClickHouse Documentation
ClickHouse is a column-oriented analytics database that Extract can load into over HTTP(S). The destination stages stream data in S3, creates tables in ClickHouse as needed, and then applies the configured load mode to merge staged data into the final tables. You can connect directly over HTTP(S), trust a custom root certificate, or reach a private deployment through an SSH tunnel.
Prerequisites
- A reachable ClickHouse HTTP(S) endpoint.
- A ClickHouse user that can create and manage the target database/tables, plus insert and read data during loads.
- An S3 bucket for staging files before ClickHouse imports them.
- AWS credentials with read/write access to that staging bucket.
- Network access:
- from Extract to your ClickHouse endpoint, and
- from your ClickHouse server to the staging S3 bucket used during loads.
- The AWS region for the staging S3 bucket must be discoverable (for example, via the bucket’s region configuration /
GetBucketLocation).
Setup Guide
Step 1 - Create or choose an S3 staging bucket
The ClickHouse destination first writes stream data to S3 and then asks ClickHouse to import those staged files.
Prepare:
- An S3 bucket dedicated to Extract staging, or an existing bucket you are comfortable using for staged load files.
- An IAM user/access key pair with permission to write staged files and read them back during import.
Step 2 - Create a ClickHouse service user
Create or select a ClickHouse user for Extract. The user should have permissions to:
- connect over the HTTP interface,
- create the target database if needed,
- create/drop temporary tables,
- insert into destination tables,
- select from tables as part of load operations, and
- alter tables when partition-based replace operations are used.
If you plan to isolate Extract data, create a dedicated destination database for it.
Step 3 (Optional) - Enable TLS
If your ClickHouse deployment uses HTTPS, enable Use TLS in the connector settings.
If your deployment uses a private CA or self-signed certificate, also provide Custom Root CA Certificate in PEM format.
Step 4 (Optional) - Configure an SSH tunnel
If the ClickHouse endpoint is only reachable through a bastion or private network, enable Use SSH Tunnel and provide:
- SSH Host
- SSH Port
- SSH Username
- SSH Public Key generated by Extract
Add the generated public key to ~/.ssh/authorized_keys on the SSH server.
Step 5 - Configure the account connector in Extract
Provide the account-level settings:
- Hostname - ClickHouse server hostname or IP.
- Port - Default HTTP(S) port is commonly
8123. - Username - ClickHouse username for Extract.
- Password - ClickHouse password for Extract.
- Use TLS - Enable if your server requires HTTPS.
- Custom Root CA Certificate - Optional PEM certificate for private trust chains.
- Use SSH Tunnel and SSH settings - Optional for private deployments.
- S3 Bucket Name - Bucket used for staging files.
- IAM User Access Key Id - Access key for the staging bucket.
- IAM User Secret Access Key - Secret key for the staging bucket.
Save the account connector after testing connectivity.
Step 6 - Configure connection-level settings
When you create a connection that targets ClickHouse, configure:
- Database - Target ClickHouse database for the connection.
- Table Name Prefix - Optional prefix applied to all tables created by this connection.
- Schema Migration Policy - Controls how Extract reacts to upstream schema changes; see Known limitations below for current ClickHouse schema-change support.
- Table Name Override - Optional stream-level override for the final table name.
Configuration Parameters
Account connector parameters
- Hostname - ClickHouse server hostname or IP address.
- Port - ClickHouse HTTP(S) port.
- Username - ClickHouse username. If not provided, defaults to an empty string.
- Password - ClickHouse password. If not provided, defaults to an empty string.
- Use TLS - Enables HTTPS for the connection.
- Custom Root CA Certificate - Optional PEM-encoded root certificate for private CAs or self-signed certs.
- Use SSH Tunnel - Connect through an SSH tunnel instead of directly.
- SSH Host / SSH Port / SSH Username / SSH Public Key - Required when SSH tunneling is enabled.
- S3 Bucket Name - Bucket used to stage load files.
- IAM User Access Key Id - AWS access key used for staging.
- IAM User Secret Access Key - AWS secret key used for staging.
Connection-level parameters
- Database - Target ClickHouse database. If provided, the connector will create the database if it does not already exist.
- Table Name Prefix - Optional prefix prepended to each destination table.
- Schema Migration Policy - Controls whether new streams/fields are auto-activated or deprecated.
Stream-level parameters
- Table Name Override - Optional final table name override for an individual stream.
Connection Settings
When ClickHouse is the destination, the source stream configuration determines how tables are created and updated in ClickHouse.
Common settings to review:
- Load Mode - ClickHouse supports Append, Replace, and Upsert.
- Primary Key - Required for incremental upsert flows.
- Incremental Timestamp Field - Required for ClickHouse upserts because the destination uses it as the version column.
- Partition Keys - Used when the stream is configured for partition-based extraction/loading.
Data Contract Information
Singular creates ClickHouse tables with nullable columns by default.
Some columns are created as non-nullable to satisfy ClickHouse requirements for keys used in table engines:
- Partition mode: columns used in
PARTITION BYare created as non-nullable. - Incremental changes mode: primary key columns and the configured
timestamp_field(used for versioning/order) are created as non-nullable.
Additionally, array-typed columns (for example, Array(String)) are always created as non-nullable in ClickHouse (they are not wrapped in Nullable(...)).
Additional Information
How data loading works
Extract loads data into ClickHouse with the following flow:
- Extract - Records are pulled from the source stream.
- Stage to S3 - Extract writes the stream payload to JSONL files in your configured S3 bucket (uploaded as GZIP-compressed objects).
- Create tables - Extract creates the destination table and a temporary staging table when needed.
- Import from S3 - ClickHouse reads the staged JSONL file from S3 using its
s3(...)table function. - Apply load mode - Extract appends, replaces, or upserts into the final table.
- Cleanup - Temporary tables are removed after the load completes.
Supported load modes
- Append - Inserts staged rows into the destination table.
- Replace - Loads data into a temporary table and swaps it into place for full-refresh style replacement.
- Upsert - Supported for incremental streams with a primary key and timestamp field.
Table engines and schema behavior
ClickHouse tables are created with MergeTree-family engines based on the stream extraction mode:
- Full refresh -
MergeTree ORDER BY tuple() - Partition -
MergeTree PARTITION BY (...) ORDER BY (...) - Incremental changes -
ReplacingMergeTree(timestamp_field) ORDER BY (primary_key...)
Additional behavior:
- Columns are nullable by default.
- Array-typed columns are created as non-nullable (ClickHouse does not allow
Nullable(Array(...))). - Partition keys are forced to be non-nullable.
- For incremental changes, the primary key columns and timestamp field are forced to be non-nullable.
- Date/date-time partition keys are partitioned using
toYYYYMMDD(...).
Known limitations
- History load mode is not supported.
- Soft delete semantics are not supported.
- Schema change operations for add column, deprecate column, and rename column are not implemented yet.
- Upsert requires a stream primary key and incremental timestamp field.
- The destination depends on S3 staging, so both Extract and your ClickHouse deployment must be able to access the staging bucket.
Networking notes
- Default ClickHouse HTTP(S) traffic typically uses port
8123. - Use TLS when your deployment requires encrypted transport.
- Use an SSH tunnel when ClickHouse is only reachable through a private network or bastion host.