Skip to main content

Postgres (CDC)

Overview

Extract's Postgres (CDC) connector captures row-level changes from Postgres logical replication (pgoutput) and streams inserts, updates, deletes, and truncates into Extract.

Use this connector when you need ongoing change capture. If you only need scheduled query-based extraction, use the standard Postgres source connector.

How It Works

  1. Extract validates your Postgres CDC settings and server capabilities.
  2. Optional initial snapshot reads current table state.
  3. Extract tails WAL changes from a logical replication slot using pgoutput.
  4. Extract applies table filters and writes records to selected streams.

Compatibility and Requirements

  • Postgres 15+ (primary/standard CDC).
  • wal_level=logical.
  • max_replication_slots >= 1.
  • max_wal_senders >= 1.
  • A database user with login, table read permissions, and replication capability.
  • Network access from Extract to the database host (directly or via SSH tunnel).

You can verify the required server settings with:

SHOW server_version_num;
SHOW wal_level;
SHOW max_replication_slots;
SHOW max_wal_senders;

Step 1 - Create a Dedicated CDC User

CREATE USER extract_cdc WITH PASSWORD 'replace-with-strong-password';
GRANT CONNECT ON DATABASE <database> TO extract_cdc;
GRANT USAGE ON SCHEMA <schema> TO extract_cdc;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema> TO extract_cdc;
GRANT SELECT ON FUTURE TABLES IN SCHEMA <schema> TO extract_cdc;
ALTER ROLE extract_cdc WITH REPLICATION;

If you use auto-managed publications, ensure this user can create/alter publications.

Step 2 - Decide Publication and Slot Ownership

Use defaults:

  • create_publication=true
  • create_replication_slot=true
  • publication_name=extract_cdc_pub
  • replication_slot_name=extract_cdc_slot

Extract will create missing objects and keep publication tables aligned with selected streams.

Option B: Manage them manually

Set both creation flags to false and create objects yourself.

Example:

CREATE PUBLICATION extract_cdc_pub FOR TABLE public.orders, public.customers;
SELECT * FROM pg_create_logical_replication_slot('extract_cdc_slot', 'pgoutput', false);

If manual mode is enabled and objects are missing, the connector fails fast at runtime.

Step 3 - Configure Replica Identity

For safe update/delete CDC behavior, tracked tables should use REPLICA IDENTITY FULL.

Manual approach:

ALTER TABLE public.orders REPLICA IDENTITY FULL;

Or set connector parameter replica_identity_override=FULL to apply it automatically on selected tables.

Notes:

  • Only FULL is supported.
  • replica_identity_override cannot be applied when read_replica=true.

Step 4 - Configure Connector Parameters in Extract

Account Connector Parameters

  • host
  • port (default 5432)
  • username
  • password
  • database
  • tls (default false)
  • use_ssh_tunnel and SSH fields (ssh_host, ssh_port, ssh_user, ssh_key)

Connection Parameters

  • replication_slot_name
  • publication_name
  • create_publication
  • create_replication_slot
  • slot_temporary
  • read_replica
  • initial_snapshot
  • resnapshot
  • lag_warning_consecutive_runs (default 3)
  • include_schemas, exclude_schemas
  • include_tables, exclude_tables
  • replica_identity_override

Read Replica Mode

Set read_replica=true only when you are intentionally decoding from a replica.

Requirements and constraints:

  • Replica host must support standby logical decoding (Postgres 16+).
  • create_publication=false and create_replication_slot=false are required.
  • Publication/slot must already exist and be valid for the target database.
  • replica_identity_override is not applied in read-replica mode.

Snapshot and Resnapshot Behavior

  • initial_snapshot=true: performs a baseline snapshot before WAL streaming.
  • resnapshot=true: clears CDC state and forces a new snapshot on the next run.
  • If a slot becomes invalidated/lost, Extract forces a resnapshot for recovery.

Table Selection Rules

Table filters are applied in this order:

  1. include_schemas (if provided)
  2. exclude_schemas
  3. include_tables (if provided, supports schema.table or bare table)
  4. exclude_tables (supports schema.table or bare table)

Use fully qualified names (schema.table) when possible to avoid ambiguity.

Records and CDC Metadata

For change events, the connector emits table records and includes CDC metadata fields:

  • _extract_commit_timestamp: commit timestamp for the change (when available from the replication stream)
  • _extract_cdc_lsn: WAL LSN associated with the change
  • _extract_deleted: true for delete events, otherwise false/unset

Truncate operations are also captured. Depending on your destination and modeling, truncates may be represented as control events rather than row-level records.

Validation and Monitoring Queries

Confirm publication members

SELECT schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'extract_cdc_pub'
ORDER BY schemaname, tablename;

Check slot health and WAL retention

SELECT
slot_name,
active,
restart_lsn,
confirmed_flush_lsn,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_name = 'extract_cdc_slot';

Troubleshooting

  • wal_level must be 'logical' for CDC

    • Set wal_level=logical and restart Postgres.
  • Replication slot '<slot>' does not exist and create_replication_slot=false

    • Create the slot manually or enable slot creation.
  • Publication '<pub>' does not exist and create_publication=false

    • Create the publication manually or enable publication creation.
  • read_replica=true requires create_publication=false and create_replication_slot=false

    • Disable both creation flags in read-replica mode.
  • read_replica requires Postgres 16+ for standby logical decoding

    • Use a primary host or upgrade replica version/capabilities.
  • CDC requires REPLICA IDENTITY FULL for update safety

    • Set replica_identity_override=FULL or apply ALTER TABLE ... REPLICA IDENTITY FULL manually.

Best Practices

  • Use a dedicated replication slot per Extract connection.
  • Keep runs frequent to prevent excessive WAL retention.
  • Prefer explicit table lists for production CDC.
  • Avoid temporary slots (slot_temporary=true) unless you explicitly want ephemeral state.