BigQuery
Overview
Extract connects to BigQuery as a source by running SQL queries you define and streaming the results as datasets. Each dataset is a named SQL query that becomes a stream in Extract. Queries can run in Full Refresh mode or Incremental Changes mode when you provide a timestamp column.
This connector uses two retrieval paths:
- Storage Read API for fast, columnar reads when the query is a simple
SELECT * FROM <table>(with an optionalWHEREclause). - Jobs + getQueryResults for complex SQL and for views when materialization is not enabled.
Prerequisites
- A BigQuery project and datasets you want to query.
- A service account with permission to run queries and read data.
- (Recommended) The BigQuery Storage API enabled in the project for faster reads.
- Network access for Extract to reach BigQuery (standard Google APIs).
Required IAM Roles
Assign these roles to the service account used by Extract:
roles/bigquery.dataViewer(read table data)roles/bigquery.jobUser(run queries)roles/bigquery.readSessionUser(Storage Read API)
If you enable view materialization (see below), also grant write access on the temp dataset:
roles/bigquery.dataEditor(on the temp dataset only)
Authentication
BigQuery supports two authentication options:
- Use Singular-managed service account (recommended): Singular provides a service account key for the connector to use. Jobs are executed in (and billed to) the
project_idyou configure in the connector. - Use your own service account: Provide your own Google Cloud service account key JSON. Jobs are executed in (and billed to) the
project_idyou configure in the connector (not the project embedded in the service account key).
When using your own service account, ensure it has the required BigQuery permissions on the datasets you want to read, and that it can run jobs in the configured project_id.
Setup Guide
Step 1 - Create a service account
Create a service account in the Google Cloud Console and generate a JSON key file.
When using your own service account, the connector will run BigQuery jobs in (and bill to) the Project ID you configure in the connector, regardless of any project_id value inside the JSON key.
Step 1b - Using the Extract managed service account (optional)
If you do not want to provide your own service account, set Use own Service Account to false.
In that case, you must grant BigQuery access to Extract’s managed service account:
- In GCP, go to IAM & Admin → IAM.
- Click Grant Access.
- Add the Extract managed service account email shown in the connector UI:
singular-etl@singular-etl.iam.gserviceaccount.com - Grant the required roles:
roles/bigquery.dataViewerroles/bigquery.jobUserroles/bigquery.readSessionUser
- If you use a Temp Dataset for view materialization, also grant:
roles/bigquery.dataEditor(on the temp dataset only)
You can do this via the CLI as well:
PROJECT_ID=your-project-id
SA_EMAIL=singular-etl@singular-etl.iam.gserviceaccount.com
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/bigquery.jobUser"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/bigquery.readSessionUser"
Step 2 - Grant permissions
Grant the roles listed above to the service account at the project or dataset level.
Step 3 - Enable the Storage API (recommended)
In your GCP project, enable:
- BigQuery API
- BigQuery Storage API
Step 4 - Configure the connector in Extract
Fill in the connection fields (see parameters below), then add at least one dataset and run a test sync.
Configuration Parameters
Account Connector
- Use own Service Account - Set to
falseto use Extract’s managed service account. Set totrueto use a service account you provide. - Service Account Key - JSON key for your service account. Required only when Use own Service Account is
true. - Project ID - The BigQuery project that contains your datasets. This project is also used as the billing/project context for running jobs (including when using your own service account).
- Temp Dataset (optional) - Dataset used to materialize views and complex results for Storage Read. Recommended to set a dataset-level table expiration.
Datasets
Each dataset defines one stream:
- Dataset Name - Stream name (must be unique).
- SQL Query - The SQL query that defines the dataset.
- Allow view materialization (optional) - If enabled, Extract will materialize view results into the temp dataset to use Storage Read.
- Primary Key (optional) - Column used for dedupe and diffing.
- Incremental Timestamp Field (optional) - Column used for incremental syncs.
Extraction Modes
Full Refresh
Extract runs the dataset query as-is. If diffing is enabled and a primary key is defined, Extract orders results by the primary key and computes changes.
Incremental Changes
If you set Incremental Timestamp Field, Extract filters results by:
WITH query AS (<your query>)
SELECT * FROM query WHERE <timestamp_field> > <last_cursor_value>
The cursor updates to the max timestamp seen in each run.
Storage Read vs Query Results
Extract chooses the best approach automatically:
- Direct Storage Read: Used when the SQL is a simple
SELECT * FROM project.dataset.tablewith an optionalWHEREclause. This is the fastest path. - View materialization: Views and external tables cannot be read directly by the Storage Read API. If Allow view materialization is enabled and Temp Dataset is set, Extract will materialize the query to a temp table and read it with Storage Read.
- Query Results: For complex SQL (joins, aggregates, CTEs, subqueries), or when view materialization is disabled, Extract uses the standard query results API.
Materialization creates a temporary table in your temp dataset. Set a default expiration on that dataset to keep costs and clutter low.
Data Type Handling
Extract maps BigQuery types to JSON-friendly types, including:
BOOL-> booleanSTRING-> stringBYTES-> hex stringINT64-> integerFLOAT64-> numberNUMERIC,BIGNUMERIC-> stringDATE-> string (YYYY-MM-DD)TIME-> string (formatHH:MM:SS[.ffffff])DATETIME-> string (RFC 3339, UTC)TIMESTAMP-> string (RFC 3339, UTC)GEOGRAPHY-> stringJSON,STRUCT,RECORD-> jsonARRAY-> json array
Query Tips
- Prefer
SELECT * FROM project.dataset.tablefor the fastest reads. - If you need a view, enable materialization and set a temp dataset.
- For incremental syncs, choose a stable timestamp column with monotonic updates.
Troubleshooting
- No datasets returned: verify the service account has
bigquery.dataViewerand that the SQL is valid. - Storage Read errors: ensure BigQuery Storage API is enabled and the service account has
bigquery.readSessionUser. - View errors: enable Allow view materialization and set Temp Dataset, or switch to a query results flow.