Skip to main content

BigQuery

Overview

Extract connects to BigQuery as a source by running SQL queries you define and streaming the results as datasets. Each dataset is a named SQL query that becomes a stream in Extract. Queries can run in Full Refresh mode or Incremental Changes mode when you provide a timestamp column.

This connector uses two retrieval paths:

  • Storage Read API for fast, columnar reads when the query is a simple SELECT * FROM <table> (with an optional WHERE clause).
  • Jobs + getQueryResults for complex SQL and for views when materialization is not enabled.

Prerequisites

  1. A BigQuery project and datasets you want to query.
  2. A service account with permission to run queries and read data.
  3. (Recommended) The BigQuery Storage API enabled in the project for faster reads.
  4. Network access for Extract to reach BigQuery (standard Google APIs).

Required IAM Roles

Assign these roles to the service account used by Extract:

  • roles/bigquery.dataViewer (read table data)
  • roles/bigquery.jobUser (run queries)
  • roles/bigquery.readSessionUser (Storage Read API)

If you enable view materialization (see below), also grant write access on the temp dataset:

  • roles/bigquery.dataEditor (on the temp dataset only)

Authentication

BigQuery supports two authentication options:

  • Use Singular-managed service account (recommended): Singular provides a service account key for the connector to use. Jobs are executed in (and billed to) the project_id you configure in the connector.
  • Use your own service account: Provide your own Google Cloud service account key JSON. Jobs are executed in (and billed to) the project_id you configure in the connector (not the project embedded in the service account key).

When using your own service account, ensure it has the required BigQuery permissions on the datasets you want to read, and that it can run jobs in the configured project_id.

Setup Guide

Step 1 - Create a service account

Create a service account in the Google Cloud Console and generate a JSON key file.

When using your own service account, the connector will run BigQuery jobs in (and bill to) the Project ID you configure in the connector, regardless of any project_id value inside the JSON key.

Step 1b - Using the Extract managed service account (optional)

If you do not want to provide your own service account, set Use own Service Account to false. In that case, you must grant BigQuery access to Extract’s managed service account:

  1. In GCP, go to IAM & Admin → IAM.
  2. Click Grant Access.
  3. Add the Extract managed service account email shown in the connector UI: singular-etl@singular-etl.iam.gserviceaccount.com
  4. Grant the required roles:
    • roles/bigquery.dataViewer
    • roles/bigquery.jobUser
    • roles/bigquery.readSessionUser
  5. If you use a Temp Dataset for view materialization, also grant:
    • roles/bigquery.dataEditor (on the temp dataset only)

You can do this via the CLI as well:

PROJECT_ID=your-project-id
SA_EMAIL=singular-etl@singular-etl.iam.gserviceaccount.com

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/bigquery.readSessionUser"

Step 2 - Grant permissions

Grant the roles listed above to the service account at the project or dataset level.

In your GCP project, enable:

  • BigQuery API
  • BigQuery Storage API

Step 4 - Configure the connector in Extract

Fill in the connection fields (see parameters below), then add at least one dataset and run a test sync.

Configuration Parameters

Account Connector

  • Use own Service Account - Set to false to use Extract’s managed service account. Set to true to use a service account you provide.
  • Service Account Key - JSON key for your service account. Required only when Use own Service Account is true.
  • Project ID - The BigQuery project that contains your datasets. This project is also used as the billing/project context for running jobs (including when using your own service account).
  • Temp Dataset (optional) - Dataset used to materialize views and complex results for Storage Read. Recommended to set a dataset-level table expiration.

Datasets

Each dataset defines one stream:

  • Dataset Name - Stream name (must be unique).
  • SQL Query - The SQL query that defines the dataset.
  • Allow view materialization (optional) - If enabled, Extract will materialize view results into the temp dataset to use Storage Read.
  • Primary Key (optional) - Column used for dedupe and diffing.
  • Incremental Timestamp Field (optional) - Column used for incremental syncs.

Extraction Modes

Full Refresh

Extract runs the dataset query as-is. If diffing is enabled and a primary key is defined, Extract orders results by the primary key and computes changes.

Incremental Changes

If you set Incremental Timestamp Field, Extract filters results by:

WITH query AS (<your query>)
SELECT * FROM query WHERE <timestamp_field> > <last_cursor_value>

The cursor updates to the max timestamp seen in each run.

Storage Read vs Query Results

Extract chooses the best approach automatically:

  • Direct Storage Read: Used when the SQL is a simple SELECT * FROM project.dataset.table with an optional WHERE clause. This is the fastest path.
  • View materialization: Views and external tables cannot be read directly by the Storage Read API. If Allow view materialization is enabled and Temp Dataset is set, Extract will materialize the query to a temp table and read it with Storage Read.
  • Query Results: For complex SQL (joins, aggregates, CTEs, subqueries), or when view materialization is disabled, Extract uses the standard query results API.
info

Materialization creates a temporary table in your temp dataset. Set a default expiration on that dataset to keep costs and clutter low.

Data Type Handling

Extract maps BigQuery types to JSON-friendly types, including:

  • BOOL -> boolean
  • STRING -> string
  • BYTES -> hex string
  • INT64 -> integer
  • FLOAT64 -> number
  • NUMERIC, BIGNUMERIC -> string
  • DATE -> string (YYYY-MM-DD)
  • TIME -> string (format HH:MM:SS[.ffffff])
  • DATETIME -> string (RFC 3339, UTC)
  • TIMESTAMP -> string (RFC 3339, UTC)
  • GEOGRAPHY -> string
  • JSON, STRUCT, RECORD -> json
  • ARRAY -> json array

Query Tips

  • Prefer SELECT * FROM project.dataset.table for the fastest reads.
  • If you need a view, enable materialization and set a temp dataset.
  • For incremental syncs, choose a stable timestamp column with monotonic updates.

Troubleshooting

  • No datasets returned: verify the service account has bigquery.dataViewer and that the SQL is valid.
  • Storage Read errors: ensure BigQuery Storage API is enabled and the service account has bigquery.readSessionUser.
  • View errors: enable Allow view materialization and set Temp Dataset, or switch to a query results flow.