Google Drive

High-Level Information: The Google Drive integration allows you to extract files from your Google Drive folder and load them into your destination.

Key Details:

  • Files are merged into a single data stream during the extraction process.
  • Supported file types include delimited text files such as CSV, TSV, and similar formats.
  • All values in the files are automatically converted to strings, regardless of their original data type.
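The behavior described above can be illustrated with a minimal Python sketch. It is not the integration's actual implementation: the function name merge_delimited_files and the in-memory sample files are hypothetical, and the sketch only shows what "one merged stream of string values" looks like for two CSV files.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Tuple


def merge_delimited_files(
    files: Iterable[Tuple[str, str]], delimiter: str = ","
) -> Iterator[Dict[str, str]]:
    """Yield the rows of several delimited files as one stream of dicts.

    `files` is an iterable of (file_name, file_text) pairs (hypothetical input shape).
    """
    for _file_name, text in files:
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        for row in reader:
            # The csv module returns every value as str, so no type
            # inference happens: "30" stays the string "30".
            yield dict(row)


files = [
    ("a.csv", "name,age\nJohn,30\n"),
    ("b.csv", "name,age\nJane,25\n"),
]
rows = list(merge_delimited_files(files))
```

Passing delimiter="\t" would handle a TSV file the same way.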

Source Setup Guide

Authentication Options:

You can authenticate with Google Drive using either OAuth or a Service Account:

  • OAuth: Authenticate using your Google account credentials. This is the recommended method for personal or team Google Drive folders.
    1. Click the banner in the Edit Source form on the left and follow Google's OAuth authentication flow to grant Extract the required permissions.
    2. Confirm that you can see your email and profile picture, and that the source is shown as Connected.
  • Service Account: For automated workflows or when OAuth isn't suitable, you can use our service account. Grant permissions to singular-etl@singular-etl.iam.gserviceaccount.com to access your Google Drive files.

After authenticating, complete the following steps:

  1. Paste the URL of the folder from which you want us to extract the files.
  2. Specify the file pattern that will be used to identify the files for processing.
  3. Indicate the table name you want to use in your destination.
  4. Click "Save".
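To get a feel for how a file pattern narrows down the files in a folder, here is a small Python sketch. It assumes glob-style patterns (for example sales_*.csv); this page does not state which pattern syntax is expected, so treat the syntax, the helper name select_files, and the sample file names as assumptions.

```python
from fnmatch import fnmatchcase
from typing import List


def select_files(file_names: List[str], pattern: str) -> List[str]:
    """Return the file names that match a glob-style pattern (assumed syntax)."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical folder listing.
folder = [
    "sales_2024-01.csv",
    "sales_2024-02.csv",
    "notes.txt",
    "sales_2024-02.csv.gz",
]
matched = select_files(folder, "sales_*.csv")
```

Under this assumption the pattern has to match the whole file name, so a compressed file such as sales_2024-02.csv.gz would need a pattern like sales_*.csv* to be picked up.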

Connection Setup Guide

Once you have connected Google Drive to a destination, you will also need to configure:

  • Connection Pull Schedule: Determines how frequently data is extracted from the source.
  • Backfill (Days): Specifies how far back we should search for updated files.
  • Schema Migration Policy: Controls how Extract will handle schema changes from the source.

Connector Information

File Name Partitioning: Data will be extracted based on file name partitioning, meaning only files modified within the backfill period will be updated.

The Cursor: Tracks the last modified file from the previous run and updates all files that were modified since then.
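One way to picture how the backfill period and the cursor interact is the sketch below. It is an interpretation, not the connector's actual logic: it assumes a file is processed only if it was modified inside the backfill window and after the cursor from the previous run. The function name files_to_process and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

FileInfo = Tuple[str, datetime]  # (file name, last modified time)


def files_to_process(
    files: List[FileInfo],
    cursor: Optional[datetime],
    now: datetime,
    backfill_days: int,
) -> Tuple[List[FileInfo], Optional[datetime]]:
    """Select files modified within the backfill window and after the cursor."""
    window_start = now - timedelta(days=backfill_days)
    selected = [
        (name, modified)
        for name, modified in files
        if modified >= window_start and (cursor is None or modified > cursor)
    ]
    # The new cursor is the latest modification time seen in this run;
    # if nothing was selected, the old cursor is kept.
    new_cursor = max((modified for _, modified in selected), default=cursor)
    return selected, new_cursor


utc = timezone.utc
now = datetime(2024, 3, 10, tzinfo=utc)
files = [
    ("a.csv", datetime(2024, 3, 1, tzinfo=utc)),  # outside the 7-day window
    ("b.csv", datetime(2024, 3, 4, tzinfo=utc)),  # in window, but before the cursor
    ("c.csv", datetime(2024, 3, 6, tzinfo=utc)),
    ("d.csv", datetime(2024, 3, 9, tzinfo=utc)),
]
selected, new_cursor = files_to_process(
    files, cursor=datetime(2024, 3, 5, tzinfo=utc), now=now, backfill_days=7
)
```

In this example only c.csv and d.csv are selected, and the cursor advances to the modification time of d.csv.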

Additional Fields in the Destination:

  1. internal_file_timestamp: Indicates when the connection run occurred.
  2. internal_file_name: Identifies the file the record originated from.
  3. internal_last_modified: Specifies when the file was last updated in the source.
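As an illustration of how these three fields relate to a record, the following Python sketch attaches them to a single row. The field names come from the list above; the helper name add_file_metadata, the sample values, and the ISO 8601 timestamp format are assumptions, since this page does not specify how timestamps are stored.

```python
from datetime import datetime, timezone
from typing import Dict


def add_file_metadata(
    record: Dict[str, str],
    file_name: str,
    last_modified: datetime,
    run_time: datetime,
) -> Dict[str, str]:
    """Return a copy of the record with the three additional fields set."""
    enriched = dict(record)  # copy, so the original record is not mutated
    enriched["internal_file_timestamp"] = run_time.isoformat()  # when the run occurred
    enriched["internal_file_name"] = file_name  # file the record came from
    enriched["internal_last_modified"] = last_modified.isoformat()  # source file update time
    return enriched


record = {"name": "John", "age": "30"}
enriched = add_file_metadata(
    record,
    file_name="people.jsonl",
    last_modified=datetime(2024, 3, 9, 12, 0, tzinfo=timezone.utc),
    run_time=datetime(2024, 3, 10, tzinfo=timezone.utc),
)
```

Every record from the same file and run would carry identical values in these fields, which makes it possible to trace a destination row back to its source file.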

File Consistency: Inconsistencies in the structure of the files can cause problems when loading the data.

Supported File Formats:

We support the following JSON file extensions: .jsonl, .ndjson


Important: The schema will be based on the first record, so ensure all objects have the same schema.


JSON objects, one per line:
{"name": "John", "age": 30}
{"name": "Jane", "age": 25}

The above example will be inserted at the destination table as:

name    age
John    30
Jane    25
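The mapping from line-delimited JSON to table rows in the example above can be sketched in Python as follows. This is a simplified illustration, not the connector's code: it takes the column list from the first record, as described above, and assumes that keys missing from a later record become empty values and that keys absent from the first record are ignored. The function name load_json_lines is hypothetical.

```python
import json
from typing import List, Optional, Tuple


def load_json_lines(text: str) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Parse line-delimited JSON into (columns, rows), schema from the first record."""
    columns: Optional[List[str]] = None
    rows: List[List[Optional[str]]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        obj = json.loads(line)
        if columns is None:
            # The first record defines the schema.
            columns = list(obj.keys())
        # Values are rendered as strings; missing keys become None (assumption).
        rows.append(
            [None if obj.get(col) is None else str(obj[col]) for col in columns]
        )
    return columns or [], rows


sample = '{"name": "John", "age": 30}\n{"name": "Jane", "age": 25}\n'
columns, rows = load_json_lines(sample)
```

Because the schema is fixed by the first record, a field that appears only in later records would be lost in this sketch, which is why keeping all objects structurally identical matters.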

Supported Compressions:

We support the following compression formats:

  • Gzip
  • BZip2
  • Zstd

Compression is detected automatically, so you can include the compression format in the file name (e.g. file.csv.bz2) or leave it out (e.g. file.csv).
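Detection that does not depend on the file name is commonly done by inspecting the first bytes of a file. The Python sketch below shows that general technique using the standard magic numbers of the three formats; whether the integration detects compression this way is an assumption, and the function name detect_compression is hypothetical.

```python
import bz2
import gzip
from typing import Optional

# Standard magic numbers found at the start of each format.
MAGIC_NUMBERS = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\x28\xb5\x2f\xfd": "zstd",
}


def detect_compression(data: bytes) -> Optional[str]:
    """Return the compression format implied by the leading bytes, or None."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return None  # treated as uncompressed


payload = b"name,age\nJohn,30\n"
gzip_result = detect_compression(gzip.compress(payload))
bzip2_result = detect_compression(bz2.compress(payload))
plain_result = detect_compression(payload)
```

With content-based detection, a Gzip-compressed file named file.csv and one named file.csv.gz are handled identically.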