Google Drive
High-Level Information: The Google Drive integration allows you to extract files from your Google Drive folder and load them into your destination.
Key Details:
- Files are merged into a single data stream during the extraction process.
- Supported file types include delimited text files such as CSV, TSV, and similar formats.
- All values in the files are automatically converted to strings.
Source Setup Guide
Authentication Options:
You can authenticate with Google Drive using either OAuth or a Service Account:
- OAuth: Authenticate using your Google account credentials. This is the recommended method for personal or team Google Drive folders.
  Start the authentication in the Edit Source form on the left, then follow the OAuth flow on Google's website to grant Extract the required permissions.
- Confirm you can see your email and profile picture, and that the source is Connected.
- Service Account: For automated workflows or when OAuth isn't suitable, you can use our service account. Grant singular-etl@singular-etl.iam.gserviceaccount.com permission to access your Google Drive files.
After authenticating, complete the following steps:
- Paste the URL of the folder from which you want us to extract the files.
- Specify the file pattern that will be used to identify the files for processing.
- Indicate the table name you want to use in your destination.
- Click "Save"
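As an illustration of how a file pattern can narrow down the files in a folder, here is a minimal sketch. It assumes a shell-style wildcard pattern, which this guide does not confirm; the function name and file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names, pattern):
    """Return the names matching a shell-style pattern (assumed syntax)."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical folder contents, for illustration only.
names = ["orders_2024-01.csv", "orders_2024-02.csv", "notes.txt"]
matched = select_files(names, "orders_*.csv")
print(matched)
```

If the pattern field expects a different syntax (for example a regular expression), the same idea applies with a different matching function.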
Connection Setup Guide
Once you have connected Google Drive to a destination, you will also need to configure:
- Connection Pull Schedule: Determines how frequently data is extracted from the source.
- Backfill (Days): Specifies how far back we should search for updated files.
- Schema Migration Policy: Controls how Extract will handle schema changes from the source.
Connector Information
File Name Partitioning: Data will be extracted based on file name partitioning, meaning only files modified within the backfill period will be updated.
The Cursor: Tracks the last modified file from the previous run and updates all files that were modified since then.
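The interaction between the backfill period and the cursor can be sketched as follows. This is one plausible reading, not the connector's actual implementation: a file is picked up when it was modified inside the backfill window and after the cursor. The function name and sample files are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def select_modified_files(files, cursor, backfill_days, now):
    """Pick files modified inside the backfill window and after the cursor.

    files is a list of (name, last_modified) tuples; cursor may be None
    on the first run. Returns the selected files and the advanced cursor.
    """
    window_start = now - timedelta(days=backfill_days)
    selected = [
        (name, modified)
        for name, modified in files
        if modified >= window_start and (cursor is None or modified > cursor)
    ]
    # Advance the cursor to the newest file seen, or keep the old one.
    new_cursor = max((modified for _, modified in selected), default=cursor)
    return selected, new_cursor


now = datetime(2024, 3, 10, tzinfo=timezone.utc)
files = [
    ("a.csv", datetime(2024, 3, 9, tzinfo=timezone.utc)),  # new since cursor
    ("b.csv", datetime(2024, 3, 1, tzinfo=timezone.utc)),  # older than cursor
    ("c.csv", datetime(2024, 2, 1, tzinfo=timezone.utc)),  # outside window
]
cursor = datetime(2024, 3, 5, tzinfo=timezone.utc)
selected, new_cursor = select_modified_files(files, cursor, 30, now)
print([name for name, _ in selected])  # ['a.csv']
```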
Additional Fields in the Destination:
- internal_file_timestamp: Indicates when the connection run occurred.
- internal_file_name: Identifies the file the record originated from.
- internal_last_modified: Specifies when the file was last updated in the source.
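To show how these fields sit alongside the data, here is a small sketch that attaches them to a record. The field names come from the list above; the ISO 8601 timestamp format and the function name are assumptions made for the example.

```python
from datetime import datetime, timezone


def add_internal_fields(record, file_name, last_modified, run_time):
    """Return a copy of the record with the three additional fields."""
    enriched = dict(record)
    enriched["internal_file_timestamp"] = run_time.isoformat()  # time of the run
    enriched["internal_file_name"] = file_name                  # originating file
    enriched["internal_last_modified"] = last_modified.isoformat()
    return enriched


row = add_internal_fields(
    {"name": "John", "age": "30"},
    file_name="people.csv",  # hypothetical file name
    last_modified=datetime(2024, 3, 9, tzinfo=timezone.utc),
    run_time=datetime(2024, 3, 10, tzinfo=timezone.utc),
)
print(row["internal_file_name"])  # people.csv
```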
Files Consistency: Inconsistency in the structure of the files might cause problems when loading the data.
Supported File Formats:
- JSON
- XML
- CSV
- Gsheet
We support the following JSON file extensions: .jsonl, .ndjson
Important: The schema will be based on the first record, so ensure all objects have the same schema.
JSON objects, one per line:
{"name": "John", "age": 30}
{"name": "Jane", "age": 25}
The above example will be inserted into the destination table as:
| name | age |
|---|---|
| John | 30 |
| Jane | 25 |
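The mapping from line-delimited JSON to rows can be sketched as below. The columns are taken from the first record, as described above, and values are cast to strings to mirror the Key Details note. How keys that are missing from or added to later records are handled is an assumption of this sketch (missing keys become None, extra keys are dropped), and the function name is hypothetical.

```python
import json


def read_json_lines(text):
    """Parse line-delimited JSON, taking the columns from the first record."""
    columns = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if columns is None:
            columns = list(record.keys())  # schema comes from the first record
        row = {}
        for col in columns:
            value = record.get(col)
            row[col] = None if value is None else str(value)
        rows.append(row)
    return columns, rows


sample = '{"name": "John", "age": 30}\n{"name": "Jane", "age": 25}\n'
columns, rows = read_json_lines(sample)
print(columns)  # ['name', 'age']
```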
XML files have a single root tag that contains all the records.
XML can have a complex schema with nested objects and tags, making it less suitable for tabular data (in contrast to JSONL or CSV files).
Therefore, we treat each record under the root tag as a single column in the destination table (named 'records'), where the XML representation of the record is stored as a string.
Expected Format:
The records should be enclosed in a single root tag ('items' in this case).
We'll extract each record under the root tag.
<items>
<item_record>
<name>John</name>
<age>30</age>
</item_record>
<item_record>
<name>Jane</name>
<age>25</age>
</item_record>
</items>
The above records will be inserted into the destination table as:
| records |
|---|
| <item_record><name>John</name><age>30</age></item_record> |
| <item_record><name>Jane</name><age>25</age></item_record> |
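The record extraction described above can be approximated with Python's standard library. This is a sketch of the idea rather than the connector's implementation: each direct child of the root tag is serialized back to a string and stored under a single 'records' column. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_xml_records(xml_text):
    """Return one row per direct child of the root, serialized as a string."""
    root = ET.fromstring(xml_text)
    rows = []
    for child in root:
        child.tail = None  # drop any whitespace that follows the record
        rows.append({"records": ET.tostring(child, encoding="unicode")})
    return rows


xml_text = (
    "<items>"
    "<item_record><name>John</name><age>30</age></item_record>"
    "<item_record><name>Jane</name><age>25</age></item_record>"
    "</items>"
)
xml_rows = extract_xml_records(xml_text)
print(xml_rows[0]["records"])
# <item_record><name>John</name><age>30</age></item_record>
```

Whitespace inside an indented record is preserved by this sketch, so the stored strings for a pretty-printed file would include the original line breaks.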
The following example is invalid because the records are not direct children of the root tag; they are wrapped in an additional 'records' tag:
<root>
<records>
<record>
<name>John</name>
<age>30</age>
</record>
<record>
<name>Jane</name>
<age>25</age>
</record>
</records>
</root>
Gsheet files that are moved to the Google Drive folder can also be extracted.
They usually don't have any special file extension or requirements, and we handle them similarly to CSV files.
CSV files can be extracted. The schema is derived automatically from the CSV headers.
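As a rough illustration of a header-based schema, the sketch below reads a CSV with Python's csv module: the column names come from the header row and every value stays a string. The function name and sample data are hypothetical.

```python
import csv
import io


def read_csv_rows(text):
    """Read CSV text; the header row supplies the column names."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


csv_columns, csv_rows = read_csv_rows("name,age\nJohn,30\nJane,25\n")
print(csv_columns)  # ['name', 'age']
print(csv_rows[0])  # {'name': 'John', 'age': '30'}
```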
Supported Compressions: We support the following compression formats:
- Gzip
- BZip2
- Zstd
Compression is detected automatically, so it is up to you whether the file name includes the compression format (e.g. file.csv.bz2) or not (e.g. file.csv).
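One common way to detect compression without relying on the file name is to inspect the leading "magic" bytes of the file. The sketch below assumes that approach; this guide does not state how detection is actually performed, and the function name is hypothetical.

```python
import bz2
import gzip

# Leading byte signatures of the three supported formats.
MAGIC_BYTES = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
]


def detect_compression(data):
    """Return the compression name for the given bytes, or None if plain."""
    for magic, name in MAGIC_BYTES:
        if data.startswith(magic):
            return name
    return None


payload = b"name,age\nJohn,30\n"
print(detect_compression(gzip.compress(payload)))  # gzip
print(detect_compression(bz2.compress(payload)))   # bzip2
print(detect_compression(payload))                 # None
```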