
Sharepoint OneDrive


High-Level Information: The Sharepoint OneDrive integration allows you to extract files from your Sharepoint folder and load them into your destination.

Key Details:

  • You can sync data from 1 Sharepoint site per connection.
  • Files are merged into a single data stream during the extraction process.
  • Supported file types include delimited text files such as CSV, TSV, and similar formats.
  • All values in the files are automatically converted to strings.
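The two behaviors above (several files merged into one stream, every value loaded as a string) can be sketched as follows. This is a minimal illustration using Python's standard `csv` module, not the connector's actual code. The function name `merge_delimited_files` is hypothetical, and the sketch assumes each file's contents are already available as text.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Tuple


def merge_delimited_files(
    files: Iterable[Tuple[str, str]], delimiter: str = ","
) -> Iterator[Dict[str, str]]:
    """Yield the rows of several delimited files as one stream of string-valued dicts.

    Hypothetical helper: `files` is an iterable of (file_name, file_text) pairs.
    """
    for _file_name, text in files:
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        for row in reader:
            # Every value becomes a string; a missing trailing field
            # (which DictReader reports as None) becomes an empty string.
            yield {key: "" if value is None else str(value) for key, value in row.items()}
```

For example, `list(merge_delimited_files([("jan.csv", "id,amount\n1,9.99\n"), ("feb.csv", "id,amount\n2,15\n")]))` produces one list of rows in which `amount` is the string `"9.99"` or `"15"`, not a number.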

Source Setup Guide

  1. Click the banner in the Edit Source form on the left and follow the OAuth authentication flow on Sharepoint's website to grant Extract the required permissions.
  2. Confirm you can see your email and profile picture, and that the source is Connected.
  3. Paste the URL of the OneDrive folder from which you want us to extract the files.
  4. Specify the file pattern that will be used to identify the files for processing.
  5. Indicate the table name you want to use in your destination.
  6. Click "Save".
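This page does not say which syntax the file pattern in step 4 uses. The sketch below assumes a glob-style pattern (as implemented by Python's `fnmatch`) only to illustrate how a pattern narrows the folder's contents down to the files that get processed. The connector may instead expect a regular expression, and the helper name `select_files` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def select_files(file_names: List[str], pattern: str) -> List[str]:
    """Return the file names that match a glob-style pattern (assumed syntax)."""
    # fnmatchcase is case-sensitive: "*" matches any run of characters,
    # "?" matches exactly one character.
    return [name for name in file_names if fnmatchcase(name, pattern)]
```

With the assumed syntax, the pattern `sales_*.csv` would pick `sales_2024_01.csv` and `sales_2024_02.csv` out of a folder that also contains `notes.txt`.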

Connection Setup Guide

Once you have connected Sharepoint OneDrive to a destination, you will also need to configure:

  • Connection Pull Schedule: Determines how frequently data is extracted from the source.
  • Backfill (Days): Specifies how far back we should search for updated files.
  • Schema Migration Policy: Controls how Extract will handle schema changes from the source.
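As a rough illustration of the Backfill (Days) setting, the sketch below keeps only the files whose last-modified time falls within the last N days. It is an assumption-based sketch, not the connector's implementation. The function name `files_within_backfill` is hypothetical, and the sketch assumes modification times are timezone-aware `datetime` values.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def files_within_backfill(
    last_modified: Dict[str, datetime], backfill_days: int, now: datetime
) -> List[str]:
    """Return (sorted) names of files modified within the backfill window."""
    # Anything modified at or after the cutoff is inside the window.
    cutoff = now - timedelta(days=backfill_days)
    return sorted(name for name, modified in last_modified.items() if modified >= cutoff)
```

A longer backfill widens the window, so more files are re-read on each run.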

Connector Information


File Name Partitioning: Data will be extracted based on file name partitioning, meaning only files modified within the backfill period will be updated.

The Cursor: Tracks the last modified file from the previous run and updates all files that were modified since then.
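The cursor behavior can be sketched as follows: select every file modified after the stored cursor, then advance the cursor to the newest modification time just processed. This is a hypothetical illustration (the name `select_since_cursor` is ours). It assumes the cursor is a single timestamp and that "modified since then" means strictly later. How the real connector combines the cursor with the backfill window is not described here.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def select_since_cursor(
    last_modified: Dict[str, datetime], cursor: Optional[datetime]
) -> Tuple[List[str], Optional[datetime]]:
    """Return files modified after the cursor (oldest first) and the advanced cursor."""
    # With no cursor yet (first run), every file counts as changed.
    changed = [
        name
        for name, modified in last_modified.items()
        if cursor is None or modified > cursor
    ]
    changed.sort(key=lambda name: last_modified[name])
    # Move the cursor to the newest modification time processed in this run;
    # keep the previous cursor if nothing changed.
    new_cursor = max((last_modified[name] for name in changed), default=cursor)
    return changed, new_cursor
```

On the first run every file is selected. On later runs only files touched since the previous run are picked up, and a run with no changes leaves the cursor where it was.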

Additional Fields in the Destination:

  1. internal_file_timestamp: Indicates when the connection run occurred.
  2. internal_file_name: Identifies the file the record originated from.
  3. internal_last_modified: Specifies when the file was last updated in the source.
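The sketch below shows how the three additional fields could be attached to every record read from a file. It is illustrative only: the helper name `add_internal_fields` is hypothetical, and rendering the timestamps as ISO 8601 strings is an assumption, since this page does not state the format used in the destination.

```python
from datetime import datetime
from typing import Dict, Iterable, Iterator


def add_internal_fields(
    records: Iterable[Dict[str, str]],
    file_name: str,
    file_last_modified: datetime,
    run_timestamp: datetime,
) -> Iterator[Dict[str, str]]:
    """Yield each record with the three internal_* columns appended."""
    for record in records:
        yield {
            **record,
            # When this connection run happened (same value for every record in the run).
            "internal_file_timestamp": run_timestamp.isoformat(),
            # Which source file the record came from.
            "internal_file_name": file_name,
            # When that file was last updated in the source.
            "internal_last_modified": file_last_modified.isoformat(),
        }
```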

Files Consistency: Inconsistencies in structure across files may cause problems when loading the data.

Supported File Formats:

We support the following JSON file extensions: .json, .jsonl, .ndjson


Important: The schema will be based on the first record, so ensure all objects have the same schema.
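To see why all objects should share one schema, consider a sketch in which the columns come from the first record only. The helpers `infer_schema` and `project` are hypothetical. The handling shown for mismatched records (keys that appear only in later records are dropped, missing keys become null) is an assumption made for illustration, and the connector's actual behavior may differ.

```python
from typing import Any, Dict, List


def infer_schema(records: List[Dict[str, Any]]) -> List[str]:
    """Take the column list from the first record only."""
    return list(records[0].keys()) if records else []


def project(records: List[Dict[str, Any]], schema: List[str]) -> List[Dict[str, Any]]:
    """Fit every record to the schema (assumed: extra keys dropped, missing keys None)."""
    return [{column: record.get(column) for column in schema} for record in records]
```

With `[{"name": "John", "age": 30}, {"name": "Jane", "email": "jane@example.com"}]`, the schema is `["name", "age"]`, so under these assumptions Jane's `email` is lost and her `age` is empty.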


JSON objects separated by newlines (compact or pretty-printed):
{"name": "John", "age": 30}
{"name": "Jane", "age": 25}

or

{
"name": "John",
"age": 30
}
{
"name": "Jane",
"age": 25
}

JSON array (compact or pretty-printed):
[
{"name": "John", "age": 30},
{"name": "Jane", "age": 25}
]

or

[
{
"name": "John",
"age": 30
},
{
"name": "Jane",
"age": 25
}
]

or

[{"name": "John","age": 30},{"name": "Jane","age": 25}]

All of the above examples will be inserted at the destination table as:

name  age
John  30
Jane  25
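One way to accept all of the layouts above with a single reader is to decode one top-level JSON value after another and flatten any top-level array into its elements. The sketch below does this with `json.JSONDecoder.raw_decode` from Python's standard library. It is a minimal illustration of "extract the objects at the top level", not the connector's own parser, and the function name `parse_top_level_records` is hypothetical.

```python
import json
from typing import Any, List


def parse_top_level_records(text: str) -> List[Any]:
    """Parse newline-delimited, concatenated, or array-wrapped JSON into records."""
    decoder = json.JSONDecoder()
    records: List[Any] = []
    index = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it ourselves.
        while index < len(text) and text[index].isspace():
            index += 1
        if index >= len(text):
            break
        value, index = decoder.raw_decode(text, index)
        if isinstance(value, list):
            # A top-level array contributes each of its elements as a record.
            records.extend(value)
        else:
            # A top-level object is itself one record.
            records.append(value)
    return records
```

Each of the example files above parses to the same two records, `{"name": "John", "age": 30}` and `{"name": "Jane", "age": 25}`. A file whose records sit under a key parses to one record with that single key.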

Important: Regardless of the format, we always extract the objects at the top level. Therefore, the objects corresponding to the records should be the top-level objects in the file (either under an array or with no parent object). For example, the following file will not be processed correctly, as we will assume the schema has a single field named `records`:

{
"records": [
{"name": "John", "age": 30},
{"name": "Jane", "age": 25}
]
}

This will be inserted at the destination table as:

records
[{"name": "John", "age": 30}, {"name": "Jane", "age": 25}]
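If your files wrap their records under a key like `records`, one option is to flatten them into newline-delimited JSON before they land in the folder, so that each record becomes a top-level object. The helper below is a hypothetical preprocessing sketch you would run on your side, not part of the connector. It assumes the wrapping key is known and holds a list of objects.

```python
import json


def unwrap_to_ndjson(text: str, key: str) -> str:
    """Turn {"<key>": [obj, obj, ...]} into one JSON object per line."""
    document = json.loads(text)
    records = document[key]  # raises KeyError if the wrapping key is absent
    return "".join(json.dumps(record) + "\n" for record in records)
```

Applied to the example above with `key="records"`, this writes `{"name": "John", "age": 30}` and `{"name": "Jane", "age": 25}` on separate lines. That file is then loaded with `name` and `age` columns instead of a single `records` column.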

Supported Compressions:

We support the following compression formats:

  • Gzip
  • BZip2
  • Zstd

We detect the compression format automatically, so including the compression extension in the file name is optional (e.g. both file.csv.bz2 and file.csv work).
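Detecting compression without relying on the file name is commonly done by inspecting the first bytes of the file (its "magic number"). The sketch below assumes that approach. This page does not say how the connector detects compression, and the function names are hypothetical. The Python standard library covers Gzip and BZip2. Zstd support (`compression.zstd`) only arrived in Python 3.14, so here Zstd is detected but not decompressed.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"  # note: a plain-text file starting with "BZh" would also match
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528 stored little-endian


def detect_compression(data: bytes) -> str:
    """Guess the compression format from the leading bytes."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(BZIP2_MAGIC):
        return "bzip2"
    if data.startswith(ZSTD_MAGIC):
        return "zstd"
    return "none"


def decompress(data: bytes) -> bytes:
    """Return the uncompressed bytes, whatever the file happens to be named."""
    kind = detect_compression(data)
    if kind == "gzip":
        return gzip.decompress(data)
    if kind == "bzip2":
        return bz2.decompress(data)
    if kind == "zstd":
        # Before Python 3.14 this needs a third-party package (e.g. zstandard).
        raise NotImplementedError("zstd decompression is not in the stdlib before Python 3.14")
    return data  # not compressed
```

Because detection looks at content rather than the extension, `file.csv.bz2` and a BZip2-compressed `file.csv` are handled the same way.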