Azure Blob Storage

Destination Documentation: Azure Blob Storage Documentation

High-Level Information:

Azure Blob Storage is Microsoft's object storage solution for the cloud, optimized for storing massive amounts of unstructured data. Extract integrates with Azure Blob Storage using account key or SAS token authentication, enabling direct data loading to blob containers. The connector supports multiple file formats including JSONL, CSV, and Parquet, with flexible path templating for organized data storage. Data is written using Azure's append blob mechanism with efficient chunk-based uploads. The integration is ideal for data lake architectures, archival storage, and as a staging area for other Azure services.

Prerequisites

An Azure account with an active subscription
An Azure Storage account with Blob Storage enabled
Either an Account Key or SAS token for authentication

Setup Guide

Step 1 - Create an Azure Storage Account (skip if you already have one)

Sign in to the Azure Portal
Click Create a resource and search for Storage account
Configure your storage account:
- Resource group: Select existing or create new
- Storage account name: Must be globally unique, 3-24 characters, lowercase letters and numbers only
- Region: Select a region close to your data sources/consumers
- Performance: Standard (recommended for most workloads)
- Redundancy: Choose based on your durability requirements (LRS, ZRS, GRS, etc.)
In the Advanced tab:
- Ensure Blob storage is enabled
- Configure security settings as per your requirements
Review and create the storage account
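The storage account naming rule can be checked before you open the portal. The following is a minimal sketch, assuming Azure's rule that names are 3-24 characters long and contain only lowercase letters and digits; the function name is hypothetical, and global uniqueness can only be confirmed by Azure itself.

```python
import re

# Assumed rule: 3-24 characters, lowercase ASCII letters and digits only.
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` satisfies the length and character rules.

    This does not check global uniqueness; only Azure can confirm that.
    """
    return _ACCOUNT_NAME_RE.fullmatch(name) is not None
```

A name such as extractdata01 passes this check, while Extract-Data fails because of the uppercase letters and the hyphen.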

Step 2 - Create a Blob Container

Navigate to your Storage account in the Azure Portal
In the left menu, under Data storage, click Containers
Click + Container to create a new container
Configure the container:
- Name: Choose a descriptive name (e.g., extract-data)
- Public access level: Set to Private (no anonymous access) for security
Click Create

Step 3 - Choose Your Authentication Method

Extract supports two authentication methods for Azure Blob Storage:

Option A: Account Key

Full access to the entire storage account
Simpler setup process
Access does not expire (valid until the key is regenerated)

Option B: SAS Token (Recommended)

More secure with fine-grained access control
Time-limited access with expiry dates
Specific permissions per service and resource type
Recommended for production environments

Step 4A - Get Your Account Key (if using Account Key authentication)

In your Storage account, navigate to Access keys under Security + networking
Click Show keys to reveal the account keys
Copy either key1 or key2 (both work identically)
Construct your connection string:

DefaultEndpointsProtocol=https;AccountName=YOUR_STORAGE_ACCOUNT_NAME;AccountKey=YOUR_ACCOUNT_KEY;EndpointSuffix=core.windows.net
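If you assemble the connection string in a script rather than by hand, a sketch like the following may help avoid typos. The function names are hypothetical; the field names and layout are taken from the template above.

```python
def build_connection_string(account_name: str, account_key: str) -> str:
    """Assemble an account-key connection string in the layout shown above."""
    parts = {
        "DefaultEndpointsProtocol": "https",
        "AccountName": account_name,
        "AccountKey": account_key,
        "EndpointSuffix": "core.windows.net",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def parse_connection_string(conn: str) -> dict:
    """Split a connection string into its fields.

    Each segment is split on the first '=' only, because base64-encoded
    account keys usually end in '=' padding characters.
    """
    fields = {}
    for segment in conn.split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        fields[key] = value
    return fields
```

Parsing the string back is a quick way to confirm that the account name and key landed in the right fields before pasting the value into Extract.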

Step 4B - Generate a SAS Token (if using SAS authentication)

In your Storage account, navigate to Shared access signature under Security + networking
Configure the SAS parameters:
- Allowed services: Check all checkboxes (Blob, File, Queue, Table)
- Allowed resource types: Check all checkboxes (Service, Container, Object) - these are not checked by default
- Allowed permissions: Check all checkboxes (Read, Write, Delete, List, Add, Create, Update, Process, Immutable storage, Permanent delete, Filter)
- Start and expiry date/time: Set the expiry date to a few years in the future for long-term access
- Allowed IP addresses: (Optional) Add Extract's IPs if restricting access
Click Generate SAS and connection string
Copy the Connection string that includes the SAS token
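Because a SAS token stops working at its expiry time, it can be useful to read the expiry back out of the connection string you copied. This is a rough sketch, assuming the connection string carries the token in a SharedAccessSignature field and that the token's se parameter holds the expiry as a UTC timestamp; the function name is hypothetical and only a few common timestamp layouts are handled.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs

# Timestamp layouts tried for the `se` (signed expiry) parameter; assumed UTC.
_EXPIRY_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d")


def sas_expiry(conn: str):
    """Return the SAS expiry as an aware datetime, or None if there is no SAS."""
    for segment in conn.split(";"):
        key, _, value = segment.partition("=")
        if key != "SharedAccessSignature":
            continue
        params = parse_qs(value.lstrip("?"))
        expiry = params.get("se")
        if not expiry:
            return None
        for fmt in _EXPIRY_FORMATS:
            try:
                parsed = datetime.strptime(expiry[0], fmt)
            except ValueError:
                continue
            return parsed.replace(tzinfo=timezone.utc)
        raise ValueError(f"unrecognized expiry format: {expiry[0]!r}")
    return None
```

Comparing the returned value with datetime.now(timezone.utc) gives a simple way to warn before the token lapses.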

Step 5 - Configure Network Access (if required)

If your storage account has network restrictions:

Navigate to Networking under Security + networking
Under Firewall and virtual networks, add Extract's server IPs to the allowed list:
- 3.134.124.160
- 3.150.64.207
- 44.232.26.19
- 54.214.149.234
Alternatively, configure private endpoints if using Azure Private Link
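To double-check a firewall configuration, you can compare the ranges you have allowed against the Extract IPs listed above. The sketch below uses Python's standard ipaddress module; the function name and the idea of passing your rules as a list of strings are assumptions made for illustration.

```python
import ipaddress

# Extract server IPs, copied from the list above.
EXTRACT_IPS = ["3.134.124.160", "3.150.64.207", "44.232.26.19", "54.214.149.234"]


def missing_from_allowlist(allowed_ranges):
    """Return the Extract IPs not covered by any allowed address or CIDR range."""
    networks = [ipaddress.ip_network(r, strict=False) for r in allowed_ranges]
    return [
        ip
        for ip in EXTRACT_IPS
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]
```

An empty result means every Extract IP is covered by at least one rule.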

Step 6 - Configure the Connector in Extract

Navigate to the Destinations tab in Extract and add a new Azure Blob Storage destination with the following parameters:

Connection String - The full connection string from Step 4A or 4B
Container Name - The name of the blob container you created in Step 2
File Format - Choose between JSONL, CSV, or Parquet
Blob Path Template - Path template for organizing blobs (see Path Templating section below)

Click Save and verify that the connection succeeds.

Configuration Parameters

Connection String - Your Azure Storage connection string containing authentication credentials. This can include either an Account Key or SAS token.
Container Name - The Azure Blob Storage container where data will be written. This container must exist before data loading begins.
File Format - The format for data files:
- JSONL - Newline-delimited JSON, ideal for semi-structured data
- CSV - Comma-separated values, best for tabular data and Excel compatibility
- Parquet - Columnar format, optimal for analytics workloads and compression
Compress - Optional. When set to true, output files are gzip-compressed and the destination appends .gz to the generated filename (for example, *.jsonl.gz). Defaults to false.
Blob Path Template - Template for organizing blobs within your container. See Path Templating section for available variables.
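To make the difference between the two text formats concrete, the sketch below serializes the same records as JSONL and as CSV using only the Python standard library. The sample records and function names are made up, and the connector's exact output (header row, quoting, key order) is not specified here, so treat this as an illustration of the formats rather than of Extract's behavior. Parquet is omitted because it needs a third-party library.

```python
import csv
import io
import json

# Hypothetical sample records.
records = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]


def to_jsonl(rows):
    """One JSON object per line, each terminated by a newline."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def to_csv(rows):
    """Header row taken from the first record, then one line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

JSONL keeps each record self-describing, which suits nested or irregular data; CSV states the column names once, which is more compact for flat tables.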

Path Templating

The blob path template allows you to organize your data using dynamic variables. This helps create a logical folder structure within your container.

Available Template Variables

Stream Information:

{stream_name} - Name of the source stream
{connection_id} - Unique identifier for the connection
{connection_name} - Human-readable connection name

Time-based Variables:

{timestamp} - Unix timestamp of the sync
{date_time} - ISO 8601 formatted date-time (e.g., 2024-01-15T14:30:00Z)
{year} - 4-digit year
{month} - 2-digit month (01-12)
{day} - 2-digit day (01-31)
{hour} - 2-digit hour (00-23)
{minute} - 2-digit minute (00-59)

Cursor Variables (for incremental syncs):

{incremental_key} - The incremental key value
{cursor_value} - Current cursor position
{cursor_datetime} - Cursor value as datetime
{cursor_date} - Date portion of cursor
{cursor_year} - Year from cursor
{cursor_month} - Month from cursor
{cursor_day} - Day from cursor

Path Template Examples

Daily partitioning by stream:

{stream_name}/{year}/{month}/{day}/data_{timestamp}

Result: customers/2024/01/15/data_1705330200

Hourly partitioning with connection info:

{connection_name}/{stream_name}/{year}-{month}-{day}/{hour}/extract_{incremental_key}

Result: production_sync/orders/2024-01-15/14/extract_1000

Simple stream-based organization:

raw/{stream_name}/{date_time}

Result: raw/products/2024-01-15T14:30:00Z
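The substitution shown in these examples can be approximated with Python's str.format. This is a sketch under assumptions, not Extract's implementation: the function name is hypothetical, time variables are derived from a single UTC sync time, and stream, connection, and cursor values are passed in as keyword arguments.

```python
from datetime import datetime, timezone


def render_blob_path(template: str, sync_time: datetime, **fields) -> str:
    """Fill a blob path template. `sync_time` must be timezone-aware."""
    utc = sync_time.astimezone(timezone.utc)
    values = {
        "timestamp": int(utc.timestamp()),
        "date_time": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "year": utc.strftime("%Y"),
        "month": utc.strftime("%m"),
        "day": utc.strftime("%d"),
        "hour": utc.strftime("%H"),
        "minute": utc.strftime("%M"),
    }
    # Caller-supplied values such as stream_name or incremental_key.
    values.update(fields)
    # Raises KeyError if the template names a variable that was not supplied.
    return template.format(**values)


sync = datetime(2024, 1, 15, 14, 30, tzinfo=timezone.utc)
path = render_blob_path("raw/{stream_name}/{date_time}", sync, stream_name="products")
```

With the values above, path matches the third example: raw/products/2024-01-15T14:30:00Z.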

Notes

The destination can optionally gzip-compress output files. Enable this by setting the compress configuration parameter to true. When enabled, the connector writes the same file format content (for example, JSONL or CSV) and appends a .gz suffix to the uploaded blob name.
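The effect of the compress option can be pictured with the standard gzip module. This is a minimal sketch of the described behavior, not the connector's code; the function name and the sample blob name are hypothetical.

```python
import gzip


def prepare_blob(blob_name: str, payload: bytes, compress: bool = False):
    """Return the (name, bytes) pair that would be uploaded.

    With compress=True the payload is gzip-compressed and '.gz' is appended
    to the name; otherwise both are returned unchanged.
    """
    if not compress:
        return blob_name, payload
    return blob_name + ".gz", gzip.compress(payload)


name, data = prepare_blob("customers/data.jsonl", b'{"id": 1}\n', compress=True)
```

Here name becomes customers/data.jsonl.gz, and passing data to gzip.decompress returns the original JSONL bytes, so downstream readers only need to gunzip before parsing.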
