Skip to main content

Amazon S3

Prerequisites

  1. Admin access to your AWS Console to create/configure an S3 bucket and permissions

Setup Guide

Create an S3 bucket (skip if you already have one you want to use)

  1. Login to your AWS Console and switch to the desired region. We recommend choosing the same region you have the rest of your compute/storage/databases (Redshift, Postgres, EC2, EKS, etc).

  2. Go to the S3 home page, and click "Create bucket": redshift clusters list

  3. Select a name for your bucket and keep ACLs disabled redshift clusters list

  4. You can use the default values for the rest of the settings, or change them to your liking.

Create a policy that grants access to the bucket

  1. In your AWS Console, go to IAM. In the menu on the left, click Policies.

  2. Click Create Policy and click the JSON tab. policy json tab

  3. Copy the following policy (replace YOUR_S3_BUCKET_NAME with the real bucket name):

{
"Version": "2012-10-17",
"Id": "",
"Statement": [
{
"Sid": "ExtractBucketAccess",
"Effect": "Allow",
"Action":
[
"s3:GetObject",
"s3:PutObject",
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:DeleteObject"
],
"Resource":
[
"arn:aws:s3:::YOUR_S3_BUCKET_NAME",
"arn:aws:s3:::YOUR_S3_BUCKET_NAME/*"
]
}
]
}

policy json

  1. Click Next, set a Policy name and click Create policy policy creation

Step 3 - Choose an authentication mode

There are two methods of authentication:

  1. The "Cross Account Role" (Step 4) method is prefered and recommended by AWS. It works by us "assuming" a Role you create in your AWS account.
  2. The "Access Key" (Step 5) method works by creating an IAM User with access to the AWS resource, and granting us the Access Key ID and Secret Access Key parameters.

Step 4 - Cross Account Role (Auth Method 1 - Preferred)

  1. In the setup tab of your connector, select the "Cross Accont Role" method, and copy the External ID value

  2. In your AWS Console, go to IAM. In the menu on the left, click Roles.

  3. Click Create role

create_role

  1. Choose AWS account

  2. Select the "Another AWS account" option, and type in this Account ID: 623184213151

  3. In Options - enable Require external ID and paste the External ID provided to you in the Setup tab of the connector

create_role_screen

  1. Find the permission policy you created in Step 2 and select it

attach_policy_to_role

  1. Give the role a meaningful name and click Create role. For reference your Trust policy should look like this:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Principal": {
"AWS": "623184213151"
},
"Condition": {
"StringEquals": {
"sts:ExternalId": "YOUR-EXTERNAL-ID-HERE"
}
}
}
]
}

Step 5 - Access Key (Auth Method 2)

  1. In the setup tab of your connector, select the "Access Key" method

  2. In your AWS Console, go to IAM. In the menu on the left, click Users.

  3. Click Create user, choose a User name and click Next

  4. Click Attach policies directly and select the opilcy you just created. attach_policies_directly

  5. Click Next, review all the details and then click Create

  6. Find the user you just created, click on it, and go to the Security credentials tab.

  7. Click on Create access key. In your use-case you can choose "Third-party service", check the checkbox below and click Next.

  8. Select a description, and click Create access key

  9. Copy your Access Key and Secret Access Key

Step 6 - complete the connector setup

  1. Select your Authentication Method based on your choice in Step 3
  2. Fill in the parameters based on the values from Step 4 or Step 5
  3. Enter your S3 Bucket Name based on Step 1
  4. Fill in the rest of the parameters based on the descriptions in the setup page

Connection Settings

When S3 is utilized as a destination, you need to define the S3 Bucket Key which is essentially the path of the files written in the bucket.

Since the source writes different streams, and each stream has updates over time, we'd like to be able to write to different keys each time.

To facilitate that, the S3 Bucket Key parameter supports Macros that expand automatically by our code:

macrodescriptionsample value
{extension}The extension for the file ("csv", "csv.gz", "jsonl", "jsonl.gz", "parquet")csv
{date_time}The timestamp of when the actual load is taking place. Format is %Y_%m_%d_%H_%M_%S2025_04_14_11_23_45
{timestamp}Same as date_time2025_04_14_11_23_45
{iso_date_time}The timestamp of when the actual load is taking place. Format is ISO8601`2025-04-14T11:23:45Z
{date}The date of when the actual load is taking place. Format is %Y_%m_%d2025_04_14
{iso_date}The date of when the actual load is taking place. Format is %Y_%m_%d2025-04-14
{year}The year of when the actual load is taking place (zero-padded to 4 digits)2025
{month}The month of when the actual load is taking place (01-12)04
{day}The day of when the actual load is taking place (01-31)14
{hour}The hour of when the actual load is taking place (00-23)11
{minute}The minute of when the actual load is taking place (00-59)23
{second}The second of when the actual load is taking place (00-59)`45
{stream_id}The id of the streamdeadbeef-2359-567a-f5e8-83c536cb4de8
{stream_name}The name of the streamsales_summary_daily
{connection_id}The id of the connection (source->dest)f52c7268-2359-567a-f5e8-83c536cb4de8
{connection_run_id}The id that represents the run of the connection (across all streams)fad1983d-7446-e5c4-bb46-4578affe02d5
{stream_run_id}The id that represents the run of a specific stream within the run of the entire connection8e28c4bf-11aa-20aa-0ecf-d23c21c73285
{cursor_year}The cursor is used by the source to determine what data to pull. This is the year portion of the cursor2025
{cursor_month}The cursor is used by the source to determine what data to pull. This is the month portion of the cursor (01-12)04
{cursor_day}The cursor is used by the source to determine what data to pull. This is the year portion of the cursor (01-31)14
{cursor_date_time}The cursor is used by the source to determine what data to pull. This is the cursor value formatted as %Y_%m_%d_%H_%M_%S2025_04_14_11_23_45
{cursor_iso_date_time}The cursor is used by the source to determine what data to pull. This is the cursor value formatted as ISO8601`2025-04-14T11:23:45Z
{cursor_singular_date_time}The cursor is used by the source to determine what data to pull. This is the cursor value formatted as %Y-%m-%d-%H-%M-%S2025-04-14-11-23-45