Connecting Amazon Redshift with S3 (Data Loading)
-- This COPY command loads data from an S3 bucket into a Redshift table
COPY my_table
FROM 's3://my-bucket/data.csv'
IAM_ROLE 'arn:aws:iam::account-id:role/my-redshift-role'
FORMAT AS CSV
IGNOREHEADER 1;
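The same COPY statement can also be submitted programmatically. Below is a minimal sketch using the boto3 Redshift Data API, which runs SQL without managing a database connection; the cluster identifier, database name, and database user are illustrative placeholders, not values taken from this setup.

import boto3

# Redshift Data API client: submits SQL statements asynchronously,
# no JDBC/ODBC connection or driver required
client = boto3.client('redshift-data')

copy_sql = """
    COPY my_table
    FROM 's3://my-bucket/data.csv'
    IAM_ROLE 'arn:aws:iam::account-id:role/my-redshift-role'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

# ClusterIdentifier, Database, and DbUser below are placeholder values
response = client.execute_statement(
    ClusterIdentifier='my-redshift-cluster',
    Database='dbname',
    DbUser='username',
    Sql=copy_sql
)

# The returned statement id can be polled with describe_statement to check the load status
print(response['Id'])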
Connecting Amazon Redshift with DynamoDB (Data Import)
-- This COPY command loads data from a DynamoDB table into Redshift
COPY my_table
FROM 'dynamodb://my_dynamo_table'
IAM_ROLE 'arn:aws:iam::account-id:role/my-redshift-role'
READRATIO 50;
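READRATIO 50 tells COPY to consume up to 50% of the DynamoDB table's provisioned read throughput, leaving the rest for application traffic. As a small sketch (region and table name are placeholders, and it assumes a provisioned-capacity table), you can check the available read capacity with boto3 before picking a value:

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# Look up the table's provisioned read capacity (provisioned-mode tables only)
table = dynamodb.describe_table(TableName='my_dynamo_table')['Table']
read_capacity = table['ProvisionedThroughput']['ReadCapacityUnits']

print(f"Provisioned read capacity: {read_capacity} RCUs")
print(f"COPY with READRATIO 50 will use up to ~{read_capacity * 0.5:.0f} RCUs")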
Connecting Redshift with AWS Lambda (Real-Time Data Streaming)
import boto3
import psycopg2

def lambda_handler(event, context):
    # Get the S3 file details
    s3_bucket = event['Records'][0]['s3']['bucket']['name']
    s3_key = event['Records'][0]['s3']['object']['key']

    # Set up the Redshift connection
    conn = psycopg2.connect(
        host='redshift-cluster-name',
        dbname='dbname',
        user='username',
        password='password',
        port='5439'
    )
    cur = conn.cursor()

    # Redshift COPY command to load data
    copy_command = f"""
        COPY my_table
        FROM 's3://{s3_bucket}/{s3_key}'
        IAM_ROLE 'arn:aws:iam::account-id:role/my-redshift-role'
        FORMAT AS CSV;
    """
    cur.execute(copy_command)
    conn.commit()
    cur.close()
    conn.close()
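To exercise the handler outside of Lambda, it can be called with a minimal S3 put-event payload; the structure below contains only the fields the function actually reads, and the bucket and key names are illustrative.

# Minimal S3 event payload for a local test of lambda_handler
sample_event = {
    'Records': [
        {
            's3': {
                'bucket': {'name': 'my-bucket'},
                'object': {'key': 'incoming/data.csv'}
            }
        }
    ]
}

lambda_handler(sample_event, None)

In a real deployment the connection settings would normally come from environment variables or AWS Secrets Manager rather than being hardcoded in the function.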
Connecting Redshift with Amazon Kinesis (Real-Time Streaming)
-- This setup sends data to Redshift via Kinesis Data Firehose (requires a Firehose delivery stream)
-- Firehose buffers incoming records in S3 and automatically issues COPY commands to load them into Redshift
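A Redshift destination for Firehose is configured outside of SQL, typically in the console or with the AWS SDK. The sketch below shows the general shape of that setup with boto3: the delivery stream stages records in an S3 bucket and then runs a COPY command into my_table. The stream name, role ARNs, JDBC URL, bucket, and credentials are placeholders, not values from this article.

import boto3

firehose = boto3.client('firehose')

# Create a delivery stream that stages records in S3 and COPYs them into Redshift
firehose.create_delivery_stream(
    DeliveryStreamName='my-redshift-stream',  # placeholder name
    DeliveryStreamType='DirectPut',
    RedshiftDestinationConfiguration={
        'RoleARN': 'arn:aws:iam::account-id:role/my-firehose-role',
        'ClusterJDBCURL': 'jdbc:redshift://redshift-cluster-name:5439/dbname',
        'CopyCommand': {
            'DataTableName': 'my_table',
            'CopyOptions': 'FORMAT AS CSV'
        },
        'Username': 'username',
        'Password': 'password',
        'S3Configuration': {
            'RoleARN': 'arn:aws:iam::account-id:role/my-firehose-role',
            'BucketARN': 'arn:aws:s3:::my-bucket',
            'CompressionFormat': 'UNCOMPRESSED'
        }
    }
)

# Records written to the stream are buffered, staged in S3, and loaded automatically
firehose.put_record(
    DeliveryStreamName='my-redshift-stream',
    Record={'Data': b'1,example,2022-01-01 00:00:00\n'}
)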
Redshift Spectrum (Query Data in S3)
-- Create an external schema that points to your S3 data via the AWS Glue Data Catalog
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE 'arn:aws:iam::account-id:role/my-redshift-role'
REGION 'us-east-1';
-- Create an external table that links to your S3 data
CREATE EXTERNAL TABLE spectrum_schema.my_s3_table (
    id INT,
    name VARCHAR(100),
    date TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3://my-bucket/data/';
-- Query the external table like a regular Redshift table
SELECT *
FROM spectrum_schema.my_s3_table
WHERE date > '2022-01-01';
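External tables are queried through the same drivers and connections as local Redshift tables. As a short sketch reusing the psycopg2 connection pattern from the Lambda example (host, database, and credentials are placeholders), the Spectrum query can be run from Python like any other query:

import psycopg2

# Same connection pattern as the Lambda example above; values are placeholders
conn = psycopg2.connect(
    host='redshift-cluster-name',
    dbname='dbname',
    user='username',
    password='password',
    port='5439'
)
cur = conn.cursor()

# Spectrum scans the Parquet files in S3; to the client it looks like a normal table
cur.execute("""
    SELECT id, name, date
    FROM spectrum_schema.my_s3_table
    WHERE date > '2022-01-01';
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()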