View a markdown version of this page

Getting started with Amazon Redshift provisioned clusters - Amazon Redshift

Amazon Redshift will no longer support the creation of new Python UDFs starting Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the blog post .

Getting started with Amazon Redshift provisioned clusters

The following code example shows how to:

  • Create a Redshift cluster

  • Create an IAM role for S3 access

  • Create tables and load data

  • Run example queries

  • Clean up resources

Bash
AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash # Amazon Redshift Provisioned Cluster Tutorial Script # This script creates a Redshift cluster, loads sample data, runs queries, and cleans up resources # Version 3: Fixed IAM role usage in COPY commands # Set up logging LOG_FILE="redshift_tutorial.log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "Starting Amazon Redshift tutorial script at $(date)" echo "All commands and outputs will be logged to $LOG_FILE" # Function to handle errors handle_error() { echo "ERROR: $1" echo "Resources created so far:" if [ -n "$CLUSTER_ID" ]; then echo "- Redshift Cluster: $CLUSTER_ID"; fi if [ -n "$ROLE_NAME" ]; then echo "- IAM Role: $ROLE_NAME"; fi echo "Attempting to clean up resources..." cleanup_resources exit 1 } # Function to clean up resources cleanup_resources() { echo "Cleaning up resources..." # Delete the cluster if it exists if [ -n "$CLUSTER_ID" ]; then echo "Deleting Redshift cluster: $CLUSTER_ID" aws redshift delete-cluster --cluster-identifier "$CLUSTER_ID" --skip-final-cluster-snapshot echo "Waiting for cluster deletion to complete..." aws redshift wait cluster-deleted --cluster-identifier "$CLUSTER_ID" echo "Cluster deleted successfully." fi # Delete the IAM role if it exists if [ -n "$ROLE_NAME" ]; then echo "Removing IAM role policy..." aws iam delete-role-policy --role-name "$ROLE_NAME" --policy-name RedshiftS3Access || echo "Failed to delete role policy" echo "Deleting IAM role: $ROLE_NAME" aws iam delete-role --role-name "$ROLE_NAME" || echo "Failed to delete role" fi echo "Cleanup completed." } # Function to wait for SQL statement to complete wait_for_statement() { local statement_id=$1 local max_attempts=30 local attempt=1 local status="" echo "Waiting for statement $statement_id to complete..." while [ $attempt -le $max_attempts ]; do status=$(aws redshift-data describe-statement --id "$statement_id" --query 'Status' --output text) if [ "$status" == "FINISHED" ]; then echo "Statement completed successfully." return 0 elif [ "$status" == "FAILED" ]; then local error=$(aws redshift-data describe-statement --id "$statement_id" --query 'Error' --output text) echo "Statement failed with error: $error" return 1 elif [ "$status" == "ABORTED" ]; then echo "Statement was aborted." return 1 fi echo "Statement status: $status. Waiting... (Attempt $attempt/$max_attempts)" sleep 10 ((attempt++)) done echo "Timed out waiting for statement to complete." return 1 } # Function to check if IAM role is attached to cluster check_role_attached() { local role_arn=$1 local max_attempts=10 local attempt=1 echo "Checking if IAM role is attached to the cluster..." while [ $attempt -le $max_attempts ]; do local status=$(aws redshift describe-clusters \ --cluster-identifier "$CLUSTER_ID" \ --query "Clusters[0].IamRoles[?IamRoleArn=='$role_arn'].ApplyStatus" \ --output text) if [ "$status" == "in-sync" ]; then echo "IAM role is successfully attached to the cluster." return 0 fi echo "IAM role status: $status. Waiting... (Attempt $attempt/$max_attempts)" sleep 30 ((attempt++)) done echo "Timed out waiting for IAM role to be attached." return 1 } # Variables to track created resources CLUSTER_ID="examplecluster" ROLE_NAME="RedshiftS3Role-$(date +%s)" DB_NAME="dev" DB_USER="awsuser" DB_PASSWORD="Changeit1" # In production, use AWS Secrets Manager to generate and store passwords echo "=== Step 1: Creating Amazon Redshift Cluster ===" # Create the Redshift cluster echo "Creating Redshift cluster: $CLUSTER_ID" CLUSTER_RESULT=$(aws redshift create-cluster \ --cluster-identifier "$CLUSTER_ID" \ --node-type ra3.4xlarge \ --number-of-nodes 2 \ --master-username "$DB_USER" \ --master-user-password "$DB_PASSWORD" \ --db-name "$DB_NAME" \ --port 5439 2>&1) # Check for errors if echo "$CLUSTER_RESULT" | grep -i "error"; then handle_error "Failed to create Redshift cluster: $CLUSTER_RESULT" fi echo "$CLUSTER_RESULT" echo "Waiting for cluster to become available..." # Wait for the cluster to be available aws redshift wait cluster-available --cluster-identifier "$CLUSTER_ID" || handle_error "Timeout waiting for cluster to become available" # Get cluster status to confirm CLUSTER_STATUS=$(aws redshift describe-clusters \ --cluster-identifier "$CLUSTER_ID" \ --query 'Clusters[0].ClusterStatus' \ --output text) echo "Cluster status: $CLUSTER_STATUS" echo "=== Step 2: Creating IAM Role for S3 Access ===" # Create trust policy file echo "Creating trust policy for Redshift" cat > redshift-trust-policy.json << EOF { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "redshift.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } EOF # Create IAM role echo "Creating IAM role: $ROLE_NAME" ROLE_RESULT=$(aws iam create-role \ --role-name "$ROLE_NAME" \ --assume-role-policy-document file://redshift-trust-policy.json 2>&1) # Check for errors if echo "$ROLE_RESULT" | grep -i "error"; then handle_error "Failed to create IAM role: $ROLE_RESULT" fi echo "$ROLE_RESULT" # Get the role ARN ROLE_ARN=$(aws iam get-role --role-name "$ROLE_NAME" --query 'Role.Arn' --output text) echo "Role ARN: $ROLE_ARN" # Create policy document for S3 access echo "Creating S3 access policy" cat > redshift-s3-policy.json << EOF { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::redshift-downloads", "arn:aws:s3:::redshift-downloads/*" ] } ] } EOF # Attach policy to role echo "Attaching S3 access policy to role" POLICY_RESULT=$(aws iam put-role-policy \ --role-name "$ROLE_NAME" \ --policy-name RedshiftS3Access \ --policy-document file://redshift-s3-policy.json 2>&1) # Check for errors if echo "$POLICY_RESULT" | grep -i "error"; then handle_error "Failed to attach policy to role: $POLICY_RESULT" fi echo "$POLICY_RESULT" # Attach role to cluster echo "Attaching IAM role to Redshift cluster" ATTACH_ROLE_RESULT=$(aws redshift modify-cluster-iam-roles \ --cluster-identifier "$CLUSTER_ID" \ --add-iam-roles "$ROLE_ARN" 2>&1) # Check for errors if echo "$ATTACH_ROLE_RESULT" | grep -i "error"; then handle_error "Failed to attach role to cluster: $ATTACH_ROLE_RESULT" fi echo "$ATTACH_ROLE_RESULT" # Wait for the role to be attached echo "Waiting for IAM role to be attached to the cluster..." if ! check_role_attached "$ROLE_ARN"; then handle_error "Failed to attach IAM role to cluster" fi echo "=== Step 3: Getting Cluster Connection Information ===" # Get cluster endpoint CLUSTER_INFO=$(aws redshift describe-clusters \ --cluster-identifier "$CLUSTER_ID" \ --query 'Clusters[0].Endpoint.{Address:Address,Port:Port}' \ --output json) echo "Cluster endpoint information:" echo "$CLUSTER_INFO" echo "=== Step 4: Creating Tables and Loading Data ===" echo "Creating sales table" SALES_TABLE_ID=$(aws redshift-data execute-statement \ --cluster-identifier "$CLUSTER_ID" \ --database "$DB_NAME" \ --db-user "$DB_USER" \ --sql "DROP TABLE IF EXISTS sales; CREATE TABLE sales(salesid integer not null, listid integer not null distkey, sellerid integer not null, buyerid integer not null, eventid integer not null, dateid smallint not null sortkey, qtysold smallint not null, pricepaid decimal(8,2), commission decimal(8,2), saletime timestamp);" \ --query 'Id' --output text) echo "Sales table creation statement ID: $SALES_TABLE_ID" # Wait for statement to complete if ! wait_for_statement "$SALES_TABLE_ID"; then handle_error "Failed to create sales table" fi echo "Creating date table" DATE_TABLE_ID=$(aws redshift-data execute-statement \ --cluster-identifier "$CLUSTER_ID" \ --database "$DB_NAME" \ --db-user "$DB_USER" \ --sql "DROP TABLE IF EXISTS date; CREATE TABLE date(dateid smallint not null distkey sortkey, caldate date not null, day character(3) not null, week smallint not null, month character(5) not null, qtr character(5) not null, year smallint not null, holiday boolean default('N'));" \ --query 'Id' --output text) echo "Date table creation statement ID: $DATE_TABLE_ID" # Wait for statement to complete if ! wait_for_statement "$DATE_TABLE_ID"; then handle_error "Failed to create date table" fi echo "Loading data into sales table" SALES_LOAD_ID=$(aws redshift-data execute-statement \ --cluster-identifier "$CLUSTER_ID" \ --database "$DB_NAME" \ --db-user "$DB_USER" \ --sql "COPY sales FROM 's3://redshift-downloads/tickit/sales_tab.txt' DELIMITER '\t' TIMEFORMAT 'MM/DD/YYYY HH:MI:SS' REGION 'us-east-1' IAM_ROLE '$ROLE_ARN';" \ --query 'Id' --output text) echo "Sales data load statement ID: $SALES_LOAD_ID" # Wait for statement to complete if ! wait_for_statement "$SALES_LOAD_ID"; then handle_error "Failed to load data into sales table" fi echo "Loading data into date table" DATE_LOAD_ID=$(aws redshift-data execute-statement \ --cluster-identifier "$CLUSTER_ID" \ --database "$DB_NAME" \ --db-user "$DB_USER" \ --sql "COPY date FROM 's3://redshift-downloads/tickit/date2008_pipe.txt' DELIMITER '|' REGION 'us-east-1' IAM_ROLE '$ROLE_ARN';" \ --query 'Id' --output text) echo "Date data load statement ID: $DATE_LOAD_ID" # Wait for statement to complete if ! wait_for_statement "$DATE_LOAD_ID"; then handle_error "Failed to load data into date table" fi echo "=== Step 5: Running Example Queries ===" echo "Running query: Get definition for the sales table" QUERY1_ID=$(aws redshift-data execute-statement \ --cluster-identifier "$CLUSTER_ID" \ --database "$DB_NAME" \ --db-user "$DB_USER" \ --sql "SELECT * FROM pg_table_def WHERE tablename = 'sales';" \ --query 'Id' --output text) echo "Query 1 statement ID: $QUERY1_ID" # Wait for statement to complete if ! wait_for_statement "$QUERY1_ID"; then handle_error "Query 1 failed" fi # Get and display results echo "Query 1 results (first 10 rows):" aws redshift-data get-statement-result --id "$QUERY1_ID" --max-items 10 echo "Running query: Find total sales on a given calendar date" QUERY2_ID=$(aws redshift-data execute-statement \ --cluster-identifier "$CLUSTER_ID" \ --database "$DB_NAME" \ --db-user "$DB_USER" \ --sql "SELECT sum(qtysold) FROM sales, date WHERE sales.dateid = date.dateid AND caldate = '2008-01-05';" \ --query 'Id' --output text) echo "Query 2 statement ID: $QUERY2_ID" # Wait for statement to complete if ! wait_for_statement "$QUERY2_ID"; then handle_error "Query 2 failed" fi # Get and display results echo "Query 2 results:" aws redshift-data get-statement-result --id "$QUERY2_ID" echo "=== Tutorial Complete ===" echo "The following resources were created:" echo "- Redshift Cluster: $CLUSTER_ID" echo "- IAM Role: $ROLE_NAME" echo "" echo "===========================================" echo "CLEANUP CONFIRMATION" echo "===========================================" echo "Do you want to clean up all created resources? (y/n): " read -r CLEANUP_CHOICE if [[ "$CLEANUP_CHOICE" =~ ^[Yy] ]]; then cleanup_resources echo "All resources have been cleaned up." else echo "Resources were not cleaned up. You can manually delete them later." echo "To avoid incurring charges, remember to delete the following resources:" echo "- Redshift Cluster: $CLUSTER_ID" echo "- IAM Role: $ROLE_NAME" fi echo "Script completed at $(date)"

For a complete list of AWS SDK developer guides and code examples, see Using this service with an AWS SDK. This topic also includes information about getting started and details about previous SDK versions.