AWS STS examples using AWS CLI with Bash script - AWS Command Line Interface

AWS STS examples using AWS CLI with Bash script

The following code examples show how to perform actions and implement common scenarios with AWS STS by using the AWS Command Line Interface (AWS CLI) in Bash scripts.

Actions are code excerpts from larger programs and must be run in context. While actions show you how to call individual service functions, you can see actions in context in their related scenarios.

Scenarios are code examples that show you how to accomplish specific tasks by calling multiple functions within a service or combined with other AWS services.

Each example includes a link to the complete source code, where you can find instructions on how to set up and run the code in context.

Actions

The following code example shows how to use AssumeRole.

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set it up and run it in the AWS Code Examples Repository.

###############################################################################
# function iecho
#
# This function enables the script to display the specified text only if
# the global variable $VERBOSE is set to true.
###############################################################################
function iecho() {
  if [[ $VERBOSE == true ]]; then
    echo "$@"
  fi
}

###############################################################################
# function errecho
#
# This function outputs everything sent to it to STDERR (standard error output).
###############################################################################
function errecho() {
  printf "%s\n" "$*" 1>&2
}

###############################################################################
# function sts_assume_role
#
# This function assumes a role in the AWS account and returns the temporary
# credentials.
#
# Parameters:
#       -n role_session_name -- The name of the session.
#       -r role_arn -- The ARN of the role to assume.
#
# Returns:
#       [access_key_id, secret_access_key, session_token]
#     And:
#       0 - If successful.
#       1 - If an error occurred.
###############################################################################
function sts_assume_role() {
  local role_session_name role_arn response
  local option OPTARG # Required to use getopts command in a function.

  # bashsupport disable=BP5008
  function usage() {
    echo "function sts_assume_role"
    echo "Assumes a role in the AWS account and returns the temporary credentials:"
    echo "  -n role_session_name -- The name of the session."
    echo "  -r role_arn -- The ARN of the role to assume."
    echo ""
  }

  while getopts n:r:h option; do
    case "${option}" in
      n) role_session_name=${OPTARG} ;;
      r) role_arn=${OPTARG} ;;
      h)
        usage
        return 0
        ;;
      \?)
        echo "Invalid parameter"
        usage
        return 1
        ;;
    esac
  done

  response=$(aws sts assume-role \
    --role-session-name "$role_session_name" \
    --role-arn "$role_arn" \
    --output text \
    --query "Credentials.[AccessKeyId, SecretAccessKey, SessionToken]")

  local error_code=${?}

  if [[ $error_code -ne 0 ]]; then
    aws_cli_error_log $error_code
    errecho "ERROR: AWS reports assume-role operation failed.\n$response"
    return 1
  fi

  echo "$response"

  return 0
}

  • For API details, see AssumeRole in AWS CLI Command Reference.
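Because the function above prints the three credential values as one line of text output, a caller has to split that line before using it. The following is a minimal sketch of one way to do that, assuming the values are tab-separated as the AWS CLI text output format produces for a list query. The `sts_assume_role_stub` and `export_assumed_credentials` names and the credential values are hypothetical and exist only so the sketch runs without calling AWS.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for sts_assume_role so this sketch runs offline.
# It prints fake values in the same "key<TAB>secret<TAB>token" shape.
function sts_assume_role_stub() {
  printf '%s\t%s\t%s\n' "ASIAEXAMPLEKEYID" "exampleSecretKey" "exampleSessionToken"
}

# Splits one tab-separated credentials line and exports the standard
# environment variables that the AWS CLI reads.
function export_assumed_credentials() {
  local credentials=$1
  local access_key_id secret_access_key session_token

  IFS=$'\t' read -r access_key_id secret_access_key session_token <<<"$credentials"

  if [[ -z $access_key_id || -z $secret_access_key || -z $session_token ]]; then
    echo "ERROR: expected three credential fields" 1>&2
    return 1
  fi

  export AWS_ACCESS_KEY_ID=$access_key_id
  export AWS_SECRET_ACCESS_KEY=$secret_access_key
  export AWS_SESSION_TOKEN=$session_token
}

credentials=$(sts_assume_role_stub)
export_assumed_credentials "$credentials"
echo "Temporary credentials exported for key: $AWS_ACCESS_KEY_ID"
```

In a real script, the stub call would be replaced with something like `sts_assume_role -n "$session_name" -r "$role_arn"`, and later AWS CLI commands in the same shell would then run with the assumed role's temporary credentials.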

Scenarios

The following code example shows how to:

  • Create the VPC infrastructure

  • Set up logging

  • Create the ECS cluster

  • Configure IAM roles

  • Create the service with Service Connect

  • Verify the deployment

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set it up and run it in the Sample developer tutorials repository.

#!/bin/bash

# ECS Service Connect Tutorial Script v4 - Modified to use Default VPC
# This script creates an ECS cluster with Service Connect and deploys an nginx service
# Uses the default VPC to avoid VPC limits

set -e  # Exit on any error

# Configuration
SCRIPT_NAME="ECS Service Connect Tutorial"
LOG_FILE="ecs-service-connect-tutorial-v4-default-vpc.log"
REGION=${AWS_DEFAULT_REGION:-${AWS_REGION:-$(aws configure get region 2>/dev/null)}}
if [ -z "$REGION" ]; then
    echo "ERROR: No AWS region configured."
    echo "Set one with: aws configure set region us-east-1"
    exit 1
fi
ENV_PREFIX="tutorial"
CLUSTER_NAME="${ENV_PREFIX}-cluster"
NAMESPACE_NAME="service-connect"

# Generate random suffix for unique resource names
RANDOM_SUFFIX=$(openssl rand -hex 6)

# Arrays to track created resources for cleanup
declare -a CREATED_RESOURCES=()

# Logging function
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

# Error handling function
handle_error() {
    log "ERROR: Script failed at line $1"
    log "Attempting to clean up resources..."
    cleanup_resources
    exit 1
}

# Set up error handling
trap 'handle_error $LINENO' ERR

# Function to add resource to tracking array
track_resource() {
    CREATED_RESOURCES+=("$1")
    log "Tracking resource: $1"
}

# Function to check if command output contains actual errors
check_for_errors() {
    local output="$1"
    local command_name="$2"

    # Check for specific AWS CLI error patterns, not just any occurrence of "error"
    if echo "$output" | grep -qi "An error occurred\|InvalidParameterException\|AccessDenied\|ValidationException\|ResourceNotFoundException"; then
        log "ERROR in $command_name: $output"
        return 1
    fi
    return 0
}

# Function to get AWS account ID
get_account_id() {
    ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    log "Using AWS Account ID: $ACCOUNT_ID"
}

# Function to wait for resources to be ready
wait_for_resource() {
    local resource_type="$1"
    local resource_id="$2"

    case "$resource_type" in
        "cluster")
            log "Waiting for cluster $resource_id to be active..."
            local attempt=1
            local max_attempts=30
            while [ $attempt -le $max_attempts ]; do
                local status=$(aws ecs describe-clusters --clusters "$resource_id" --query 'clusters[0].status' --output text)
                if [ "$status" = "ACTIVE" ]; then
                    log "Cluster is now active"
                    return 0
                fi
                log "Cluster status: $status (attempt $attempt/$max_attempts)"
                sleep 10
                ((attempt++))
            done
            log "ERROR: Cluster did not become active within expected time"
            return 1
            ;;
        "service")
            log "Waiting for service $resource_id to be stable..."
            aws ecs wait services-stable --cluster "$CLUSTER_NAME" --services "$resource_id"
            ;;
        "nat-gateway")
            log "Waiting for NAT Gateway $resource_id to be available..."
            aws ec2 wait nat-gateway-available --nat-gateway-ids "$resource_id"
            ;;
    esac
}

# Function to use default VPC infrastructure
setup_default_vpc_infrastructure() {
    log "Using default VPC infrastructure..."

    # Get default VPC
    VPC_ID=$(aws ec2 describe-vpcs --filters "Name=isDefault,Values=true" --query 'Vpcs[0].VpcId' --output text)
    if [[ "$VPC_ID" == "None" || -z "$VPC_ID" ]]; then
        log "ERROR: No default VPC found. Please create a default VPC first."
        exit 1
    fi
    log "Using default VPC: $VPC_ID"

    # Get default subnets
    SUBNETS=$(aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPC_ID" "Name=default-for-az,Values=true" --query 'Subnets[].SubnetId' --output text)
    SUBNET_ARRAY=($SUBNETS)

    if [ ${#SUBNET_ARRAY[@]} -lt 2 ]; then
        log "ERROR: Need at least 2 subnets for ECS Service Connect. Found: ${#SUBNET_ARRAY[@]}"
        exit 1
    fi

    PUBLIC_SUBNET1=${SUBNET_ARRAY[0]}
    PUBLIC_SUBNET2=${SUBNET_ARRAY[1]}

    log "Using subnets: $PUBLIC_SUBNET1, $PUBLIC_SUBNET2"

    # Create security group for ECS tasks
    SG_OUTPUT=$(aws ec2 create-security-group \
        --group-name "${ENV_PREFIX}-ecs-sg-${RANDOM_SUFFIX}" \
        --description "Security group for ECS Service Connect tutorial" \
        --vpc-id "$VPC_ID" 2>&1)
    check_for_errors "$SG_OUTPUT" "create-security-group"
    SECURITY_GROUP_ID=$(echo "$SG_OUTPUT" | grep -o '"GroupId": "[^"]*"' | cut -d'"' -f4)
    track_resource "SG:$SECURITY_GROUP_ID"
    log "Created security group: $SECURITY_GROUP_ID"

    # Add inbound rules to security group
    aws ec2 authorize-security-group-ingress \
        --group-id "$SECURITY_GROUP_ID" \
        --protocol tcp \
        --port 80 \
        --cidr 0.0.0.0/0 >/dev/null 2>&1 || true

    aws ec2 authorize-security-group-ingress \
        --group-id "$SECURITY_GROUP_ID" \
        --protocol tcp \
        --port 443 \
        --cidr 0.0.0.0/0 >/dev/null 2>&1 || true

    log "Default VPC infrastructure setup completed"
}

# Function to create CloudWatch log groups
create_log_groups() {
    log "Creating CloudWatch log groups..."

    # Create log group for nginx container
    aws logs create-log-group --log-group-name "/ecs/service-connect-nginx" 2>&1 | grep -v "ResourceAlreadyExistsException" || {
        if [ ${PIPESTATUS[0]} -eq 0 ]; then
            log "Log group /ecs/service-connect-nginx created"
            track_resource "LOG_GROUP:/ecs/service-connect-nginx"
        else
            log "Log group /ecs/service-connect-nginx already exists"
        fi
    }

    # Create log group for service connect proxy
    aws logs create-log-group --log-group-name "/ecs/service-connect-proxy" 2>&1 | grep -v "ResourceAlreadyExistsException" || {
        if [ ${PIPESTATUS[0]} -eq 0 ]; then
            log "Log group /ecs/service-connect-proxy created"
            track_resource "LOG_GROUP:/ecs/service-connect-proxy"
        else
            log "Log group /ecs/service-connect-proxy already exists"
        fi
    }
}

# Function to create ECS cluster with Service Connect
create_ecs_cluster() {
    log "Creating ECS cluster with Service Connect..."

    CLUSTER_OUTPUT=$(aws ecs create-cluster \
        --cluster-name "$CLUSTER_NAME" \
        --service-connect-defaults namespace="$NAMESPACE_NAME" \
        --tags key=Environment,value=tutorial 2>&1)
    check_for_errors "$CLUSTER_OUTPUT" "create-cluster"
    track_resource "CLUSTER:$CLUSTER_NAME"
    log "Created ECS cluster: $CLUSTER_NAME"

    wait_for_resource "cluster" "$CLUSTER_NAME"

    # Track the Service Connect namespace that gets created
    # Wait a moment for the namespace to be created
    sleep 5
    NAMESPACE_ID=$(aws servicediscovery list-namespaces \
        --filters Name=TYPE,Values=HTTP \
        --query "Namespaces[?Name=='$NAMESPACE_NAME'].Id" --output text 2>/dev/null || echo "")

    if [[ -n "$NAMESPACE_ID" && "$NAMESPACE_ID" != "None" ]]; then
        track_resource "NAMESPACE:$NAMESPACE_ID"
        log "Service Connect namespace created: $NAMESPACE_ID"
    fi
}

# Function to create IAM roles
create_iam_roles() {
    log "Creating IAM roles..."

    # Check if ecsTaskExecutionRole exists
    if aws iam get-role --role-name ecsTaskExecutionRole >/dev/null 2>&1; then
        log "IAM role ecsTaskExecutionRole exists"
    else
        log "Creating ecsTaskExecutionRole..."
        aws iam create-role \
            --role-name ecsTaskExecutionRole \
            --assume-role-policy-document '{
                "Version":"2012-10-17",
                "Statement": [{
                    "Effect": "Allow",
                    "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                    "Action": "sts:AssumeRole"
                }]
            }' >/dev/null 2>&1
        aws iam attach-role-policy \
            --role-name ecsTaskExecutionRole \
            --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy >/dev/null 2>&1
        track_resource "ROLE:ecsTaskExecutionRole"
        log "Created ecsTaskExecutionRole"
        sleep 10
    fi

    # Check if ecsTaskRole exists, create if not
    if aws iam get-role --role-name ecsTaskRole >/dev/null 2>&1; then
        log "IAM role ecsTaskRole exists"
    else
        log "IAM role ecsTaskRole does not exist, will create it"

        # Create trust policy for ECS tasks
        cat > /tmp/ecs-task-trust-policy.json << EOF
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ecs-tasks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

        aws iam create-role \
            --role-name ecsTaskRole \
            --assume-role-policy-document file:///tmp/ecs-task-trust-policy.json >/dev/null
        track_resource "IAM_ROLE:ecsTaskRole"
        log "Created ecsTaskRole"

        # Wait for role to be available
        sleep 10
    fi
}

# Function to create task definition
create_task_definition() {
    log "Creating task definition..."

    # Create task definition JSON
    cat > /tmp/task-definition.json << EOF
{
    "family": "service-connect-nginx",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "256",
    "memory": "512",
    "executionRoleArn": "arn:aws:iam::${ACCOUNT_ID}:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::${ACCOUNT_ID}:role/ecsTaskRole",
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "public.ecr.aws/docker/library/nginx:latest",
            "portMappings": [
                {
                    "containerPort": 80,
                    "protocol": "tcp",
                    "name": "nginx-port"
                }
            ],
            "essential": true,
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/service-connect-nginx",
                    "awslogs-region": "${REGION}",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ]
}
EOF

    TASK_DEF_OUTPUT=$(aws ecs register-task-definition --cli-input-json file:///tmp/task-definition.json 2>&1)
    check_for_errors "$TASK_DEF_OUTPUT" "register-task-definition"
    TASK_DEF_ARN=$(echo "$TASK_DEF_OUTPUT" | grep -o '"taskDefinitionArn": "[^"]*"' | cut -d'"' -f4)
    track_resource "TASK_DEF:service-connect-nginx"
    log "Created task definition: $TASK_DEF_ARN"

    # Clean up temporary file
    rm -f /tmp/task-definition.json
}

# Function to create ECS service with Service Connect
create_ecs_service() {
    log "Creating ECS service with Service Connect..."

    # Create service definition JSON
    cat > /tmp/service-definition.json << EOF
{
    "serviceName": "service-connect-nginx-service",
    "cluster": "${CLUSTER_NAME}",
    "taskDefinition": "service-connect-nginx",
    "desiredCount": 1,
    "launchType": "FARGATE",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": ["${PUBLIC_SUBNET1}", "${PUBLIC_SUBNET2}"],
            "securityGroups": ["${SECURITY_GROUP_ID}"],
            "assignPublicIp": "ENABLED"
        }
    },
    "serviceConnectConfiguration": {
        "enabled": true,
        "namespace": "${NAMESPACE_NAME}",
        "services": [
            {
                "portName": "nginx-port",
                "discoveryName": "nginx",
                "clientAliases": [
                    {
                        "port": 80,
                        "dnsName": "nginx"
                    }
                ]
            }
        ],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/service-connect-proxy",
                "awslogs-region": "${REGION}",
                "awslogs-stream-prefix": "ecs-service-connect"
            }
        }
    },
    "tags": [
        {
            "key": "Environment",
            "value": "tutorial"
        }
    ]
}
EOF

    SERVICE_OUTPUT=$(aws ecs create-service --cli-input-json file:///tmp/service-definition.json 2>&1)
    check_for_errors "$SERVICE_OUTPUT" "create-service"
    track_resource "SERVICE:service-connect-nginx-service"
    log "Created ECS service: service-connect-nginx-service"

    wait_for_resource "service" "service-connect-nginx-service"

    # Clean up temporary file
    rm -f /tmp/service-definition.json
}

# Function to verify deployment
verify_deployment() {
    log "Verifying deployment..."

    # Check service status
    SERVICE_STATUS=$(aws ecs describe-services \
        --cluster "$CLUSTER_NAME" \
        --services "service-connect-nginx-service" \
        --query 'services[0].status' --output text)
    log "Service status: $SERVICE_STATUS"

    # Check running tasks
    RUNNING_COUNT=$(aws ecs describe-services \
        --cluster "$CLUSTER_NAME" \
        --services "service-connect-nginx-service" \
        --query 'services[0].runningCount' --output text)
    log "Running tasks: $RUNNING_COUNT"

    # Get task ARN
    TASK_ARN=$(aws ecs list-tasks \
        --cluster "$CLUSTER_NAME" \
        --service-name "service-connect-nginx-service" \
        --query 'taskArns[0]' --output text)

    if [[ "$TASK_ARN" != "None" && -n "$TASK_ARN" ]]; then
        log "Task ARN: $TASK_ARN"

        # Try to get task IP address
        TASK_IP=$(aws ecs describe-tasks \
            --cluster "$CLUSTER_NAME" \
            --tasks "$TASK_ARN" \
            --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' \
            --output text 2>/dev/null || echo "")
        if [[ -n "$TASK_IP" && "$TASK_IP" != "None" ]]; then
            log "Task IP address: $TASK_IP"
        else
            log "Could not retrieve task IP address"
        fi
    fi

    # Check Service Connect namespace
    NAMESPACE_STATUS=$(aws servicediscovery list-namespaces \
        --filters Name=TYPE,Values=HTTP \
        --query "Namespaces[?Name=='$NAMESPACE_NAME'].Id" --output text 2>/dev/null || echo "")

    if [[ -n "$NAMESPACE_STATUS" && "$NAMESPACE_STATUS" != "None" ]]; then
        log "Service Connect namespace '$NAMESPACE_NAME' is active"
    else
        log "Service Connect namespace '$NAMESPACE_NAME' not found or not active"
    fi

    # Display Service Connect configuration
    log "Service Connect configuration:"
    aws ecs describe-services \
        --cluster "$CLUSTER_NAME" \
        --services "service-connect-nginx-service" \
        --query 'services[0].serviceConnectConfiguration' 2>/dev/null || true
}

# Function to display created resources
display_resources() {
    echo ""
    echo "==========================================="
    echo "CREATED RESOURCES"
    echo "==========================================="
    for resource in "${CREATED_RESOURCES[@]}"; do
        echo "- $resource"
    done
    echo "==========================================="
    echo ""
}

# Function to cleanup resources
cleanup_resources() {
    log "Starting cleanup process..."

    # Delete resources in reverse order of creation
    for ((i=${#CREATED_RESOURCES[@]}-1; i>=0; i--)); do
        resource="${CREATED_RESOURCES[i]}"
        resource_type=$(echo "$resource" | cut -d':' -f1)
        resource_id=$(echo "$resource" | cut -d':' -f2)

        log "Cleaning up $resource_type: $resource_id"

        case "$resource_type" in
            "SERVICE")
                aws ecs update-service --cluster "$CLUSTER_NAME" --service "$resource_id" --desired-count 0 2>&1 | grep -qi "error" && log "Warning: Failed to scale down service $resource_id"
                aws ecs wait services-stable --cluster "$CLUSTER_NAME" --services "$resource_id" 2>/dev/null || true
                aws ecs delete-service --cluster "$CLUSTER_NAME" --service "$resource_id" --force 2>&1 | grep -qi "error" && log "Warning: Failed to delete service $resource_id"
                ;;
            "TASK_DEF")
                TASK_DEF_ARNS=$(aws ecs list-task-definitions --family-prefix "$resource_id" --query 'taskDefinitionArns' --output text 2>/dev/null)
                for arn in $TASK_DEF_ARNS; do
                    aws ecs deregister-task-definition --task-definition "$arn" >/dev/null 2>&1 || true
                done
                ;;
            "ROLE")
                aws iam detach-role-policy --role-name "$resource_id" --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy" 2>/dev/null || true
                aws iam delete-role --role-name "$resource_id" 2>&1 | grep -qi "error" && log "Warning: Failed to delete role $resource_id"
                ;;
            "IAM_ROLE")
                aws iam detach-role-policy --role-name "$resource_id" --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy" 2>/dev/null || true
                aws iam delete-role --role-name "$resource_id" 2>&1 | grep -qi "error" && log "Warning: Failed to delete role $resource_id"
                ;;
            "CLUSTER")
                aws ecs delete-cluster --cluster "$resource_id" 2>&1 | grep -qi "error" && log "Warning: Failed to delete cluster $resource_id"
                ;;
            "SG")
                for attempt in 1 2 3 4 5; do
                    if aws ec2 delete-security-group --group-id "$resource_id" 2>/dev/null; then
                        break
                    fi
                    log "Security group $resource_id still has dependencies, retrying in 30s ($attempt/5)..."
                    sleep 30
                done
                ;;
            "LOG_GROUP")
                aws logs delete-log-group --log-group-name "$resource_id" 2>&1 | grep -qi "error" && log "Warning: Failed to delete log group $resource_id"
                ;;
            "NAMESPACE")
                # First, delete any services in the namespace
                NAMESPACE_SERVICES=$(aws servicediscovery list-services \
                    --filters Name=NAMESPACE_ID,Values="$resource_id" \
                    --query 'Services[].Id' --output text 2>/dev/null || echo "")

                if [[ -n "$NAMESPACE_SERVICES" && "$NAMESPACE_SERVICES" != "None" ]]; then
                    for service_id in $NAMESPACE_SERVICES; do
                        aws servicediscovery delete-service --id "$service_id" >/dev/null 2>&1 || true
                        sleep 2
                    done
                fi

                # Then delete the namespace
                aws servicediscovery delete-namespace --id "$resource_id" >/dev/null 2>&1 || true
                ;;
        esac

        sleep 2  # Brief pause between deletions
    done

    # Clean up temporary files
    rm -f /tmp/ecs-task-trust-policy.json
    rm -f /tmp/task-definition.json
    rm -f /tmp/service-definition.json

    log "Cleanup completed"
}

# Main execution
main() {
    log "Starting $SCRIPT_NAME v4 (Default VPC)"
    log "Region: $REGION"
    log "Log file: $LOG_FILE"

    # Get AWS account ID
    get_account_id

    # Setup infrastructure using default VPC
    setup_default_vpc_infrastructure

    # Create CloudWatch log groups
    create_log_groups

    # Create ECS cluster
    create_ecs_cluster

    # Create IAM roles
    create_iam_roles

    # Create task definition
    create_task_definition

    # Create ECS service
    create_ecs_service

    # Verify deployment
    verify_deployment

    log "Tutorial completed successfully!"

    # Display created resources
    display_resources

    # Ask user if they want to clean up
    echo ""
    echo "==========================================="
    echo "CLEANUP CONFIRMATION"
    echo "==========================================="
    echo "Do you want to clean up all created resources? (y/n): "
    read -r CLEANUP_CHOICE

    if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
        cleanup_resources
        log "All resources have been cleaned up"
    else
        log "Resources left intact. You can clean them up later by running the cleanup function."
        echo ""
        echo "To clean up resources later, you can use the AWS CLI commands or the AWS Management Console."
        echo "Remember to delete resources in the correct order to avoid dependency issues."
    fi
}

# Make script executable and run
chmod +x "$0"
main "$@"
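The script above records each created resource as a `TYPE:ID` string and deletes them in reverse order of creation so that dependents are removed before the things they depend on. The following is a minimal sketch of that pattern in isolation. The `delete_resource` handler and the `DELETED` array are hypothetical and only record what would be deleted, and the sketch splits on the first colon with parameter expansion rather than `cut`, an assumption made here so that IDs containing colons (such as ARNs) stay intact.

```bash
#!/usr/bin/env bash

declare -a CREATED_RESOURCES=()
declare -a DELETED=()

# Record a resource as "TYPE:ID" in creation order.
track_resource() {
  CREATED_RESOURCES+=("$1")
}

# Hypothetical delete handler: a real script would dispatch on the type
# and call the matching AWS CLI delete command.
delete_resource() {
  local resource_type=$1 resource_id=$2
  DELETED+=("$resource_type=$resource_id")
  echo "Would delete $resource_type: $resource_id"
}

# Walk the tracking array from the last entry to the first.
cleanup_resources() {
  local i resource
  for ((i = ${#CREATED_RESOURCES[@]} - 1; i >= 0; i--)); do
    resource=${CREATED_RESOURCES[i]}
    # ${resource%%:*} is the text before the first colon (the type);
    # ${resource#*:} is everything after it (the ID).
    delete_resource "${resource%%:*}" "${resource#*:}"
  done
}

track_resource "SG:sg-0123456789abcdef0"
track_resource "CLUSTER:tutorial-cluster"
track_resource "SERVICE:service-connect-nginx-service"

cleanup_resources
```

Here the service is handled first, then the cluster, then the security group, which mirrors the dependency order the tutorial script relies on.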

The following code example shows how to:

  • Create an IAM role for Lambda execution

  • Create and deploy a Lambda function

  • Create a REST API

  • Configure Lambda proxy integration

  • Deploy and test the API

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set it up and run it in the Sample developer tutorials repository.

#!/bin/bash

# Simple API Gateway Lambda Integration Script
# This script creates a REST API with Lambda proxy integration

# Generate random identifiers
FUNCTION_NAME="GetStartedLambdaProxyIntegration-$(openssl rand -hex 4)"
ROLE_NAME="GetStartedLambdaBasicExecutionRole-$(openssl rand -hex 4)"
API_NAME="LambdaProxyAPI-$(openssl rand -hex 4)"

# Get AWS account info
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
REGION=$(aws configure get region || echo "us-east-1")

echo "Creating Lambda function code..."

# Create Lambda function code
cat > lambda_function.py << 'EOF'
import json

def lambda_handler(event, context):
    print(event)

    greeter = 'World'

    try:
        if (event['queryStringParameters']) and (event['queryStringParameters']['greeter']) and (
                event['queryStringParameters']['greeter'] is not None):
            greeter = event['queryStringParameters']['greeter']
    except KeyError:
        print('No greeter')

    try:
        if (event['multiValueHeaders']) and (event['multiValueHeaders']['greeter']) and (
                event['multiValueHeaders']['greeter'] is not None):
            greeter = " and ".join(event['multiValueHeaders']['greeter'])
    except KeyError:
        print('No greeter')

    try:
        if (event['headers']) and (event['headers']['greeter']) and (
                event['headers']['greeter'] is not None):
            greeter = event['headers']['greeter']
    except KeyError:
        print('No greeter')

    if (event['body']) and (event['body'] is not None):
        body = json.loads(event['body'])
        try:
            if (body['greeter']) and (body['greeter'] is not None):
                greeter = body['greeter']
        except KeyError:
            print('No greeter')

    res = {
        "statusCode": 200,
        "headers": {
            "Content-Type": "*/*"
        },
        "body": "Hello, " + greeter + "!"
    }

    return res
EOF

# Create deployment package
zip function.zip lambda_function.py

echo "Creating IAM role..."

# Create IAM trust policy
cat > trust-policy.json << 'EOF'
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create IAM role
aws iam create-role \
    --role-name "$ROLE_NAME" \
    --assume-role-policy-document file://trust-policy.json

# Attach execution policy
aws iam attach-role-policy \
    --role-name "$ROLE_NAME" \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"

# Wait for role propagation
sleep 15

echo "Creating Lambda function..."

# Create Lambda function
aws lambda create-function \
    --function-name "$FUNCTION_NAME" \
    --runtime python3.9 \
    --role "arn:aws:iam::$ACCOUNT_ID:role/$ROLE_NAME" \
    --handler lambda_function.lambda_handler \
    --zip-file fileb://function.zip

echo "Creating API Gateway..."

# Create REST API
aws apigateway create-rest-api \
    --name "$API_NAME" \
    --endpoint-configuration types=REGIONAL

# Get API ID
API_ID=$(aws apigateway get-rest-apis --query "items[?name=='$API_NAME'].id" --output text)

# Get root resource ID
ROOT_RESOURCE_ID=$(aws apigateway get-resources --rest-api-id "$API_ID" --query 'items[?path==`/`].id' --output text)

# Create helloworld resource
aws apigateway create-resource \
    --rest-api-id "$API_ID" \
    --parent-id "$ROOT_RESOURCE_ID" \
    --path-part helloworld

# Get resource ID
RESOURCE_ID=$(aws apigateway get-resources --rest-api-id "$API_ID" --query "items[?pathPart=='helloworld'].id" --output text)

# Create ANY method
aws apigateway put-method \
    --rest-api-id "$API_ID" \
    --resource-id "$RESOURCE_ID" \
    --http-method ANY \
    --authorization-type NONE

# Set up Lambda proxy integration
LAMBDA_URI="arn:aws:apigateway:$REGION:lambda:path/2015-03-31/functions/arn:aws:lambda:$REGION:$ACCOUNT_ID:function:$FUNCTION_NAME/invocations"

aws apigateway put-integration \
    --rest-api-id "$API_ID" \
    --resource-id "$RESOURCE_ID" \
    --http-method ANY \
    --type AWS_PROXY \
    --integration-http-method POST \
    --uri "$LAMBDA_URI"

# Grant API Gateway permission to invoke Lambda
SOURCE_ARN="arn:aws:execute-api:$REGION:$ACCOUNT_ID:$API_ID/*/*"

aws lambda add-permission \
    --function-name "$FUNCTION_NAME" \
    --statement-id "apigateway-invoke-$(openssl rand -hex 4)" \
    --action lambda:InvokeFunction \
    --principal apigateway.amazonaws.com \
    --source-arn "$SOURCE_ARN"

# Deploy API
aws apigateway create-deployment \
    --rest-api-id "$API_ID" \
    --stage-name test

echo "Testing API..."

# Test the API
INVOKE_URL="https://$API_ID.execute-api.$REGION.amazonaws.com/test/helloworld"

echo "API URL: $INVOKE_URL"

# Test with query parameter
echo "Testing with query parameter:"
curl -X GET "$INVOKE_URL?greeter=John"
echo ""

# Test with header
echo "Testing with header:"
curl -X GET "$INVOKE_URL" \
    -H 'content-type: application/json' \
    -H 'greeter: John'
echo ""

# Test with body
echo "Testing with POST body:"
curl -X POST "$INVOKE_URL" \
    -H 'content-type: application/json' \
    -d '{ "greeter": "John" }'
echo ""

echo "Tutorial completed! API is available at: $INVOKE_URL"

# Cleanup
echo "Cleaning up resources..."

# Delete API
aws apigateway delete-rest-api --rest-api-id "$API_ID"

# Delete Lambda function
aws lambda delete-function --function-name "$FUNCTION_NAME"

# Detach policy and delete role
aws iam detach-role-policy \
    --role-name "$ROLE_NAME" \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"

aws iam delete-role --role-name "$ROLE_NAME"

# Clean up local files
rm -f lambda_function.py function.zip trust-policy.json

echo "Cleanup completed!"
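The only AWS STS call in the script above is `aws sts get-caller-identity`, whose account ID is then interpolated into every ARN the script builds. The following sketch isolates that step and adds a sanity check on the returned value before it is used. The `aws` shell function here is a stub introduced so the sketch runs offline, and `123456789012`, `abc123example`, and the function name suffix are placeholder values, not output from a real account.

```bash
#!/usr/bin/env bash

# Stub for the AWS CLI so this sketch runs without credentials. It answers
# only `sts get-caller-identity` and returns a placeholder account ID.
aws() {
  if [[ "$1" == "sts" && "$2" == "get-caller-identity" ]]; then
    echo "123456789012"
    return 0
  fi
  echo "stub: unsupported command: $*" 1>&2
  return 1
}

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# AWS account IDs are 12 digits; stop early if the lookup returned anything else.
if [[ ! $ACCOUNT_ID =~ ^[0-9]{12}$ ]]; then
  echo "ERROR: could not determine the AWS account ID" 1>&2
  exit 1
fi

REGION="us-east-1"                                       # placeholder
FUNCTION_NAME="GetStartedLambdaProxyIntegration-example" # placeholder
API_ID="abc123example"                                   # placeholder

# Same ARN shapes as in the script above, built from the validated account ID.
LAMBDA_ARN="arn:aws:lambda:$REGION:$ACCOUNT_ID:function:$FUNCTION_NAME"
LAMBDA_URI="arn:aws:apigateway:$REGION:lambda:path/2015-03-31/functions/$LAMBDA_ARN/invocations"
SOURCE_ARN="arn:aws:execute-api:$REGION:$ACCOUNT_ID:$API_ID/*/*"

echo "Integration URI: $LAMBDA_URI"
echo "Source ARN:      $SOURCE_ARN"
```

Validating the account ID up front is a design choice of this sketch: if the identity lookup fails, the script stops before creating resources with malformed ARNs.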

The following code example shows how to:

  • Create the cluster

  • Create a task definition

  • Create the service

  • Clean up

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set it up and run it in the Sample developer tutorials repository.

#!/bin/bash # Amazon ECS Fargate Tutorial Script - Version 5 # This script creates an ECS cluster, task definition, and service using Fargate launch type # Fixed version with proper resource dependency handling during cleanup set -e # Exit on any error # Initialize logging LOG_FILE="ecs-fargate-tutorial-v5.log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "Starting Amazon ECS Fargate tutorial at $(date)" echo "Log file: $LOG_FILE" # Generate random identifier for unique resource names RANDOM_ID=$(openssl rand -hex 6) CLUSTER_NAME="fargate-cluster-${RANDOM_ID}" SERVICE_NAME="fargate-service-${RANDOM_ID}" TASK_FAMILY="sample-fargate-${RANDOM_ID}" SECURITY_GROUP_NAME="ecs-fargate-sg-${RANDOM_ID}" # Array to track created resources for cleanup CREATED_RESOURCES=() # Function to log and execute commands execute_command() { local cmd="$1" local description="$2" echo "" echo "==========================================" echo "EXECUTING: $description" echo "COMMAND: $cmd" echo "==========================================" local output local exit_code set +e # Temporarily disable exit on error output=$(eval "$cmd" 2>&1) exit_code=$? 
set -e # Re-enable exit on error if [[ $exit_code -eq 0 ]]; then echo "SUCCESS: $description" echo "OUTPUT: $output" return 0 else echo "FAILED: $description" echo "EXIT CODE: $exit_code" echo "OUTPUT: $output" return 1 fi } # Function to check for actual AWS API errors in command output check_for_aws_errors() { local output="$1" local description="$2" # Look for specific AWS error patterns, not just the word "error" if echo "$output" | grep -qi "An error occurred\|InvalidParameter\|AccessDenied\|ResourceNotFound\|ValidationException"; then echo "AWS API ERROR detected in output for: $description" echo "Output: $output" return 1 fi return 0 } # Function to wait for network interfaces to be cleaned up wait_for_network_interfaces_cleanup() { local security_group_id="$1" local max_attempts=30 local attempt=1 echo "Waiting for network interfaces to be cleaned up..." while [[ $attempt -le $max_attempts ]]; do echo "Attempt $attempt/$max_attempts: Checking for dependent network interfaces..." # Check if there are any network interfaces still using this security group local eni_count eni_count=$(aws ec2 describe-network-interfaces \ --filters "Name=group-id,Values=$security_group_id" \ --query "length(NetworkInterfaces)" \ --output text 2>/dev/null || echo "0") if [[ "$eni_count" == "0" ]]; then echo "No network interfaces found using security group $security_group_id" return 0 else echo "Found $eni_count network interface(s) still using security group $security_group_id" echo "Waiting 10 seconds before next check..." 
sleep 10 ((attempt++)) fi done echo "WARNING: Network interfaces may still be attached after $max_attempts attempts" echo "This is normal and the security group deletion will be retried" return 1 } # Function to retry security group deletion with exponential backoff retry_security_group_deletion() { local security_group_id="$1" local max_attempts=10 local attempt=1 local wait_time=5 while [[ $attempt -le $max_attempts ]]; do echo "Attempt $attempt/$max_attempts: Trying to delete security group $security_group_id" if execute_command "aws ec2 delete-security-group --group-id $security_group_id" "Delete security group (attempt $attempt)"; then echo "Successfully deleted security group $security_group_id" return 0 else if [[ $attempt -eq $max_attempts ]]; then echo "FAILED: Could not delete security group $security_group_id after $max_attempts attempts" echo "This may be due to network interfaces that are still being cleaned up by AWS" echo "You can manually delete it later using: aws ec2 delete-security-group --group-id $security_group_id" return 1 else echo "Waiting $wait_time seconds before retry..." sleep $wait_time wait_time=$((wait_time * 2)) # Exponential backoff ((attempt++)) fi fi done } # Function to cleanup resources with proper dependency handling cleanup_resources() { echo "" echo "===========================================" echo "CLEANUP PROCESS" echo "===========================================" echo "The following resources were created:" for resource in "${CREATED_RESOURCES[@]}"; do echo " - $resource" done echo "" echo "Do you want to clean up all created resources? (y/n): " read -r CLEANUP_CHOICE if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then echo "Starting cleanup process..." # Step 1: Scale service to 0 tasks first, then delete service if [[ " ${CREATED_RESOURCES[*]} " =~ " ECS Service: $SERVICE_NAME " ]]; then echo "" echo "Step 1: Scaling service to 0 tasks..." 
if execute_command "aws ecs update-service --cluster $CLUSTER_NAME --service $SERVICE_NAME --desired-count 0" "Scale service to 0 tasks"; then echo "Waiting for service to stabilize after scaling to 0..." execute_command "aws ecs wait services-stable --cluster $CLUSTER_NAME --services $SERVICE_NAME" "Wait for service to stabilize" echo "Deleting service..." execute_command "aws ecs delete-service --cluster $CLUSTER_NAME --service $SERVICE_NAME" "Delete ECS service" else echo "WARNING: Failed to scale service. Attempting to delete anyway..." execute_command "aws ecs delete-service --cluster $CLUSTER_NAME --service $SERVICE_NAME --force" "Force delete ECS service" fi fi # Step 2: Wait a bit for tasks to fully terminate echo "" echo "Step 2: Waiting for tasks to fully terminate..." sleep 15 # Step 3: Delete cluster if [[ " ${CREATED_RESOURCES[*]} " =~ " ECS Cluster: $CLUSTER_NAME " ]]; then echo "" echo "Step 3: Deleting cluster..." execute_command "aws ecs delete-cluster --cluster $CLUSTER_NAME" "Delete ECS cluster" fi # Step 4: Wait for network interfaces to be cleaned up, then delete security group if [[ -n "$SECURITY_GROUP_ID" ]]; then echo "" echo "Step 4: Cleaning up security group..." # First, wait for network interfaces to be cleaned up wait_for_network_interfaces_cleanup "$SECURITY_GROUP_ID" # Then retry security group deletion with backoff retry_security_group_deletion "$SECURITY_GROUP_ID" fi # Step 5: Clean up task definition (deregister all revisions) if [[ " ${CREATED_RESOURCES[*]} " =~ " Task Definition: $TASK_FAMILY " ]]; then echo "" echo "Step 5: Deregistering task definition revisions..." 
            # Get all revisions of the task definition
            local revisions
            revisions=$(aws ecs list-task-definitions --family-prefix "$TASK_FAMILY" --query "taskDefinitionArns" --output text 2>/dev/null || echo "")

            if [[ -n "$revisions" && "$revisions" != "None" ]]; then
                for revision_arn in $revisions; do
                    echo "Deregistering task definition: $revision_arn"
                    execute_command "aws ecs deregister-task-definition --task-definition $revision_arn" "Deregister task definition $revision_arn" || true
                done
            else
                echo "No task definition revisions found to deregister"
            fi
        fi

        echo ""
        echo "==========================================="
        echo "CLEANUP COMPLETED"
        echo "==========================================="
        echo "All resources have been cleaned up successfully!"
    else
        echo "Cleanup skipped. Resources remain active."
        echo ""
        echo "To clean up manually later, use the following commands in order:"
        echo "1. Scale service to 0: aws ecs update-service --cluster $CLUSTER_NAME --service $SERVICE_NAME --desired-count 0"
        echo "2. Wait for stability: aws ecs wait services-stable --cluster $CLUSTER_NAME --services $SERVICE_NAME"
        echo "3. Delete service: aws ecs delete-service --cluster $CLUSTER_NAME --service $SERVICE_NAME"
        echo "4. Delete cluster: aws ecs delete-cluster --cluster $CLUSTER_NAME"
        echo "5. Wait 2-3 minutes, then delete security group: aws ec2 delete-security-group --group-id $SECURITY_GROUP_ID"
        if [[ " ${CREATED_RESOURCES[*]} " =~ " Task Definition: $TASK_FAMILY " ]]; then
            echo "6. Deregister task definitions: aws ecs list-task-definitions --family-prefix $TASK_FAMILY"
            echo "   Then for each ARN: aws ecs deregister-task-definition --task-definition <ARN>"
        fi
    fi
}

# Trap to handle script interruption
trap cleanup_resources EXIT

echo "Using random identifier: $RANDOM_ID"
echo "Cluster name: $CLUSTER_NAME"
echo "Service name: $SERVICE_NAME"
echo "Task family: $TASK_FAMILY"

# Step 1: Ensure ECS task execution role exists
echo ""
echo "==========================================="
echo "STEP 1: VERIFY ECS TASK EXECUTION ROLE"
echo "==========================================="

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
EXECUTION_ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/ecsTaskExecutionRole"

# Check if role exists
if aws iam get-role --role-name ecsTaskExecutionRole >/dev/null 2>&1; then
    echo "ECS task execution role already exists"
else
    echo "Creating ECS task execution role..."

    # Create trust policy
    cat > trust-policy.json << 'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ecs-tasks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

    execute_command "aws iam create-role --role-name ecsTaskExecutionRole --assume-role-policy-document file://trust-policy.json" "Create ECS task execution role"

    execute_command "aws iam attach-role-policy --role-name ecsTaskExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy" "Attach ECS task execution policy"

    # Clean up temporary file
    rm -f trust-policy.json

    CREATED_RESOURCES+=("IAM Role: ecsTaskExecutionRole")
fi

# Step 2: Create ECS cluster
echo ""
echo "==========================================="
echo "STEP 2: CREATE ECS CLUSTER"
echo "==========================================="

CLUSTER_OUTPUT=$(execute_command "aws ecs create-cluster --cluster-name $CLUSTER_NAME" "Create ECS cluster")
check_for_aws_errors "$CLUSTER_OUTPUT" "Create ECS cluster"

CREATED_RESOURCES+=("ECS Cluster: $CLUSTER_NAME")

# Step 3: Create task definition
echo ""
echo "==========================================="
echo "STEP 3: CREATE TASK DEFINITION"
echo "==========================================="

# Create task definition JSON
cat > task-definition.json << EOF
{
    "family": "$TASK_FAMILY",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "256",
    "memory": "512",
    "executionRoleArn": "$EXECUTION_ROLE_ARN",
    "containerDefinitions": [
        {
            "name": "fargate-app",
            "image": "public.ecr.aws/docker/library/httpd:latest",
            "portMappings": [
                {
                    "containerPort": 80,
                    "hostPort": 80,
                    "protocol": "tcp"
                }
            ],
            "essential": true,
            "entryPoint": ["sh", "-c"],
            "command": [
                "/bin/sh -c \"echo '<html> <head> <title>Amazon ECS Sample App</title> <style>body {margin-top: 40px; background-color: #333;} </style> </head><body> <div style=color:white;text-align:center> <h1>Amazon ECS Sample App</h1> <h2>Congratulations!</h2> <p>Your application is now running on a container in Amazon ECS.</p> </div></body></html>' > /usr/local/apache2/htdocs/index.html && httpd-foreground\""
            ]
        }
    ]
}
EOF

TASK_DEF_OUTPUT=$(execute_command "aws ecs register-task-definition --cli-input-json file://task-definition.json" "Register task definition")
check_for_aws_errors "$TASK_DEF_OUTPUT" "Register task definition"

# Clean up temporary file
rm -f task-definition.json

CREATED_RESOURCES+=("Task Definition: $TASK_FAMILY")

# Step 4: Set up networking
echo ""
echo "==========================================="
echo "STEP 4: SET UP NETWORKING"
echo "==========================================="

# Get default VPC ID
VPC_ID=$(aws ec2 describe-vpcs --filters "Name=is-default,Values=true" --query "Vpcs[0].VpcId" --output text)
if [[ "$VPC_ID" == "None" || -z "$VPC_ID" ]]; then
    echo "ERROR: No default VPC found. Please create a default VPC or specify a custom VPC."
    exit 1
fi

echo "Using default VPC: $VPC_ID"

# Create security group for HTTP access
# Note: This allows HTTP access from anywhere for demo purposes
# In production, restrict source to specific IP ranges or security groups
SECURITY_GROUP_OUTPUT=$(execute_command "aws ec2 create-security-group --group-name $SECURITY_GROUP_NAME --description 'Security group for ECS Fargate tutorial - HTTP access' --vpc-id $VPC_ID" "Create security group")
check_for_aws_errors "$SECURITY_GROUP_OUTPUT" "Create security group"

# Extract the group ID from the JSON output; fall back to a lookup by name
SECURITY_GROUP_ID=$(echo "$SECURITY_GROUP_OUTPUT" | grep -o '"GroupId": "[^"]*"' | cut -d'"' -f4)

if [[ -z "$SECURITY_GROUP_ID" ]]; then
    SECURITY_GROUP_ID=$(aws ec2 describe-security-groups --group-names "$SECURITY_GROUP_NAME" --query "SecurityGroups[0].GroupId" --output text)
fi

echo "Created security group: $SECURITY_GROUP_ID"
CREATED_RESOURCES+=("Security Group: $SECURITY_GROUP_ID")

# Add HTTP inbound rule
# WARNING: This allows HTTP access from anywhere (0.0.0.0/0)
# In production environments, restrict this to specific IP ranges
execute_command "aws ec2 authorize-security-group-ingress --group-id $SECURITY_GROUP_ID --protocol tcp --port 80 --cidr 0.0.0.0/0" "Add HTTP inbound rule to security group"

# Get subnet IDs from default VPC
echo "Getting subnet IDs from default VPC..."
SUBNET_IDS_RAW=$(aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPC_ID" --query "Subnets[*].SubnetId" --output text)

if [[ -z "$SUBNET_IDS_RAW" ]]; then
    echo "ERROR: No subnets found in default VPC"
    exit 1
fi

# Convert to proper comma-separated format, handling both spaces and tabs
SUBNET_IDS_COMMA=$(echo "$SUBNET_IDS_RAW" | tr -s '[:space:]' ',' | sed 's/,$//')

echo "Raw subnet IDs: $SUBNET_IDS_RAW"
echo "Formatted subnet IDs: $SUBNET_IDS_COMMA"

# Validate subnet IDs format
if [[ ! "$SUBNET_IDS_COMMA" =~ ^subnet-[a-z0-9]+(,subnet-[a-z0-9]+)*$ ]]; then
    echo "ERROR: Invalid subnet ID format: $SUBNET_IDS_COMMA"
    exit 1
fi

# Step 5: Create ECS service
echo ""
echo "==========================================="
echo "STEP 5: CREATE ECS SERVICE"
echo "==========================================="

# Create the service with proper JSON formatting for network configuration
SERVICE_CMD="aws ecs create-service --cluster $CLUSTER_NAME --service-name $SERVICE_NAME --task-definition $TASK_FAMILY --desired-count 1 --launch-type FARGATE --network-configuration '{\"awsvpcConfiguration\":{\"subnets\":[\"$(echo $SUBNET_IDS_COMMA | sed 's/,/","/g')\"],\"securityGroups\":[\"$SECURITY_GROUP_ID\"],\"assignPublicIp\":\"ENABLED\"}}'"

echo "Service creation command: $SERVICE_CMD"

SERVICE_OUTPUT=$(execute_command "$SERVICE_CMD" "Create ECS service")
check_for_aws_errors "$SERVICE_OUTPUT" "Create ECS service"

CREATED_RESOURCES+=("ECS Service: $SERVICE_NAME")

# Step 6: Wait for service to stabilize and get public IP
echo ""
echo "==========================================="
echo "STEP 6: WAIT FOR SERVICE AND GET PUBLIC IP"
echo "==========================================="

echo "Waiting for service to stabilize (this may take a few minutes)..."
execute_command "aws ecs wait services-stable --cluster $CLUSTER_NAME --services $SERVICE_NAME" "Wait for service to stabilize"

# Get task ARN
TASK_ARN=$(aws ecs list-tasks --cluster $CLUSTER_NAME --service-name $SERVICE_NAME --query "taskArns[0]" --output text)
if [[ "$TASK_ARN" == "None" || -z "$TASK_ARN" ]]; then
    echo "ERROR: No running tasks found for service"
    exit 1
fi

echo "Task ARN: $TASK_ARN"

# Get network interface ID
ENI_ID=$(aws ecs describe-tasks --cluster $CLUSTER_NAME --tasks $TASK_ARN --query "tasks[0].attachments[0].details[?name=='networkInterfaceId'].value" --output text)
if [[ "$ENI_ID" == "None" || -z "$ENI_ID" ]]; then
    echo "ERROR: Could not retrieve network interface ID"
    exit 1
fi

echo "Network Interface ID: $ENI_ID"

# Get public IP
PUBLIC_IP=$(aws ec2 describe-network-interfaces --network-interface-ids $ENI_ID --query "NetworkInterfaces[0].Association.PublicIp" --output text)
if [[ "$PUBLIC_IP" == "None" || -z "$PUBLIC_IP" ]]; then
    echo "WARNING: No public IP assigned to the task"
    echo "The task may be in a private subnet or public IP assignment failed"
else
    echo ""
    echo "==========================================="
    echo "SUCCESS! APPLICATION IS RUNNING"
    echo "==========================================="
    echo "Your application is available at: http://$PUBLIC_IP"
    echo "You can test it by opening this URL in your browser"
    echo ""
fi

# Display service information
echo ""
echo "==========================================="
echo "SERVICE INFORMATION"
echo "==========================================="
execute_command "aws ecs describe-services --cluster $CLUSTER_NAME --services $SERVICE_NAME" "Get service details"

echo ""
echo "==========================================="
echo "TUTORIAL COMPLETED SUCCESSFULLY"
echo "==========================================="
echo "Resources created:"
for resource in "${CREATED_RESOURCES[@]}"; do
    echo " - $resource"
done

if [[ -n "$PUBLIC_IP" && "$PUBLIC_IP" != "None" ]]; then
    echo ""
    echo "Application URL: http://$PUBLIC_IP"
fi

echo ""
echo "Script completed at $(date)"
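
The cleanup logic in the script above retries security group deletion with a wait time that doubles after each failed attempt. The same pattern can be pulled out into a generic helper. The following is a minimal sketch, not part of the original example: the function name `retry_with_backoff` and its argument order are assumptions made for illustration, and the command is passed as separate arguments rather than as a single string.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): run a command until it
# succeeds, doubling the wait time after each failed attempt.
#
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_WAIT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_with_backoff() {
    local max_attempts="$1"
    local wait_time="$2"
    shift 2
    local attempt=1

    while (( attempt <= max_attempts )); do
        # Run the command exactly as given; stop on the first success
        if "$@"; then
            return 0
        fi

        if (( attempt == max_attempts )); then
            echo "FAILED after $max_attempts attempts: $*" >&2
            return 1
        fi

        echo "Attempt $attempt/$max_attempts failed; waiting $wait_time seconds before retry..." >&2
        sleep "$wait_time"
        wait_time=$(( wait_time * 2 ))   # Exponential backoff
        attempt=$(( attempt + 1 ))
    done

    return 1
}

# Example call (commented out so the sketch does not touch any AWS resources):
# retry_with_backoff 10 5 aws ec2 delete-security-group --group-id "$SECURITY_GROUP_ID"
```

Passing the command as an argument list avoids re-parsing a command string, and the counters are updated with plain assignments so the helper also behaves predictably in scripts that enable `set -e`.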

The following code example shows how to:

  • Create a CloudWatch dashboard

  • Add Lambda metrics widgets with a function name variable

  • Verify the dashboard

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# Script to create a CloudWatch dashboard with Lambda function name as a variable
# This script creates a CloudWatch dashboard that allows you to switch between different Lambda functions

# Set up logging
LOG_FILE="cloudwatch-dashboard-script.log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "$(date): Starting CloudWatch dashboard creation script"

# Function to handle errors
handle_error() {
    echo "ERROR: $1"
    echo "Resources created:"
    echo "- CloudWatch Dashboard: LambdaMetricsDashboard"

    echo ""
    echo "==========================================="
    echo "CLEANUP CONFIRMATION"
    echo "==========================================="
    echo "An error occurred. Do you want to clean up the created resources? (y/n): "
    read -r CLEANUP_CHOICE

    if [[ "${CLEANUP_CHOICE,,}" == "y" ]]; then
        echo "Cleaning up resources..."
        aws cloudwatch delete-dashboards --dashboard-names LambdaMetricsDashboard
        echo "Cleanup complete."
    else
        echo "Resources were not cleaned up. You can manually delete them later."
    fi

    exit 1
}

# Check if AWS CLI is installed and configured
echo "Checking AWS CLI configuration..."
aws sts get-caller-identity > /dev/null 2>&1
if [ $? -ne 0 ]; then
    handle_error "AWS CLI is not properly configured. Please configure it with 'aws configure' and try again."
fi

# Get the current region
REGION=$(aws configure get region)
if [ -z "$REGION" ]; then
    REGION="us-east-1"
    echo "No region found in AWS config, defaulting to $REGION"
fi
echo "Using region: $REGION"

# Check if there are any Lambda functions in the account
echo "Checking for Lambda functions..."
LAMBDA_FUNCTIONS=$(aws lambda list-functions --query "Functions[*].FunctionName" --output text)
if [ -z "$LAMBDA_FUNCTIONS" ]; then
    echo "No Lambda functions found in your account. Creating a simple test function..."

    # Create a temporary directory for Lambda function code
    TEMP_DIR=$(mktemp -d)

    # Create a simple Lambda function
    cat > "$TEMP_DIR/index.js" << EOF
exports.handler = async (event) => {
    console.log('Event:', JSON.stringify(event, null, 2));
    return {
        statusCode: 200,
        body: JSON.stringify('Hello from Lambda!'),
    };
};
EOF

    # Zip the function code
    cd "$TEMP_DIR" || handle_error "Failed to change to temporary directory"
    zip -q function.zip index.js

    # Create a role for the Lambda function
    ROLE_NAME="LambdaDashboardTestRole"
    ROLE_ARN=$(aws iam create-role \
        --role-name "$ROLE_NAME" \
        --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}' \
        --query "Role.Arn" \
        --output text)

    if [ $? -ne 0 ]; then
        handle_error "Failed to create IAM role for Lambda function"
    fi

    echo "Waiting for role to be available..."
    sleep 10

    # Attach basic Lambda execution policy
    aws iam attach-role-policy \
        --role-name "$ROLE_NAME" \
        --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"

    if [ $? -ne 0 ]; then
        aws iam delete-role --role-name "$ROLE_NAME"
        handle_error "Failed to attach policy to IAM role"
    fi

    # Create the Lambda function
    FUNCTION_NAME="DashboardTestFunction"
    aws lambda create-function \
        --function-name "$FUNCTION_NAME" \
        --runtime nodejs18.x \
        --role "$ROLE_ARN" \
        --handler index.handler \
        --zip-file fileb://function.zip

    if [ $? -ne 0 ]; then
        aws iam detach-role-policy \
            --role-name "$ROLE_NAME" \
            --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
        aws iam delete-role --role-name "$ROLE_NAME"
        handle_error "Failed to create Lambda function"
    fi

    # Invoke the function to generate some metrics
    echo "Invoking Lambda function to generate metrics..."
    for i in {1..5}; do
        aws lambda invoke --function-name "$FUNCTION_NAME" --payload '{}' /dev/null > /dev/null
        sleep 1
    done

    # Clean up temporary directory
    cd - > /dev/null
    rm -rf "$TEMP_DIR"

    # Set the function name for the dashboard
    DEFAULT_FUNCTION="$FUNCTION_NAME"
else
    # Use the first Lambda function as default
    DEFAULT_FUNCTION=$(echo "$LAMBDA_FUNCTIONS" | awk '{print $1}')
    echo "Found Lambda functions. Using $DEFAULT_FUNCTION as default."
fi

# Create a dashboard with Lambda metrics and a function name variable
echo "Creating CloudWatch dashboard with Lambda function name variable..."

# Create a JSON file for the dashboard body
cat > dashboard-body.json << EOF
{
    "widgets": [
        {
            "type": "metric",
            "x": 0,
            "y": 0,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    [ "AWS/Lambda", "Invocations", "FunctionName", "\${FunctionName}" ],
                    [ ".", "Errors", ".", "." ],
                    [ ".", "Throttles", ".", "." ]
                ],
                "view": "timeSeries",
                "stacked": false,
                "region": "$REGION",
                "title": "Lambda Function Metrics for \${FunctionName}",
                "period": 300
            }
        },
        {
            "type": "metric",
            "x": 0,
            "y": 6,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    [ "AWS/Lambda", "Duration", "FunctionName", "\${FunctionName}", { "stat": "Average" } ]
                ],
                "view": "timeSeries",
                "stacked": false,
                "region": "$REGION",
                "title": "Duration for \${FunctionName}",
                "period": 300
            }
        },
        {
            "type": "metric",
            "x": 12,
            "y": 0,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    [ "AWS/Lambda", "ConcurrentExecutions", "FunctionName", "\${FunctionName}" ]
                ],
                "view": "timeSeries",
                "stacked": false,
                "region": "$REGION",
                "title": "Concurrent Executions for \${FunctionName}",
                "period": 300
            }
        }
    ],
    "periodOverride": "auto",
    "variables": [
        {
            "type": "property",
            "id": "FunctionName",
            "property": "FunctionName",
            "label": "Lambda Function",
            "inputType": "select",
            "values": [
                {
                    "value": "$DEFAULT_FUNCTION",
                    "label": "$DEFAULT_FUNCTION"
                }
            ]
        }
    ]
}
EOF

# Create the dashboard using the JSON file
DASHBOARD_RESULT=$(aws cloudwatch put-dashboard --dashboard-name LambdaMetricsDashboard --dashboard-body file://dashboard-body.json)
DASHBOARD_EXIT_CODE=$?

# Check if there was a fatal error
if [ $DASHBOARD_EXIT_CODE -ne 0 ]; then
    # If we created resources, clean them up
    if [ -n "${FUNCTION_NAME:-}" ]; then
        aws lambda delete-function --function-name "$FUNCTION_NAME"
        aws iam detach-role-policy \
            --role-name "$ROLE_NAME" \
            --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
        aws iam delete-role --role-name "$ROLE_NAME"
    fi
    handle_error "Failed to create CloudWatch dashboard."
fi

# Display any validation messages but continue
if [[ "$DASHBOARD_RESULT" == *"DashboardValidationMessages"* ]]; then
    echo "Dashboard created with validation messages:"
    echo "$DASHBOARD_RESULT"
    echo "These validation messages are warnings and the dashboard should still function."
else
    echo "Dashboard created successfully!"
fi

# Verify the dashboard was created
echo "Verifying dashboard creation..."
DASHBOARD_INFO=$(aws cloudwatch get-dashboard --dashboard-name LambdaMetricsDashboard)
DASHBOARD_INFO_EXIT_CODE=$?

if [ $DASHBOARD_INFO_EXIT_CODE -ne 0 ]; then
    # If we created resources, clean them up
    if [ -n "${FUNCTION_NAME:-}" ]; then
        aws lambda delete-function --function-name "$FUNCTION_NAME"
        aws iam detach-role-policy \
            --role-name "$ROLE_NAME" \
            --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
        aws iam delete-role --role-name "$ROLE_NAME"
    fi
    handle_error "Failed to verify dashboard creation."
fi

echo "Dashboard verification successful!"
echo "Dashboard details:"
echo "$DASHBOARD_INFO"

# List all dashboards to confirm
echo "Listing all dashboards:"
DASHBOARDS=$(aws cloudwatch list-dashboards)
DASHBOARDS_EXIT_CODE=$?

if [ $DASHBOARDS_EXIT_CODE -ne 0 ]; then
    # If we created resources, clean them up
    if [ -n "${FUNCTION_NAME:-}" ]; then
        aws lambda delete-function --function-name "$FUNCTION_NAME"
        aws iam detach-role-policy \
            --role-name "$ROLE_NAME" \
            --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
        aws iam delete-role --role-name "$ROLE_NAME"
    fi
    handle_error "Failed to list dashboards."
fi

echo "$DASHBOARDS"

# Show instructions for accessing the dashboard
echo ""
echo "Dashboard created successfully! To access it:"
echo "1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/"
echo "2. In the navigation pane, choose Dashboards"
echo "3. Select LambdaMetricsDashboard"
echo "4. You should see a dropdown menu labeled 'Lambda Function' at the top of the dashboard"
echo "5. Use this dropdown to select different Lambda functions and see their metrics"
echo ""

# Create a list of resources for cleanup
RESOURCES=("- CloudWatch Dashboard: LambdaMetricsDashboard")
if [ -n "${FUNCTION_NAME:-}" ]; then
    RESOURCES+=("- Lambda Function: $FUNCTION_NAME")
    RESOURCES+=("- IAM Role: $ROLE_NAME")
fi

# Prompt for cleanup
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Resources created:"
for resource in "${RESOURCES[@]}"; do
    echo "$resource"
done
echo ""
echo "Do you want to clean up all created resources? (y/n): "
read -r CLEANUP_CHOICE

if [[ "${CLEANUP_CHOICE,,}" == "y" ]]; then
    echo "Cleaning up resources..."

    # Delete the dashboard
    aws cloudwatch delete-dashboards --dashboard-names LambdaMetricsDashboard
    if [ $? -ne 0 ]; then
        echo "WARNING: Failed to delete dashboard. You may need to delete it manually."
    else
        echo "Dashboard deleted successfully."
    fi

    # If we created a Lambda function, delete it and its role
    if [ -n "${FUNCTION_NAME:-}" ]; then
        echo "Deleting Lambda function..."
        aws lambda delete-function --function-name "$FUNCTION_NAME"
        if [ $? -ne 0 ]; then
            echo "WARNING: Failed to delete Lambda function. You may need to delete it manually."
        else
            echo "Lambda function deleted successfully."
        fi

        echo "Detaching role policy..."
        aws iam detach-role-policy \
            --role-name "$ROLE_NAME" \
            --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
        if [ $? -ne 0 ]; then
            echo "WARNING: Failed to detach role policy. You may need to detach it manually."
        else
            echo "Role policy detached successfully."
        fi

        echo "Deleting IAM role..."
        aws iam delete-role --role-name "$ROLE_NAME"
        if [ $? -ne 0 ]; then
            echo "WARNING: Failed to delete IAM role. You may need to delete it manually."
        else
            echo "IAM role deleted successfully."
        fi
    fi

    # Clean up the JSON file
    rm -f dashboard-body.json

    echo "Cleanup complete."
else
    echo "Resources were not cleaned up. You can manually delete them later with:"
    echo "aws cloudwatch delete-dashboards --dashboard-names LambdaMetricsDashboard"
    if [ -n "${FUNCTION_NAME:-}" ]; then
        echo "aws lambda delete-function --function-name $FUNCTION_NAME"
        echo "aws iam detach-role-policy --role-name $ROLE_NAME --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
        echo "aws iam delete-role --role-name $ROLE_NAME"
    fi
fi

echo "Script completed successfully!"
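
The script above lists only `$DEFAULT_FUNCTION` in the `values` array of the dashboard variable, so the dropdown offers a single choice. One possible way to offer every function returned by `aws lambda list-functions` is to generate that array from the whole list. The following is a minimal sketch under stated assumptions: the helper name `build_variable_values` is hypothetical and not part of the original example, and it assumes the names need no JSON escaping (Lambda function names are limited to letters, numbers, hyphens, and underscores).

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): print a JSON array of
# {"value": ..., "label": ...} objects, one per function name argument.
# Assumes the names contain no characters that require JSON escaping.
build_variable_values() {
    local name
    local first=true

    printf '['
    for name in "$@"; do
        # Separate entries with a comma, but not before the first one
        if [[ "$first" == true ]]; then
            first=false
        else
            printf ','
        fi
        printf '{"value":"%s","label":"%s"}' "$name" "$name"
    done
    printf ']\n'
}

# Example call (commented out; LAMBDA_FUNCTIONS is the whitespace-separated
# list the original script reads from `aws lambda list-functions`):
# VARIABLE_VALUES=$(build_variable_values $LAMBDA_FUNCTIONS)
```

The resulting string could then be substituted for the hard-coded `values` array when the dashboard body file is written.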

The following code example shows how to:

  • Create an ECS cluster

  • Create and monitor a service

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# ECS EC2 Launch Type Tutorial Script - UPDATED VERSION
# This script demonstrates creating an ECS cluster, launching a container instance,
# registering a task definition, and creating a service using the EC2 launch type.
# Updated to match the tutorial draft with nginx web server and service creation.
#
# - UPDATED: Changed from sleep task to nginx web server with service

set -e  # Exit on any error

# Configuration
SCRIPT_NAME="ecs-ec2-tutorial"
LOG_FILE="${SCRIPT_NAME}-$(date +%Y%m%d-%H%M%S).log"
CLUSTER_NAME="tutorial-cluster-$(openssl rand -hex 4)"
TASK_FAMILY="nginx-task-$(openssl rand -hex 4)"
SERVICE_NAME="nginx-service-$(openssl rand -hex 4)"
KEY_PAIR_NAME="ecs-tutorial-key-$(openssl rand -hex 4)"
SECURITY_GROUP_NAME="ecs-tutorial-sg-$(openssl rand -hex 4)"

# Get current AWS region dynamically
AWS_REGION=$(aws configure get region || echo "us-east-1")

# Resource tracking arrays
CREATED_RESOURCES=()
CLEANUP_ORDER=()

# Logging function
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

# Error handling function
handle_error() {
    local exit_code=$?
    log "ERROR: Script failed with exit code $exit_code"
    log "ERROR: Last command: $BASH_COMMAND"

    echo ""
    echo "==========================================="
    echo "ERROR OCCURRED - ATTEMPTING CLEANUP"
    echo "==========================================="
    echo "Resources created before error:"
    for resource in "${CREATED_RESOURCES[@]}"; do
        echo " - $resource"
    done

    cleanup_resources
    exit $exit_code
}

# Set error trap
trap handle_error ERR

# FIXED: Enhanced cleanup function with proper error handling and logging
cleanup_resources() {
    log "Starting cleanup process..."
    local cleanup_errors=0

    # Note: counters are incremented with plain assignments. ((var++)) returns
    # a non-zero status when var is 0, which would abort the script under set -e.

    # Delete service first (this will stop tasks automatically)
    if [[ -n "${SERVICE_ARN:-}" ]]; then
        log "Updating service to desired count 0: $SERVICE_NAME"
        if ! aws ecs update-service --cluster "$CLUSTER_NAME" --service "$SERVICE_NAME" --desired-count 0 2>>"$LOG_FILE"; then
            log "WARNING: Failed to update service desired count to 0"
            cleanup_errors=$((cleanup_errors + 1))
        else
            log "Waiting for service tasks to stop..."
            sleep 30 # Give time for tasks to stop
        fi

        log "Deleting service: $SERVICE_NAME"
        if ! aws ecs delete-service --cluster "$CLUSTER_NAME" --service "$SERVICE_NAME" 2>>"$LOG_FILE"; then
            log "WARNING: Failed to delete service $SERVICE_NAME"
            cleanup_errors=$((cleanup_errors + 1))
        fi
    fi

    # Stop and delete any remaining tasks
    if [[ -n "${TASK_ARN:-}" ]]; then
        log "Stopping task: $TASK_ARN"
        if ! aws ecs stop-task --cluster "$CLUSTER_NAME" --task "$TASK_ARN" --reason "Tutorial cleanup" 2>>"$LOG_FILE"; then
            log "WARNING: Failed to stop task $TASK_ARN"
            cleanup_errors=$((cleanup_errors + 1))
        else
            log "Waiting for task to stop..."
            if ! aws ecs wait tasks-stopped --cluster "$CLUSTER_NAME" --tasks "$TASK_ARN" 2>>"$LOG_FILE"; then
                log "WARNING: Task stop wait failed for $TASK_ARN"
                cleanup_errors=$((cleanup_errors + 1))
            fi
        fi
    fi

    # Deregister task definition
    if [[ -n "${TASK_DEFINITION_ARN:-}" ]]; then
        log "Deregistering task definition: $TASK_DEFINITION_ARN"
        if ! aws ecs deregister-task-definition --task-definition "$TASK_DEFINITION_ARN" 2>>"$LOG_FILE"; then
            log "WARNING: Failed to deregister task definition $TASK_DEFINITION_ARN"
            cleanup_errors=$((cleanup_errors + 1))
        fi
    fi

    # Terminate EC2 instance
    if [[ -n "${INSTANCE_ID:-}" ]]; then
        log "Terminating EC2 instance: $INSTANCE_ID"
        if ! aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" 2>>"$LOG_FILE"; then
            log "WARNING: Failed to terminate instance $INSTANCE_ID"
            cleanup_errors=$((cleanup_errors + 1))
        else
            log "Waiting for instance to terminate..."
            if ! aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID" 2>>"$LOG_FILE"; then
                log "WARNING: Instance termination wait failed for $INSTANCE_ID"
                cleanup_errors=$((cleanup_errors + 1))
            fi
        fi
    fi

    # Delete security group with retry logic
    if [[ -n "${SECURITY_GROUP_ID:-}" ]]; then
        log "Deleting security group: $SECURITY_GROUP_ID"
        local retry_count=0
        local max_retries=3

        while [[ $retry_count -lt $max_retries ]]; do
            if aws ec2 delete-security-group --group-id "$SECURITY_GROUP_ID" 2>>"$LOG_FILE"; then
                log "Successfully deleted security group"
                break
            else
                retry_count=$((retry_count + 1))
                if [[ $retry_count -lt $max_retries ]]; then
                    log "Retry $retry_count/$max_retries: Waiting 10 seconds before retrying security group deletion..."
                    sleep 10
                else
                    log "ERROR: Failed to delete security group after $max_retries attempts"
                    cleanup_errors=$((cleanup_errors + 1))
                fi
            fi
        done
    fi

    # Delete key pair
    if [[ -n "${KEY_PAIR_NAME:-}" ]]; then
        log "Deleting key pair: $KEY_PAIR_NAME"
        if ! aws ec2 delete-key-pair --key-name "$KEY_PAIR_NAME" 2>>"$LOG_FILE"; then
            log "WARNING: Failed to delete key pair $KEY_PAIR_NAME"
            cleanup_errors=$((cleanup_errors + 1))
        fi
        rm -f "${KEY_PAIR_NAME}.pem" 2>>"$LOG_FILE" || log "WARNING: Failed to remove local key file"
    fi

    # Delete ECS cluster
    if [[ -n "${CLUSTER_NAME:-}" ]]; then
        log "Deleting ECS cluster: $CLUSTER_NAME"
        if ! aws ecs delete-cluster --cluster "$CLUSTER_NAME" 2>>"$LOG_FILE"; then
            log "WARNING: Failed to delete cluster $CLUSTER_NAME"
            cleanup_errors=$((cleanup_errors + 1))
        fi
    fi

    if [[ $cleanup_errors -eq 0 ]]; then
        log "Cleanup completed successfully"
    else
        log "Cleanup completed with $cleanup_errors warnings/errors. Check log file for details."
    fi
}

# Function to check prerequisites
check_prerequisites() {
    log "Checking prerequisites..."

    # Check AWS CLI
    if ! command -v aws &> /dev/null; then
        log "ERROR: AWS CLI is not installed"
        exit 1
    fi

    # Check AWS credentials
    if ! aws sts get-caller-identity &> /dev/null; then
        log "ERROR: AWS credentials not configured"
        exit 1
    fi

    # Get caller identity
    CALLER_IDENTITY=$(aws sts get-caller-identity --output text --query 'Account')
    log "AWS Account: $CALLER_IDENTITY"
    log "AWS Region: $AWS_REGION"

    # Check for default VPC
    DEFAULT_VPC=$(aws ec2 describe-vpcs --filters "Name=is-default,Values=true" --query 'Vpcs[0].VpcId' --output text)
    if [[ "$DEFAULT_VPC" == "None" ]]; then
        log "ERROR: No default VPC found. Please create a VPC first."
        exit 1
    fi
    log "Using default VPC: $DEFAULT_VPC"

    # Get default subnet
    DEFAULT_SUBNET=$(aws ec2 describe-subnets --filters "Name=vpc-id,Values=$DEFAULT_VPC" "Name=default-for-az,Values=true" --query 'Subnets[0].SubnetId' --output text)
    if [[ "$DEFAULT_SUBNET" == "None" ]]; then
        log "ERROR: No default subnet found"
        exit 1
    fi
    log "Using default subnet: $DEFAULT_SUBNET"

    log "Prerequisites check completed successfully"
}

# Function to create ECS cluster
create_cluster() {
    log "Creating ECS cluster: $CLUSTER_NAME"

    CLUSTER_ARN=$(aws ecs create-cluster --cluster-name "$CLUSTER_NAME" --query 'cluster.clusterArn' --output text)

    if [[ -z "$CLUSTER_ARN" ]]; then
        log "ERROR: Failed to create cluster"
        exit 1
    fi

    log "Created cluster: $CLUSTER_ARN"
    CREATED_RESOURCES+=("ECS Cluster: $CLUSTER_NAME")
}

# Function to create key pair
create_key_pair() {
    log "Creating EC2 key pair: $KEY_PAIR_NAME"

    # FIXED: Set secure umask before key creation
    umask 077
    aws ec2 create-key-pair --key-name "$KEY_PAIR_NAME" --query 'KeyMaterial' --output text > "${KEY_PAIR_NAME}.pem"
    chmod 400 "${KEY_PAIR_NAME}.pem"
    umask 022 # Reset umask

    log "Created key pair: $KEY_PAIR_NAME"
    CREATED_RESOURCES+=("EC2 Key Pair: $KEY_PAIR_NAME")
}

# Function to create security group
create_security_group() {
    log "Creating security group: $SECURITY_GROUP_NAME"

    SECURITY_GROUP_ID=$(aws ec2 create-security-group \
        --group-name "$SECURITY_GROUP_NAME" \
        --description "ECS tutorial security group" \
        --vpc-id "$DEFAULT_VPC" \
        --query 'GroupId' --output text)

    if [[ -z "$SECURITY_GROUP_ID" ]]; then
        log "ERROR: Failed to create security group"
        exit 1
    fi

    # Add HTTP access rule for nginx web server
    aws ec2 authorize-security-group-ingress \
        --group-id "$SECURITY_GROUP_ID" \
        --protocol tcp \
        --port 80 \
        --cidr "0.0.0.0/0"

    log "Created security group: $SECURITY_GROUP_ID"
    log "Added HTTP access on port 80"
    CREATED_RESOURCES+=("Security Group: $SECURITY_GROUP_ID")
}

# Function to get ECS optimized AMI
get_ecs_ami() {
    log "Getting ECS-optimized AMI ID..."

    ECS_AMI_ID=$(aws ssm get-parameters \
        --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended \
        --query 'Parameters[0].Value' --output text | jq -r '.image_id')

    if [[ -z "$ECS_AMI_ID" ]]; then
        log "ERROR: Failed to get ECS-optimized AMI ID"
        exit 1
    fi

    log "ECS-optimized AMI ID: $ECS_AMI_ID"
}

# Function to create IAM role for ECS instance (if it doesn't exist)
ensure_ecs_instance_role() {
    log "Checking for ecsInstanceRole..."

    if ! aws iam get-role --role-name ecsInstanceRole &> /dev/null; then
        log "Creating ecsInstanceRole..."

        # Create trust policy
        cat > ecs-instance-trust-policy.json << 'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

        # Create role
        aws iam create-role \
            --role-name ecsInstanceRole \
            --assume-role-policy-document file://ecs-instance-trust-policy.json

        # Attach managed policy
        aws iam attach-role-policy \
            --role-name ecsInstanceRole \
            --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

        # Create instance profile
        aws iam create-instance-profile --instance-profile-name ecsInstanceRole

        # Add role to instance profile
        aws iam add-role-to-instance-profile \
            --instance-profile-name ecsInstanceRole \
            --role-name ecsInstanceRole

        # FIXED: Enhanced wait for role to be ready
        log "Waiting for IAM role to be ready..."
        aws iam wait role-exists --role-name ecsInstanceRole
        sleep 30 # Additional buffer for eventual consistency

        rm -f ecs-instance-trust-policy.json
        log "Created ecsInstanceRole"
        CREATED_RESOURCES+=("IAM Role: ecsInstanceRole")
    else
        log "ecsInstanceRole already exists"
    fi
}

# Function to launch container instance
launch_container_instance() {
    log "Launching ECS container instance..."

    # Create user data script
    cat > ecs-user-data.sh << EOF
#!/bin/bash
echo ECS_CLUSTER=$CLUSTER_NAME >> /etc/ecs/ecs.config
EOF

    INSTANCE_ID=$(aws ec2 run-instances \
        --image-id "$ECS_AMI_ID" \
        --instance-type t3.micro \
        --key-name "$KEY_PAIR_NAME" \
        --security-group-ids "$SECURITY_GROUP_ID" \
        --subnet-id "$DEFAULT_SUBNET" \
        --iam-instance-profile Name=ecsInstanceRole \
        --user-data file://ecs-user-data.sh \
        --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=ecs-tutorial-instance}]" \
        --query 'Instances[0].InstanceId' --output text)

    if [[ -z "$INSTANCE_ID" ]]; then
        log "ERROR: Failed to launch EC2 instance"
        exit 1
    fi

    log "Launched EC2 instance: $INSTANCE_ID"
    CREATED_RESOURCES+=("EC2 Instance: $INSTANCE_ID")

    # Wait for instance to be running
    log "Waiting for instance to be running..."
    aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"

    # Wait for ECS agent to register
    log "Waiting for ECS agent to register with cluster..."
    local max_attempts=30
    local attempt=0

    while [[ $attempt -lt $max_attempts ]]; do
        CONTAINER_INSTANCES=$(aws ecs list-container-instances --cluster "$CLUSTER_NAME" --query 'containerInstanceArns' --output text)
        if [[ -n "$CONTAINER_INSTANCES" && "$CONTAINER_INSTANCES" != "None" ]]; then
            log "Container instance registered successfully"
            break
        fi

        attempt=$((attempt + 1))
        log "Waiting for container instance registration... (attempt $attempt/$max_attempts)"
        sleep 10
    done

    if [[ $attempt -eq $max_attempts ]]; then
        log "ERROR: Container instance failed to register within expected time"
        exit 1
    fi

    rm -f ecs-user-data.sh
}

# Function to register task definition
register_task_definition() {
    log "Creating task definition..."

    # Create nginx task definition JSON matching the tutorial
    cat > task-definition.json << EOF
{
    "family": "$TASK_FAMILY",
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "public.ecr.aws/docker/library/nginx:latest",
            "cpu": 256,
            "memory": 512,
            "essential": true,
            "portMappings": [
                {
                    "containerPort": 80,
                    "hostPort": 80,
                    "protocol": "tcp"
                }
            ]
        }
    ],
    "requiresCompatibilities": ["EC2"],
    "networkMode": "bridge"
}
EOF

    # FIXED: Validate JSON before registration
    if ! jq empty task-definition.json 2>/dev/null; then
        log "ERROR: Invalid JSON in task definition"
        exit 1
    fi

    TASK_DEFINITION_ARN=$(aws ecs register-task-definition \
        --cli-input-json file://task-definition.json \
        --query 'taskDefinition.taskDefinitionArn' --output text)

    if [[ -z "$TASK_DEFINITION_ARN" ]]; then
        log "ERROR: Failed to register task definition"
        exit 1
    fi

    log "Registered task definition: $TASK_DEFINITION_ARN"
    CREATED_RESOURCES+=("Task Definition: $TASK_DEFINITION_ARN")

    rm -f task-definition.json
}

# Function to create service
create_service() {
    log "Creating ECS service..."

    SERVICE_ARN=$(aws ecs create-service \
        --cluster "$CLUSTER_NAME" \
        --service-name "$SERVICE_NAME" \
        --task-definition "$TASK_FAMILY" \
        --desired-count 1 \
        --query 'service.serviceArn' --output text)

    if [[ -z "$SERVICE_ARN" ]]; then
        log "ERROR: Failed to create service"
        exit 1
    fi

    log "Created service: $SERVICE_ARN"
    CREATED_RESOURCES+=("ECS Service: $SERVICE_NAME")

    # Wait for service to be stable
    log "Waiting for service to be stable..."
aws ecs wait services-stable --cluster "$CLUSTER_NAME" --services "$SERVICE_NAME" log "Service is now stable and running" # Get the task ARN for monitoring TASK_ARN=$(aws ecs list-tasks --cluster "$CLUSTER_NAME" --service-name "$SERVICE_NAME" --query 'taskArns[0]' --output text) if [[ -n "$TASK_ARN" && "$TASK_ARN" != "None" ]]; then log "Service task: $TASK_ARN" CREATED_RESOURCES+=("ECS Task: $TASK_ARN") fi } # Function to demonstrate monitoring and testing demonstrate_monitoring() { log "Demonstrating monitoring capabilities..." # List services log "Listing services in cluster:" aws ecs list-services --cluster "$CLUSTER_NAME" --output table # Describe service log "Service details:" aws ecs describe-services --cluster "$CLUSTER_NAME" --services "$SERVICE_NAME" --output table --query 'services[0].{ServiceName:serviceName,Status:status,RunningCount:runningCount,DesiredCount:desiredCount,TaskDefinition:taskDefinition}' # List tasks log "Listing tasks in service:" aws ecs list-tasks --cluster "$CLUSTER_NAME" --service-name "$SERVICE_NAME" --output table # Describe task if [[ -n "$TASK_ARN" && "$TASK_ARN" != "None" ]]; then log "Task details:" aws ecs describe-tasks --cluster "$CLUSTER_NAME" --tasks "$TASK_ARN" --output table --query 'tasks[0].{TaskArn:taskArn,LastStatus:lastStatus,DesiredStatus:desiredStatus,CreatedAt:createdAt}' fi # List container instances log "Container instances in cluster:" aws ecs list-container-instances --cluster "$CLUSTER_NAME" --output table # Describe container instance CONTAINER_INSTANCE_ARN=$(aws ecs list-container-instances --cluster "$CLUSTER_NAME" --query 'containerInstanceArns[0]' --output text) if [[ -n "$CONTAINER_INSTANCE_ARN" && "$CONTAINER_INSTANCE_ARN" != "None" ]]; then log "Container instance details:" aws ecs describe-container-instances --cluster "$CLUSTER_NAME" --container-instances "$CONTAINER_INSTANCE_ARN" --output table --query 
'containerInstances[0].{Arn:containerInstanceArn,Status:status,RunningTasks:runningTasksCount,PendingTasks:pendingTasksCount}' fi # Test the nginx web server log "Testing nginx web server..." PUBLIC_IP=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" --query 'Reservations[0].Instances[0].PublicIpAddress' --output text) if [[ -n "$PUBLIC_IP" && "$PUBLIC_IP" != "None" ]]; then log "Container instance public IP: $PUBLIC_IP" log "Testing HTTP connection to nginx..." # Wait a moment for nginx to be fully ready sleep 10 if curl -s --connect-timeout 10 "http://$PUBLIC_IP" | grep -q "Welcome to nginx"; then log "SUCCESS: Nginx web server is responding correctly" echo "" echo "===========================================" echo "WEB SERVER TEST SUCCESSFUL" echo "===========================================" echo "You can access your nginx web server at: http://$PUBLIC_IP" echo "The nginx welcome page should be visible in your browser." else log "WARNING: Nginx web server may not be fully ready yet. Try accessing http://$PUBLIC_IP in a few minutes." fi else log "WARNING: Could not retrieve public IP address" fi } # Main execution main() { log "Starting ECS EC2 Launch Type Tutorial (UPDATED VERSION)" log "Log file: $LOG_FILE" check_prerequisites create_cluster create_key_pair create_security_group get_ecs_ami ensure_ecs_instance_role launch_container_instance register_task_definition create_service demonstrate_monitoring log "Tutorial completed successfully!" echo "" echo "===========================================" echo "TUTORIAL COMPLETED SUCCESSFULLY" echo "===========================================" echo "Resources created:" for resource in "${CREATED_RESOURCES[@]}"; do echo " - $resource" done echo "" echo "The nginx service will continue running and maintain the desired task count." 
echo "You can monitor the service status using:" echo " aws ecs describe-services --cluster $CLUSTER_NAME --services $SERVICE_NAME" echo "" if [[ -n "${PUBLIC_IP:-}" ]]; then echo "Access your web server at: http://$PUBLIC_IP" echo "" fi echo "===========================================" echo "CLEANUP CONFIRMATION" echo "===========================================" echo "Do you want to clean up all created resources? (y/n): " read -r CLEANUP_CHOICE if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then cleanup_resources log "All resources have been cleaned up" else log "Resources left running. Remember to clean them up manually to avoid charges." echo "" echo "To clean up manually later, run these commands:" echo " aws ecs update-service --cluster $CLUSTER_NAME --service $SERVICE_NAME --desired-count 0" echo " aws ecs delete-service --cluster $CLUSTER_NAME --service $SERVICE_NAME" echo " aws ecs delete-cluster --cluster $CLUSTER_NAME" echo " aws ec2 terminate-instances --instance-ids $INSTANCE_ID" echo " aws ec2 delete-security-group --group-id $SECURITY_GROUP_ID" echo " aws ec2 delete-key-pair --key-name $KEY_PAIR_NAME" fi log "Script execution completed" } # Run main function main "$@"
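
The registration wait in `launch_container_instance` above is a hand-written polling loop. The same pattern can be sketched as a reusable helper. This is a minimal sketch and not part of the original tutorial: the names `retry_until` and `instance_registered` are hypothetical, and the probe reuses the `aws ecs list-container-instances` call from the script.

```shell
#!/bin/bash

# retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or the attempts run out.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
retry_until() {
    local max_attempts=$1
    local delay=$2
    shift 2

    local attempt=1
    while (( attempt <= max_attempts )); do
        if "$@"; then
            return 0
        fi
        # Sleep only when another attempt will follow.
        if (( attempt < max_attempts )); then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical probe: succeeds once the cluster reports at least one
# container instance. Defined here for illustration; it is not called.
instance_registered() {
    local cluster=$1
    local arns
    arns=$(aws ecs list-container-instances --cluster "$cluster" \
        --query 'containerInstanceArns' --output text)
    [[ -n "$arns" && "$arns" != "None" ]]
}
```

With these definitions, the wait could be written as `retry_until 30 10 instance_registered "$CLUSTER_NAME" || exit 1`, which keeps the attempt count and delay in one place.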

The following code example shows how to:

  • Create an IAM role for your workspace

  • Create a Grafana workspace

  • Configure authentication

  • Configure optional settings

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# Amazon Managed Grafana Workspace Creation Script
# This script creates an Amazon Managed Grafana workspace and configures it

# Set up logging
LOG_FILE="grafana-workspace-creation.log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting Amazon Managed Grafana workspace creation script at $(date)"
echo "All commands and outputs will be logged to $LOG_FILE"

# Function to check for errors in command output
check_error() {
    local output=$1
    local cmd=$2

    if echo "$output" | grep -i "error\|exception\|fail" > /dev/null; then
        echo "ERROR: Command '$cmd' failed with output:"
        echo "$output"
        cleanup_on_error
        exit 1
    fi
}

# Function to clean up resources on error
cleanup_on_error() {
    echo "Error encountered. Attempting to clean up resources..."

    if [ -n "$WORKSPACE_ID" ]; then
        echo "Deleting workspace $WORKSPACE_ID..."
        aws grafana delete-workspace --workspace-id "$WORKSPACE_ID"
    fi

    if [ -n "$ROLE_NAME" ]; then
        echo "Detaching policies from role $ROLE_NAME..."
        if [ -n "$POLICY_ARN" ]; then
            aws iam detach-role-policy --role-name "$ROLE_NAME" --policy-arn "$POLICY_ARN"
        fi

        echo "Deleting role $ROLE_NAME..."
        aws iam delete-role --role-name "$ROLE_NAME"
    fi

    if [ -n "$POLICY_ARN" ]; then
        echo "Deleting policy..."
        aws iam delete-policy --policy-arn "$POLICY_ARN"
    fi

    # Clean up JSON files
    rm -f trust-policy.json cloudwatch-policy.json

    echo "Cleanup completed. See $LOG_FILE for details."
}

# Generate a random identifier for resource names
RANDOM_ID=$(openssl rand -hex 4)
WORKSPACE_NAME="GrafanaWorkspace-${RANDOM_ID}"
ROLE_NAME="GrafanaWorkspaceRole-${RANDOM_ID}"

echo "Using workspace name: $WORKSPACE_NAME"
echo "Using role name: $ROLE_NAME"

# Step 1: Get AWS account ID
echo "Getting AWS account ID..."
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
check_error "$ACCOUNT_ID" "get-caller-identity"
echo "AWS Account ID: $ACCOUNT_ID"

# Step 2: Create IAM role for Grafana workspace
echo "Creating IAM role for Grafana workspace..."

# Create trust policy document
cat > trust-policy.json << EOF
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "grafana.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create IAM role
ROLE_OUTPUT=$(aws iam create-role \
    --role-name "$ROLE_NAME" \
    --assume-role-policy-document file://trust-policy.json \
    --description "Role for Amazon Managed Grafana workspace")

check_error "$ROLE_OUTPUT" "create-role"
echo "IAM role created successfully"

# Extract role ARN
ROLE_ARN=$(echo "$ROLE_OUTPUT" | grep -o '"Arn": "[^"]*' | cut -d'"' -f4)
echo "Role ARN: $ROLE_ARN"

# Attach policies to the role
echo "Attaching policies to the role..."

# CloudWatch policy
cat > cloudwatch-policy.json << EOF
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarmsForMetric",
        "cloudwatch:DescribeAlarmHistory",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData"
      ],
      "Resource": "*"
    }
  ]
}
EOF

POLICY_OUTPUT=$(aws iam create-policy \
    --policy-name "GrafanaCloudWatchPolicy-${RANDOM_ID}" \
    --policy-document file://cloudwatch-policy.json)

check_error "$POLICY_OUTPUT" "create-policy"

POLICY_ARN=$(echo "$POLICY_OUTPUT" | grep -o '"Arn": "[^"]*' | cut -d'"' -f4)
echo "CloudWatch policy ARN: $POLICY_ARN"

ATTACH_OUTPUT=$(aws iam attach-role-policy \
    --role-name "$ROLE_NAME" \
    --policy-arn "$POLICY_ARN")

check_error "$ATTACH_OUTPUT" "attach-role-policy"
echo "CloudWatch policy attached to role"

# Step 3: Create the Grafana workspace
echo "Creating Amazon Managed Grafana workspace..."
WORKSPACE_OUTPUT=$(aws grafana create-workspace \
    --workspace-name "$WORKSPACE_NAME" \
    --authentication-providers "SAML" \
    --permission-type "CUSTOMER_MANAGED" \
    --account-access-type "CURRENT_ACCOUNT" \
    --workspace-role-arn "$ROLE_ARN" \
    --workspace-data-sources "CLOUDWATCH" "PROMETHEUS" "XRAY" \
    --grafana-version "10.4" \
    --tags Environment=Development)

check_error "$WORKSPACE_OUTPUT" "create-workspace"

echo "Workspace creation initiated:"
echo "$WORKSPACE_OUTPUT"

# Extract workspace ID
WORKSPACE_ID=$(echo "$WORKSPACE_OUTPUT" | grep -o '"id": "[^"]*' | cut -d'"' -f4)

if [ -z "$WORKSPACE_ID" ]; then
    echo "ERROR: Failed to extract workspace ID from output"
    exit 1
fi

echo "Workspace ID: $WORKSPACE_ID"

# Step 4: Wait for workspace to become active
echo "Waiting for workspace to become active. This may take several minutes..."
ACTIVE=false
MAX_ATTEMPTS=30
ATTEMPT=0

while [ $ACTIVE = false ] && [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
    ATTEMPT=$((ATTEMPT+1))
    echo "Checking workspace status (attempt $ATTEMPT of $MAX_ATTEMPTS)..."

    DESCRIBE_OUTPUT=$(aws grafana describe-workspace --workspace-id "$WORKSPACE_ID")
    check_error "$DESCRIBE_OUTPUT" "describe-workspace"

    STATUS=$(echo "$DESCRIBE_OUTPUT" | grep -o '"status": "[^"]*' | cut -d'"' -f4)
    echo "Current status: $STATUS"

    if [ "$STATUS" = "ACTIVE" ]; then
        ACTIVE=true
        echo "Workspace is now ACTIVE"
    elif [ "$STATUS" = "FAILED" ]; then
        echo "ERROR: Workspace creation failed"
        cleanup_on_error
        exit 1
    else
        echo "Workspace is still being created. Waiting 30 seconds..."
        sleep 30
    fi
done

if [ $ACTIVE = false ]; then
    echo "ERROR: Workspace did not become active within the expected time"
    cleanup_on_error
    exit 1
fi

# Extract workspace endpoint URL
WORKSPACE_URL=$(echo "$DESCRIBE_OUTPUT" | grep -o '"endpoint": "[^"]*' | cut -d'"' -f4)
echo "Workspace URL: https://$WORKSPACE_URL"

# Step 5: Display workspace information
echo ""
echo "==========================================="
echo "WORKSPACE INFORMATION"
echo "==========================================="
echo "Workspace ID: $WORKSPACE_ID"
echo "Workspace URL: https://$WORKSPACE_URL"
echo "Workspace Name: $WORKSPACE_NAME"
echo "IAM Role: $ROLE_NAME"
echo ""
echo "Note: Since SAML authentication is used, you need to configure SAML settings"
echo "using the AWS Management Console or the update-workspace-authentication command."
echo "==========================================="

# Step 6: Prompt for cleanup
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Resources created:"
echo "- Amazon Managed Grafana workspace: $WORKSPACE_ID"
echo "- IAM Role: $ROLE_NAME"
echo "- IAM Policy: GrafanaCloudWatchPolicy-${RANDOM_ID}"
echo ""
echo "Do you want to clean up all created resources? (y/n): "
read -r CLEANUP_CHOICE

if [[ "$CLEANUP_CHOICE" =~ ^[Yy] ]]; then
    echo "Cleaning up resources..."

    echo "Deleting workspace $WORKSPACE_ID..."
    DELETE_OUTPUT=$(aws grafana delete-workspace --workspace-id "$WORKSPACE_ID")
    check_error "$DELETE_OUTPUT" "delete-workspace"

    echo "Waiting for workspace to be deleted..."
    DELETED=false
    ATTEMPT=0

    while [ $DELETED = false ] && [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
        ATTEMPT=$((ATTEMPT+1))
        echo "Checking deletion status (attempt $ATTEMPT of $MAX_ATTEMPTS)..."

        if aws grafana describe-workspace --workspace-id "$WORKSPACE_ID" 2>&1 | grep -i "not found\|does not exist" > /dev/null; then
            DELETED=true
            echo "Workspace has been deleted"
        else
            echo "Workspace is still being deleted. Waiting 30 seconds..."
            sleep 30
        fi
    done

    if [ $DELETED = false ]; then
        echo "WARNING: Workspace deletion is taking longer than expected. It may still be in progress."
    fi

    # Detach policy from role
    echo "Detaching policy from role..."
    aws iam detach-role-policy \
        --role-name "$ROLE_NAME" \
        --policy-arn "$POLICY_ARN"

    # Delete policy
    echo "Deleting IAM policy..."
    aws iam delete-policy \
        --policy-arn "$POLICY_ARN"

    # Delete role
    echo "Deleting IAM role..."
    aws iam delete-role \
        --role-name "$ROLE_NAME"

    # Clean up JSON files
    rm -f trust-policy.json cloudwatch-policy.json

    echo "Cleanup completed"
else
    echo "Skipping cleanup. Resources will remain in your AWS account."
fi

echo "Script completed at $(date)"
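
The `check_error` function above decides success by searching command output for the words "error", "exception", or "fail". That can misfire when a legitimate value contains one of those words, for example a hypothetical role named `failover-role`. The AWS CLI returns a nonzero exit status when a command fails, so a check can rely on that instead. The following is a minimal sketch of that alternative, not the tutorial's own code; the function name `run_checked` is an assumption made for illustration.

```shell
#!/bin/bash

# run_checked DESCRIPTION COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND and captures stdout and stderr together.
# On success, prints the captured output and returns 0.
# On failure, reports the output on stderr and returns 1.
run_checked() {
    local description=$1
    shift

    # Declare separately so that 'local' does not mask the exit status
    # of the command substitution.
    local output
    if ! output=$("$@" 2>&1); then
        echo "ERROR: $description failed with output:" >&2
        echo "$output" >&2
        return 1
    fi

    printf '%s\n' "$output"
}
```

A call such as `ACCOUNT_ID=$(run_checked "get-caller-identity" aws sts get-caller-identity --query Account --output text) || exit 1` would then stop only on a real failure. Any cleanup, such as the script's `cleanup_on_error`, would still need to be invoked by the caller.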

The following code example shows how to:

  • Create a Docker image

  • Create an Amazon ECR repository

  • Delete resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# Amazon ECR Getting Started Script
# This script demonstrates the lifecycle of a Docker image in Amazon ECR

# Set up logging
LOG_FILE="ecr-tutorial.log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "==================================================="
echo "Amazon ECR Getting Started Tutorial"
echo "==================================================="
echo "This script will:"
echo "1. Create a Docker image"
echo "2. Create an Amazon ECR repository"
echo "3. Authenticate to Amazon ECR"
echo "4. Push the image to Amazon ECR"
echo "5. Pull the image from Amazon ECR"
echo "6. Clean up resources (optional)"
echo "==================================================="

# Check prerequisites
echo "Checking prerequisites..."

# Check if AWS CLI is installed
if ! command -v aws &> /dev/null; then
    echo "ERROR: AWS CLI is not installed. Please install it before running this script."
    echo "Visit https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html for installation instructions."
    exit 1
fi

# Check if AWS CLI is configured
if ! aws sts get-caller-identity &> /dev/null; then
    echo "ERROR: AWS CLI is not configured properly. Please run 'aws configure' to set up your credentials."
    exit 1
fi

# Check if Docker is installed
if ! command -v docker &> /dev/null; then
    echo "ERROR: Docker is not installed. Please install Docker before running this script."
    echo "Visit https://docs.docker.com/get-docker/ for installation instructions."
    exit 1
fi

# Check if Docker daemon is running
if ! docker info &> /dev/null; then
    echo "ERROR: Docker daemon is not running. Please start Docker and try again."
    exit 1
fi

echo "All prerequisites met."

# Initialize variables
REPO_URI=""
TIMEOUT_CMD="timeout 300" # 5-minute timeout for long-running commands

# Function to handle errors
handle_error() {
    echo "ERROR: $1"
    echo "Check the log file for details: $LOG_FILE"

    echo "==================================================="
    echo "Resources created:"
    echo "- Docker image: hello-world (local)"
    if [ -n "$REPO_URI" ]; then
        echo "- ECR Repository: hello-repository"
        echo "- ECR Image: $REPO_URI:latest"
    fi
    echo "==================================================="

    echo "Attempting to clean up resources..."
    cleanup
    exit 1
}

# Function to clean up resources
cleanup() {
    echo "==================================================="
    echo "Cleaning up resources..."

    # Delete the image from ECR if it exists
    if [ -n "$REPO_URI" ]; then
        echo "Deleting image from ECR repository..."
        aws ecr batch-delete-image --repository-name hello-repository --image-ids imageTag=latest || echo "Failed to delete image, it may not exist or may have already been deleted."
    fi

    # Delete the ECR repository if it exists
    if [ -n "$REPO_URI" ]; then
        echo "Deleting ECR repository..."
        aws ecr delete-repository --repository-name hello-repository --force || echo "Failed to delete repository, it may not exist or may have already been deleted."
    fi

    # Remove local Docker image
    echo "Removing local Docker image..."
    docker rmi hello-world:latest 2>/dev/null || echo "Failed to remove local image, it may not exist or may have already been deleted."
    if [ -n "$REPO_URI" ]; then
        docker rmi "$REPO_URI:latest" 2>/dev/null || echo "Failed to remove tagged image, it may not exist or may have already been deleted."
    fi

    echo "Cleanup completed."
    echo "==================================================="
}

# Step 1: Create a Docker image
echo "Step 1: Creating a Docker image"

# Create Dockerfile
echo "Creating Dockerfile..."
cat > Dockerfile << 'EOF'
FROM public.ecr.aws/amazonlinux/amazonlinux:latest

# Install dependencies
RUN yum update -y && \
 yum install -y httpd

# Install apache and write hello world message
RUN echo 'Hello World!' > /var/www/html/index.html

# Configure apache
RUN echo 'mkdir -p /var/run/httpd' >> /root/run_apache.sh && \
 echo 'mkdir -p /var/lock/httpd' >> /root/run_apache.sh && \
 echo '/usr/sbin/httpd -D FOREGROUND' >> /root/run_apache.sh && \
 chmod 755 /root/run_apache.sh

EXPOSE 80

CMD /root/run_apache.sh
EOF

# Build Docker image
echo "Building Docker image..."
$TIMEOUT_CMD docker build -t hello-world . || handle_error "Failed to build Docker image or operation timed out after 5 minutes"

# Verify image was created
echo "Verifying Docker image..."
docker images --filter reference=hello-world || handle_error "Failed to list Docker images"

echo "Docker image created successfully."

# Step 2: Create an Amazon ECR repository
echo "Step 2: Creating an Amazon ECR repository"

# Get AWS account ID
echo "Getting AWS account ID..."
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
if [[ -z "$AWS_ACCOUNT_ID" || "$AWS_ACCOUNT_ID" == *"error"* ]]; then
    handle_error "Failed to get AWS account ID. Make sure your AWS credentials are configured correctly."
fi
echo "AWS Account ID: $AWS_ACCOUNT_ID"

# Get current region
AWS_REGION=$(aws configure get region)
if [[ -z "$AWS_REGION" ]]; then
    AWS_REGION="us-east-1" # Default to us-east-1 if no region is configured
    echo "No AWS region configured, defaulting to $AWS_REGION"
else
    echo "Using AWS region: $AWS_REGION"
fi

# Create ECR repository
echo "Creating ECR repository..."
REPO_RESULT=$(aws ecr create-repository --repository-name hello-repository)
if [[ -z "$REPO_RESULT" || "$REPO_RESULT" == *"error"* ]]; then
    handle_error "Failed to create ECR repository"
fi

# Extract repository URI
REPO_URI="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/hello-repository"
echo "Repository URI: $REPO_URI"

# Step 3: Authenticate to Amazon ECR
echo "Step 3: Authenticating to Amazon ECR"

echo "Getting ECR login password..."
aws ecr get-login-password --region "$AWS_REGION" | docker login --username AWS --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com" || handle_error "Failed to authenticate to ECR"

echo "Successfully authenticated to ECR."

# Step 4: Push the image to Amazon ECR
echo "Step 4: Pushing the image to Amazon ECR"

# Tag the image
echo "Tagging Docker image..."
docker tag hello-world:latest "$REPO_URI:latest" || handle_error "Failed to tag Docker image"

# Push the image with timeout
echo "Pushing image to ECR..."
$TIMEOUT_CMD docker push "$REPO_URI:latest" || handle_error "Failed to push image to ECR or operation timed out after 5 minutes"

echo "Successfully pushed image to ECR."

# Step 5: Pull the image from Amazon ECR
echo "Step 5: Pulling the image from Amazon ECR"

# Remove local image to ensure we're pulling from ECR
echo "Removing local tagged image..."
docker rmi "$REPO_URI:latest" || echo "Warning: Failed to remove local tagged image"

# Pull the image with timeout
echo "Pulling image from ECR..."
$TIMEOUT_CMD docker pull "$REPO_URI:latest" || handle_error "Failed to pull image from ECR or operation timed out after 5 minutes"

echo "Successfully pulled image from ECR."

# List resources created
echo "==================================================="
echo "Resources created:"
echo "- Docker image: hello-world (local)"
echo "- ECR Repository: hello-repository"
echo "- ECR Image: $REPO_URI:latest"
echo "==================================================="

# Ask user if they want to clean up resources
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Do you want to clean up all created resources? (y/n): "
read -r CLEANUP_CHOICE

if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
    # Step 6: Delete the image from ECR
    echo "Step 6: Deleting the image from ECR"

    DELETE_IMAGE_RESULT=$(aws ecr batch-delete-image --repository-name hello-repository --image-ids imageTag=latest)
    if [[ -z "$DELETE_IMAGE_RESULT" || "$DELETE_IMAGE_RESULT" == *"error"* ]]; then
        echo "Warning: Failed to delete image from ECR"
    else
        echo "Successfully deleted image from ECR."
    fi

    # Step 7: Delete the ECR repository
    echo "Step 7: Deleting the ECR repository"

    DELETE_REPO_RESULT=$(aws ecr delete-repository --repository-name hello-repository --force)
    if [[ -z "$DELETE_REPO_RESULT" || "$DELETE_REPO_RESULT" == *"error"* ]]; then
        echo "Warning: Failed to delete ECR repository"
    else
        echo "Successfully deleted ECR repository."
    fi

    # Remove local Docker images
    echo "Removing local Docker images..."
    docker rmi hello-world:latest 2>/dev/null || echo "Warning: Failed to remove local image"

    echo "All resources have been cleaned up."
else
    echo "Resources were not cleaned up. You can manually clean up later with:"
    echo "aws ecr batch-delete-image --repository-name hello-repository --image-ids imageTag=latest"
    echo "aws ecr delete-repository --repository-name hello-repository --force"
    echo "docker rmi hello-world:latest"
    echo "docker rmi $REPO_URI:latest"
fi

echo "==================================================="
echo "Tutorial completed!"
echo "Log file: $LOG_FILE"
echo "==================================================="
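
The script above assembles the repository URI by hand from the account ID returned by `aws sts get-caller-identity`, the configured Region, and the repository name. The sketch below isolates that step in a small function with basic input validation. It is an illustration under stated assumptions, not part of the tutorial: the function name `ecr_repo_uri` is hypothetical, and the `amazonaws.com` suffix assumes the standard `aws` partition.

```shell
#!/bin/bash

# ecr_repo_uri ACCOUNT_ID REGION REPO_NAME
# Hypothetical helper: prints the repository URI in the form
# ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/REPO_NAME, or returns 1
# if the inputs do not look valid.
ecr_repo_uri() {
    local account_id=$1
    local region=$2
    local repo_name=$3

    # AWS account IDs are 12-digit numbers.
    if [[ ! $account_id =~ ^[0-9]{12}$ ]]; then
        echo "ERROR: account ID must be 12 digits: '$account_id'" >&2
        return 1
    fi

    if [[ -z $region || -z $repo_name ]]; then
        echo "ERROR: region and repository name are required" >&2
        return 1
    fi

    printf '%s.dkr.ecr.%s.amazonaws.com/%s\n' "$account_id" "$region" "$repo_name"
}
```

In the script this could replace the direct assignment, for example `REPO_URI=$(ecr_repo_uri "$AWS_ACCOUNT_ID" "$AWS_REGION" hello-repository) || handle_error "Invalid repository URI inputs"`. Another option is to read the URI that `create-repository` itself returns by adding `--query 'repository.repositoryUri' --output text` to that call.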

The following code example shows how to:

  • Create a VPC for your EKS cluster

  • Create IAM roles for your EKS cluster

  • Create your EKS cluster

  • Configure kubectl to communicate with your cluster

  • Create a managed node group

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash # Amazon EKS Cluster Creation Script (v2) # This script creates an Amazon EKS cluster with a managed node group using the AWS CLI # Set up logging LOG_FILE="eks-cluster-creation-v2.log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "Starting Amazon EKS cluster creation script at $(date)" echo "All commands and outputs will be logged to $LOG_FILE" # Error handling function handle_error() { echo "ERROR: $1" echo "Attempting to clean up resources..." cleanup_resources exit 1 } # Function to check command success check_command() { if [ $? -ne 0 ] || echo "$1" | grep -i "error" > /dev/null; then handle_error "$1" fi } # Function to check if kubectl is installed check_kubectl() { if ! command -v kubectl &> /dev/null; then echo "WARNING: kubectl is not installed or not in your PATH." echo "" echo "To install kubectl, follow these instructions based on your operating system:" echo "" echo "For Linux:" echo " 1. Download the latest release:" echo " curl -LO \"https://dl.k8s.io/release/\$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl\"" echo "" echo " 2. Make the kubectl binary executable:" echo " chmod +x ./kubectl" echo "" echo " 3. Move the binary to your PATH:" echo " sudo mv ./kubectl /usr/local/bin/kubectl" echo "" echo "For macOS:" echo " 1. Using Homebrew:" echo " brew install kubectl" echo " or" echo " 2. Using curl:" echo " curl -LO \"https://dl.k8s.io/release/\$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/darwin/amd64/kubectl\"" echo " chmod +x ./kubectl" echo " sudo mv ./kubectl /usr/local/bin/kubectl" echo "" echo "For Windows:" echo " 1. Using curl:" echo " curl -LO \"https://dl.k8s.io/release/v1.28.0/bin/windows/amd64/kubectl.exe\"" echo " Add the binary to your PATH" echo " or" echo " 2. 
Using Chocolatey:" echo " choco install kubernetes-cli" echo "" echo "After installation, verify with: kubectl version --client" echo "" return 1 fi return 0 } # Generate a random identifier for resource names RANDOM_ID=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | fold -w 6 | head -n 1) STACK_NAME="eks-vpc-stack-${RANDOM_ID}" CLUSTER_NAME="eks-cluster-${RANDOM_ID}" NODEGROUP_NAME="eks-nodegroup-${RANDOM_ID}" CLUSTER_ROLE_NAME="EKSClusterRole-${RANDOM_ID}" NODE_ROLE_NAME="EKSNodeRole-${RANDOM_ID}" echo "Using the following resource names:" echo "- VPC Stack: $STACK_NAME" echo "- EKS Cluster: $CLUSTER_NAME" echo "- Node Group: $NODEGROUP_NAME" echo "- Cluster IAM Role: $CLUSTER_ROLE_NAME" echo "- Node IAM Role: $NODE_ROLE_NAME" # Array to track created resources for cleanup declare -a CREATED_RESOURCES # Function to clean up resources cleanup_resources() { echo "Cleaning up resources in reverse order..." # Check if node group exists and delete it if aws eks list-nodegroups --cluster-name "$CLUSTER_NAME" --query "nodegroups[?contains(@,'$NODEGROUP_NAME')]" --output text 2>/dev/null | grep -q "$NODEGROUP_NAME"; then echo "Deleting node group: $NODEGROUP_NAME" aws eks delete-nodegroup --cluster-name "$CLUSTER_NAME" --nodegroup-name "$NODEGROUP_NAME" echo "Waiting for node group deletion to complete..." aws eks wait nodegroup-deleted --cluster-name "$CLUSTER_NAME" --nodegroup-name "$NODEGROUP_NAME" echo "Node group deleted successfully." fi # Check if cluster exists and delete it if aws eks describe-cluster --name "$CLUSTER_NAME" 2>/dev/null; then echo "Deleting cluster: $CLUSTER_NAME" aws eks delete-cluster --name "$CLUSTER_NAME" echo "Waiting for cluster deletion to complete (this may take several minutes)..." aws eks wait cluster-deleted --name "$CLUSTER_NAME" echo "Cluster deleted successfully." 
fi # Check if CloudFormation stack exists and delete it if aws cloudformation describe-stacks --stack-name "$STACK_NAME" 2>/dev/null; then echo "Deleting CloudFormation stack: $STACK_NAME" aws cloudformation delete-stack --stack-name "$STACK_NAME" echo "Waiting for CloudFormation stack deletion to complete..." aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME" echo "CloudFormation stack deleted successfully." fi # Clean up IAM roles if aws iam get-role --role-name "$NODE_ROLE_NAME" 2>/dev/null; then echo "Detaching policies from node role: $NODE_ROLE_NAME" aws iam detach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy --role-name "$NODE_ROLE_NAME" aws iam detach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly --role-name "$NODE_ROLE_NAME" aws iam detach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy --role-name "$NODE_ROLE_NAME" echo "Deleting node role: $NODE_ROLE_NAME" aws iam delete-role --role-name "$NODE_ROLE_NAME" echo "Node role deleted successfully." fi if aws iam get-role --role-name "$CLUSTER_ROLE_NAME" 2>/dev/null; then echo "Detaching policies from cluster role: $CLUSTER_ROLE_NAME" aws iam detach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy --role-name "$CLUSTER_ROLE_NAME" echo "Deleting cluster role: $CLUSTER_ROLE_NAME" aws iam delete-role --role-name "$CLUSTER_ROLE_NAME" echo "Cluster role deleted successfully." fi echo "Cleanup complete." } # Trap to ensure cleanup on script exit trap 'echo "Script interrupted. Cleaning up resources..."; cleanup_resources; exit 1' SIGINT SIGTERM # Verify AWS CLI configuration echo "Verifying AWS CLI configuration..." AWS_ACCOUNT_INFO=$(aws sts get-caller-identity) check_command "$AWS_ACCOUNT_INFO" echo "AWS CLI is properly configured." # Step 1: Create VPC using CloudFormation echo "Step 1: Creating VPC with CloudFormation..." 
echo "Creating CloudFormation stack: $STACK_NAME" # Create the CloudFormation stack CF_CREATE_OUTPUT=$(aws cloudformation create-stack \ --stack-name "$STACK_NAME" \ --template-url https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-vpc-private-subnets.yaml) check_command "$CF_CREATE_OUTPUT" CREATED_RESOURCES+=("CloudFormation Stack: $STACK_NAME") echo "Waiting for CloudFormation stack to complete (this may take a few minutes)..." aws cloudformation wait stack-create-complete --stack-name "$STACK_NAME" if [ $? -ne 0 ]; then handle_error "CloudFormation stack creation failed" fi echo "CloudFormation stack created successfully." # Step 2: Create IAM roles for EKS echo "Step 2: Creating IAM roles for EKS..." # Create cluster role trust policy echo "Creating cluster role trust policy..." cat > eks-cluster-role-trust-policy.json << EOF { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "eks.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } EOF # Create cluster role echo "Creating cluster IAM role: $CLUSTER_ROLE_NAME" CLUSTER_ROLE_OUTPUT=$(aws iam create-role \ --role-name "$CLUSTER_ROLE_NAME" \ --assume-role-policy-document file://"eks-cluster-role-trust-policy.json") check_command "$CLUSTER_ROLE_OUTPUT" CREATED_RESOURCES+=("IAM Role: $CLUSTER_ROLE_NAME") # Attach policy to cluster role echo "Attaching EKS cluster policy to role..." ATTACH_CLUSTER_POLICY_OUTPUT=$(aws iam attach-role-policy \ --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy \ --role-name "$CLUSTER_ROLE_NAME") check_command "$ATTACH_CLUSTER_POLICY_OUTPUT" # Create node role trust policy echo "Creating node role trust policy..." 
cat > node-role-trust-policy.json << EOF { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } EOF # Create node role echo "Creating node IAM role: $NODE_ROLE_NAME" NODE_ROLE_OUTPUT=$(aws iam create-role \ --role-name "$NODE_ROLE_NAME" \ --assume-role-policy-document file://"node-role-trust-policy.json") check_command "$NODE_ROLE_OUTPUT" CREATED_RESOURCES+=("IAM Role: $NODE_ROLE_NAME") # Attach policies to node role echo "Attaching EKS node policies to role..." ATTACH_NODE_POLICY1_OUTPUT=$(aws iam attach-role-policy \ --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy \ --role-name "$NODE_ROLE_NAME") check_command "$ATTACH_NODE_POLICY1_OUTPUT" ATTACH_NODE_POLICY2_OUTPUT=$(aws iam attach-role-policy \ --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly \ --role-name "$NODE_ROLE_NAME") check_command "$ATTACH_NODE_POLICY2_OUTPUT" ATTACH_NODE_POLICY3_OUTPUT=$(aws iam attach-role-policy \ --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy \ --role-name "$NODE_ROLE_NAME") check_command "$ATTACH_NODE_POLICY3_OUTPUT" # Step 3: Get VPC and subnet information echo "Step 3: Getting VPC and subnet information..." 
VPC_ID=$(aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" \
    --query "Stacks[0].Outputs[?OutputKey=='VpcId'].OutputValue" \
    --output text)
if [ -z "$VPC_ID" ]; then
    handle_error "Failed to get VPC ID from CloudFormation stack"
fi
echo "VPC ID: $VPC_ID"

SUBNET_IDS=$(aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" \
    --query "Stacks[0].Outputs[?OutputKey=='SubnetIds'].OutputValue" \
    --output text)
if [ -z "$SUBNET_IDS" ]; then
    handle_error "Failed to get Subnet IDs from CloudFormation stack"
fi
echo "Subnet IDs: $SUBNET_IDS"

SECURITY_GROUP_ID=$(aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" \
    --query "Stacks[0].Outputs[?OutputKey=='SecurityGroups'].OutputValue" \
    --output text)
if [ -z "$SECURITY_GROUP_ID" ]; then
    handle_error "Failed to get Security Group ID from CloudFormation stack"
fi
echo "Security Group ID: $SECURITY_GROUP_ID"

# Step 4: Create EKS cluster
echo "Step 4: Creating EKS cluster: $CLUSTER_NAME"
CLUSTER_ROLE_ARN=$(aws iam get-role --role-name "$CLUSTER_ROLE_NAME" --query "Role.Arn" --output text)
if [ -z "$CLUSTER_ROLE_ARN" ]; then
    handle_error "Failed to get Cluster Role ARN"
fi

echo "Creating EKS cluster (this will take 10-15 minutes)..."
CREATE_CLUSTER_OUTPUT=$(aws eks create-cluster \
    --name "$CLUSTER_NAME" \
    --role-arn "$CLUSTER_ROLE_ARN" \
    --resources-vpc-config subnetIds="$SUBNET_IDS",securityGroupIds="$SECURITY_GROUP_ID")
check_command "$CREATE_CLUSTER_OUTPUT"
CREATED_RESOURCES+=("EKS Cluster: $CLUSTER_NAME")

echo "Waiting for EKS cluster to become active (this may take 10-15 minutes)..."
aws eks wait cluster-active --name "$CLUSTER_NAME"
if [ $? -ne 0 ]; then
    handle_error "Cluster creation failed or timed out"
fi
echo "EKS cluster is now active."

# Step 5: Configure kubectl
echo "Step 5: Configuring kubectl to communicate with the cluster..."

# Check if kubectl is installed
if ! check_kubectl; then
    echo "Will skip kubectl configuration steps but continue with the script."
    echo "You can manually configure kubectl later with: aws eks update-kubeconfig --name \"$CLUSTER_NAME\""
else
    UPDATE_KUBECONFIG_OUTPUT=$(aws eks update-kubeconfig --name "$CLUSTER_NAME")
    check_command "$UPDATE_KUBECONFIG_OUTPUT"
    echo "kubectl configured successfully."

    # Test kubectl configuration
    echo "Testing kubectl configuration..."
    KUBECTL_TEST_OUTPUT=$(kubectl get svc 2>&1)
    if [ $? -ne 0 ]; then
        echo "Warning: kubectl configuration test failed. This might be due to permissions or network issues."
        echo "Error details: $KUBECTL_TEST_OUTPUT"
        echo "Continuing with script execution..."
    else
        echo "$KUBECTL_TEST_OUTPUT"
        echo "kubectl configuration test successful."
    fi
fi

# Step 6: Create managed node group
echo "Step 6: Creating managed node group: $NODEGROUP_NAME"
NODE_ROLE_ARN=$(aws iam get-role --role-name "$NODE_ROLE_NAME" --query "Role.Arn" --output text)
if [ -z "$NODE_ROLE_ARN" ]; then
    handle_error "Failed to get Node Role ARN"
fi

# Convert comma-separated subnet IDs to space-separated for the create-nodegroup command
SUBNET_IDS_ARRAY=(${SUBNET_IDS//,/ })

echo "Creating managed node group (this will take 5-10 minutes)..."
CREATE_NODEGROUP_OUTPUT=$(aws eks create-nodegroup \
    --cluster-name "$CLUSTER_NAME" \
    --nodegroup-name "$NODEGROUP_NAME" \
    --node-role "$NODE_ROLE_ARN" \
    --subnets "${SUBNET_IDS_ARRAY[@]}")
check_command "$CREATE_NODEGROUP_OUTPUT"
CREATED_RESOURCES+=("EKS Node Group: $NODEGROUP_NAME")

echo "Waiting for node group to become active (this may take 5-10 minutes)..."
aws eks wait nodegroup-active --cluster-name "$CLUSTER_NAME" --nodegroup-name "$NODEGROUP_NAME"
if [ $? -ne 0 ]; then
    handle_error "Node group creation failed or timed out"
fi
echo "Node group is now active."

# Step 7: Verify nodes
echo "Step 7: Verifying nodes..."
echo "Waiting for nodes to register with the cluster (this may take a few minutes)..."
sleep 60  # Give nodes more time to register

# Check if kubectl is installed before attempting to use it
if ! check_kubectl; then
    echo "Cannot verify nodes without kubectl. Skipping this step."
    echo "You can manually verify nodes after installing kubectl with: kubectl get nodes"
else
    NODES_OUTPUT=$(kubectl get nodes 2>&1)
    if [ $? -ne 0 ]; then
        echo "Warning: Unable to get nodes. This might be due to permissions or the nodes are still registering."
        echo "Error details: $NODES_OUTPUT"
        echo "Continuing with script execution..."
    else
        echo "$NODES_OUTPUT"
        echo "Nodes verified successfully."
    fi
fi

# Step 8: View resources
echo "Step 8: Viewing cluster resources..."
echo "Cluster information:"
CLUSTER_INFO=$(aws eks describe-cluster --name "$CLUSTER_NAME")
echo "$CLUSTER_INFO"

echo "Node group information:"
NODEGROUP_INFO=$(aws eks describe-nodegroup --cluster-name "$CLUSTER_NAME" --nodegroup-name "$NODEGROUP_NAME")
echo "$NODEGROUP_INFO"

echo "Kubernetes resources:"
if ! check_kubectl; then
    echo "Cannot list Kubernetes resources without kubectl. Skipping this step."
    echo "You can manually list resources after installing kubectl with: kubectl get all --all-namespaces"
else
    KUBE_RESOURCES=$(kubectl get all --all-namespaces 2>&1)
    if [ $? -ne 0 ]; then
        echo "Warning: Unable to get Kubernetes resources. This might be due to permissions."
        echo "Error details: $KUBE_RESOURCES"
        echo "Continuing with script execution..."
    else
        echo "$KUBE_RESOURCES"
    fi
fi

# Display summary of created resources
echo ""
echo "==========================================="
echo "RESOURCES CREATED"
echo "==========================================="
for resource in "${CREATED_RESOURCES[@]}"; do
    echo "- $resource"
done
echo "==========================================="

# Prompt for cleanup
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Do you want to clean up all created resources? (y/n): "
read -r CLEANUP_CHOICE

if [[ "${CLEANUP_CHOICE,,}" == "y" ]]; then
    cleanup_resources
else
    echo "Resources will not be cleaned up. You can manually clean them up later."
    echo "To clean up resources, run the following commands:"
    echo "1. Delete node group: aws eks delete-nodegroup --cluster-name $CLUSTER_NAME --nodegroup-name $NODEGROUP_NAME"
    echo "2. Wait for node group deletion: aws eks wait nodegroup-deleted --cluster-name $CLUSTER_NAME --nodegroup-name $NODEGROUP_NAME"
    echo "3. Delete cluster: aws eks delete-cluster --name $CLUSTER_NAME"
    echo "4. Wait for cluster deletion: aws eks wait cluster-deleted --name $CLUSTER_NAME"
    echo "5. Delete CloudFormation stack: aws cloudformation delete-stack --stack-name $STACK_NAME"
    echo "6. Detach and delete IAM roles for the node group and cluster"
fi

echo "Script completed at $(date)"
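
The manual cleanup steps printed at the end of the script have to run in dependency order: node group, cluster, CloudFormation stack, then the IAM roles. The following is a minimal sketch of that order, not part of the original script. It only prints the commands so that they can be reviewed before anything is deleted. The function name `print_eks_cleanup_commands` and its argument order are illustrative assumptions; the managed policy names are the ones the script attaches, and the extra `stack-delete-complete` wait is an addition to the printed steps.

```bash
#!/bin/bash

# Hypothetical helper (not in the original script): print, without running,
# the cleanup commands in the order the resources depend on each other.
# Arguments: cluster name, node group name, stack name, cluster role, node role.
print_eks_cleanup_commands() {
    local cluster_name="$1"
    local nodegroup_name="$2"
    local stack_name="$3"
    local cluster_role="$4"
    local node_role="$5"
    local policy

    # 1-2: the node group must be gone before the cluster can be deleted
    printf '%s\n' "aws eks delete-nodegroup --cluster-name $cluster_name --nodegroup-name $nodegroup_name"
    printf '%s\n' "aws eks wait nodegroup-deleted --cluster-name $cluster_name --nodegroup-name $nodegroup_name"

    # 3-4: delete the cluster and wait for the deletion to finish
    printf '%s\n' "aws eks delete-cluster --name $cluster_name"
    printf '%s\n' "aws eks wait cluster-deleted --name $cluster_name"

    # 5: the VPC stack can go once nothing is running inside it
    printf '%s\n' "aws cloudformation delete-stack --stack-name $stack_name"
    printf '%s\n' "aws cloudformation wait stack-delete-complete --stack-name $stack_name"

    # 6: a role can only be deleted after its managed policies are detached
    for policy in AmazonEKSWorkerNodePolicy AmazonEC2ContainerRegistryReadOnly AmazonEKS_CNI_Policy; do
        printf '%s\n' "aws iam detach-role-policy --role-name $node_role --policy-arn arn:aws:iam::aws:policy/$policy"
    done
    printf '%s\n' "aws iam delete-role --role-name $node_role"
    printf '%s\n' "aws iam detach-role-policy --role-name $cluster_role --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
    printf '%s\n' "aws iam delete-role --role-name $cluster_role"
}
```

Called as `print_eks_cleanup_commands "$CLUSTER_NAME" "$NODEGROUP_NAME" "$STACK_NAME" "$CLUSTER_ROLE_NAME" "$NODE_ROLE_NAME"`, it lists twelve commands that can be pasted into a shell one at a time.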

The following code example shows how to:

  • Create an MSK cluster

  • Create IAM permissions for MSK access

  • Create a client machine

  • Get bootstrap brokers

  • Set up the client machine

  • Create a topic and produce/consume data

  • Clean up resources
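
Several of these steps, getting the bootstrap brokers in particular, depend on values that only become available some time after a resource is created, so the script polls for them. The following is a minimal sketch of such a polling loop, assuming the value arrives on standard output and that the AWS CLI prints `None` for a missing field in `--output text` mode. The helper name `poll_for_value` and its parameters are illustrative and do not appear in the script.

```bash
#!/bin/bash

# Hypothetical helper (not in the original script): run a command repeatedly
# until it prints a non-empty value other than "None".
# Arguments: max attempts, delay in seconds, then the command and its arguments.
poll_for_value() {
    local max_attempts="$1"
    local delay_seconds="$2"
    shift 2

    local attempt=1
    local value=""

    while [ "$attempt" -le "$max_attempts" ]; do
        # Capture the command output; errors are treated as "not ready yet"
        value=$("$@" 2>/dev/null)
        if [ -n "$value" ] && [ "$value" != "None" ]; then
            printf '%s\n' "$value"
            return 0
        fi

        attempt=$((attempt + 1))
        # Sleep only if another attempt will follow
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay_seconds"
        fi
    done

    # Every attempt returned an empty value or "None"
    return 1
}
```

A call such as `poll_for_value 10 30 aws kafka get-bootstrap-brokers --cluster-arn "$CLUSTER_ARN" --query 'BootstrapBrokerStringSaslIam' --output text` would mirror the ten attempts and 30-second delay used for the bootstrap brokers.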

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.
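
The script that follows pulls identifiers such as the cluster ARN, policy ARN, and security group ID out of JSON responses with a `grep -o` and `cut` pipeline. The sketch below isolates that pattern so it can be tried on its own. The function name `extract_json_string` and the sample document are illustrative assumptions; the pattern only matches keys written as `"Key": "value"` with a single space after the colon, which is how the script expects the AWS CLI to format its JSON output.

```bash
#!/bin/bash

# Hypothetical helper (not in the original script): print the first string
# value stored under a key in JSON text, using the same grep/cut pattern as
# the tutorial script.
# Arguments: key name, JSON text.
extract_json_string() {
    local key="$1"
    local json="$2"

    # grep -o keeps only the matched text: "Key": "value (without the closing
    # quote); splitting on double quotes then leaves the value in field 4.
    printf '%s\n' "$json" | grep -o "\"$key\": \"[^\"]*" | head -n 1 | cut -d'"' -f4
}

# Sample input for illustration only; the ARN is made up.
sample_response='{
    "ClusterArn": "arn:aws:kafka:us-east-1:111122223333:cluster/demo/abc",
    "State": "CREATING"
}'

extract_json_string "ClusterArn" "$sample_response"
```

Where the response is not needed for error reporting, passing `--query` with `--output text` to the AWS CLI, as the script already does for `describe-cluster`, avoids text matching altogether.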

#!/bin/bash

# Amazon MSK Getting Started Tutorial Script - Version 8
# This script automates the steps in the Amazon MSK Getting Started tutorial
# It creates an MSK cluster, sets up IAM permissions, creates a client machine,
# and configures the client to interact with the cluster

# Set up logging
LOG_FILE="msk_tutorial_$(date +%Y%m%d_%H%M%S).log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting Amazon MSK Getting Started Tutorial Script - Version 8"
echo "Logging to $LOG_FILE"
echo "=============================================="

# Function to handle errors
handle_error() {
    echo "ERROR: $1"
    echo "Resources created so far:"
    if [ -n "$CLUSTER_ARN" ]; then echo "- MSK Cluster: $CLUSTER_ARN"; fi
    if [ -n "$POLICY_ARN" ]; then echo "- IAM Policy: $POLICY_ARN"; fi
    if [ -n "$ROLE_NAME" ]; then echo "- IAM Role: $ROLE_NAME"; fi
    if [ -n "$INSTANCE_PROFILE_NAME" ]; then echo "- IAM Instance Profile: $INSTANCE_PROFILE_NAME"; fi
    if [ -n "$CLIENT_SG_ID" ]; then echo "- Client Security Group: $CLIENT_SG_ID"; fi
    if [ -n "$INSTANCE_ID" ]; then echo "- EC2 Instance: $INSTANCE_ID"; fi
    if [ -n "$KEY_NAME" ]; then echo "- Key Pair: $KEY_NAME"; fi

    echo "Attempting to clean up resources..."
    cleanup_resources
    exit 1
}

# Function to check if a resource exists
resource_exists() {
    local resource_type="$1"
    local resource_id="$2"

    case "$resource_type" in
        "cluster")
            aws kafka describe-cluster --cluster-arn "$resource_id" &>/dev/null
            ;;
        "policy")
            aws iam get-policy --policy-arn "$resource_id" &>/dev/null
            ;;
        "role")
            aws iam get-role --role-name "$resource_id" &>/dev/null
            ;;
        "instance-profile")
            aws iam get-instance-profile --instance-profile-name "$resource_id" &>/dev/null
            ;;
        "security-group")
            aws ec2 describe-security-groups --group-ids "$resource_id" &>/dev/null
            ;;
        "instance")
            aws ec2 describe-instances --instance-ids "$resource_id" --query 'Reservations[0].Instances[0].State.Name' --output text | grep -v "terminated" &>/dev/null
            ;;
        "key-pair")
            aws ec2 describe-key-pairs --key-names "$resource_id" &>/dev/null
            ;;
    esac
}

# Function to remove security group references
remove_security_group_references() {
    local sg_id="$1"

    if [ -z "$sg_id" ]; then
        echo "No security group ID provided for reference removal"
        return
    fi

    echo "Removing security group references for $sg_id"

    # Get all security groups in the VPC that might reference our client security group
    local vpc_security_groups=$(aws ec2 describe-security-groups \
        --filters "Name=vpc-id,Values=$DEFAULT_VPC_ID" \
        --query 'SecurityGroups[].GroupId' \
        --output text 2>/dev/null)

    if [ -n "$vpc_security_groups" ]; then
        for other_sg in $vpc_security_groups; do
            if [ "$other_sg" != "$sg_id" ]; then
                echo "Checking security group $other_sg for references to $sg_id"

                # Get the security group details in JSON format
                local sg_details=$(aws ec2 describe-security-groups \
                    --group-ids "$other_sg" \
                    --output json 2>/dev/null)

                if [ -n "$sg_details" ]; then
                    # Check if our security group is referenced in inbound rules
                    local has_inbound_ref=$(echo "$sg_details" | grep -o "\"GroupId\": \"$sg_id\"" | head -1)

                    if [ -n "$has_inbound_ref" ]; then
                        echo "Found inbound rules in $other_sg referencing $sg_id, removing them..."
                        # Try to remove common rule types
                        echo "Attempting to remove all-traffic rule"
                        aws ec2 revoke-security-group-ingress \
                            --group-id "$other_sg" \
                            --protocol all \
                            --source-group "$sg_id" 2>/dev/null || echo "No all-traffic rule to remove"

                        # Try to remove TCP rules on common ports
                        for port in 22 80 443 9092 9094 9096; do
                            aws ec2 revoke-security-group-ingress \
                                --group-id "$other_sg" \
                                --protocol tcp \
                                --port "$port" \
                                --source-group "$sg_id" 2>/dev/null || true
                        done

                        # Try to remove UDP rules
                        aws ec2 revoke-security-group-ingress \
                            --group-id "$other_sg" \
                            --protocol udp \
                            --source-group "$sg_id" 2>/dev/null || true
                    fi

                    # Check for outbound rules (less common but possible)
                    local has_outbound_ref=$(echo "$sg_details" | grep -A 20 "IpPermissionsEgress" | grep -o "\"GroupId\": \"$sg_id\"" | head -1)

                    if [ -n "$has_outbound_ref" ]; then
                        echo "Found outbound rules in $other_sg referencing $sg_id, removing them..."
                        aws ec2 revoke-security-group-egress \
                            --group-id "$other_sg" \
                            --protocol all \
                            --source-group "$sg_id" 2>/dev/null || echo "No outbound all-traffic rule to remove"
                    fi
                fi
            fi
        done
    fi

    echo "Completed security group reference removal for $sg_id"
}

# Function to clean up resources
cleanup_resources() {
    echo "Cleaning up resources..."

    # Delete EC2 instance if it exists
    if [ -n "$INSTANCE_ID" ] && resource_exists "instance" "$INSTANCE_ID"; then
        echo "Terminating EC2 instance: $INSTANCE_ID"
        aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" || echo "Failed to terminate instance"
        echo "Waiting for instance to terminate..."
        aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID" || echo "Failed to wait for instance termination"
    fi

    # Delete MSK cluster first (to remove dependencies on security group)
    if [ -n "$CLUSTER_ARN" ] && resource_exists "cluster" "$CLUSTER_ARN"; then
        echo "Deleting MSK cluster: $CLUSTER_ARN"
        aws kafka delete-cluster --cluster-arn "$CLUSTER_ARN" || echo "Failed to delete cluster"

        # Wait a bit for the cluster deletion to start
        echo "Waiting 30 seconds for cluster deletion to begin..."
        sleep 30
    fi

    # Remove security group references before attempting deletion
    if [ -n "$CLIENT_SG_ID" ] && resource_exists "security-group" "$CLIENT_SG_ID"; then
        remove_security_group_references "$CLIENT_SG_ID"

        echo "Deleting security group: $CLIENT_SG_ID"
        # Try multiple times with longer delays to ensure dependencies are removed
        for i in {1..10}; do
            if aws ec2 delete-security-group --group-id "$CLIENT_SG_ID"; then
                echo "Security group deleted successfully"
                break
            fi
            echo "Failed to delete security group (attempt $i/10), retrying in 30 seconds..."
            sleep 30
        done
    fi

    # Delete key pair if it exists
    if [ -n "$KEY_NAME" ] && resource_exists "key-pair" "$KEY_NAME"; then
        echo "Deleting key pair: $KEY_NAME"
        aws ec2 delete-key-pair --key-name "$KEY_NAME" || echo "Failed to delete key pair"
    fi

    # Remove role from instance profile
    if [ -n "$ROLE_NAME" ] && [ -n "$INSTANCE_PROFILE_NAME" ] && resource_exists "instance-profile" "$INSTANCE_PROFILE_NAME"; then
        echo "Removing role from instance profile"
        aws iam remove-role-from-instance-profile \
            --instance-profile-name "$INSTANCE_PROFILE_NAME" \
            --role-name "$ROLE_NAME" || echo "Failed to remove role from instance profile"
    fi

    # Delete instance profile
    if [ -n "$INSTANCE_PROFILE_NAME" ] && resource_exists "instance-profile" "$INSTANCE_PROFILE_NAME"; then
        echo "Deleting instance profile: $INSTANCE_PROFILE_NAME"
        aws iam delete-instance-profile \
            --instance-profile-name "$INSTANCE_PROFILE_NAME" || echo "Failed to delete instance profile"
    fi

    # Detach policy from role
    if [ -n "$ROLE_NAME" ] && [ -n "$POLICY_ARN" ] && resource_exists "role" "$ROLE_NAME"; then
        echo "Detaching policy from role"
        aws iam detach-role-policy \
            --role-name "$ROLE_NAME" \
            --policy-arn "$POLICY_ARN" || echo "Failed to detach policy"
    fi

    # Delete role
    if [ -n "$ROLE_NAME" ] && resource_exists "role" "$ROLE_NAME"; then
        echo "Deleting role: $ROLE_NAME"
        aws iam delete-role --role-name "$ROLE_NAME" || echo "Failed to delete role"
    fi

    # Delete policy
    if [ -n "$POLICY_ARN" ] && resource_exists "policy" "$POLICY_ARN"; then
        echo "Deleting policy: $POLICY_ARN"
        aws iam delete-policy --policy-arn "$POLICY_ARN" || echo "Failed to delete policy"
    fi

    echo "Cleanup completed"
}

# Function to find a suitable subnet and instance type combination
find_suitable_subnet_and_instance_type() {
    local vpc_id="$1"
    local -a subnet_array=("${!2}")

    # List of instance types to try, in order of preference
    local instance_types=("t3.micro" "t2.micro" "t3.small" "t2.small")

    echo "Finding suitable subnet and instance type combination..."
    for instance_type in "${instance_types[@]}"; do
        echo "Trying instance type: $instance_type"

        for subnet_id in "${subnet_array[@]}"; do
            # Get the availability zone for this subnet
            local az=$(aws ec2 describe-subnets \
                --subnet-ids "$subnet_id" \
                --query 'Subnets[0].AvailabilityZone' \
                --output text)

            echo " Checking subnet $subnet_id in AZ $az"

            # Check if this instance type is available in this AZ
            local available=$(aws ec2 describe-instance-type-offerings \
                --location-type availability-zone \
                --filters "Name=location,Values=$az" "Name=instance-type,Values=$instance_type" \
                --query 'InstanceTypeOfferings[0].InstanceType' \
                --output text 2>/dev/null)

            if [ "$available" = "$instance_type" ]; then
                echo " ✓ Found suitable combination: $instance_type in $az (subnet: $subnet_id)"
                SELECTED_SUBNET_ID="$subnet_id"
                SELECTED_INSTANCE_TYPE="$instance_type"
                return 0
            else
                echo " ✗ $instance_type not available in $az"
            fi
        done
    done

    echo "ERROR: Could not find any suitable subnet and instance type combination"
    return 1
}

# Generate unique identifiers
RANDOM_SUFFIX=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | fold -w 8 | head -n 1)
CLUSTER_NAME="MSKTutorialCluster-${RANDOM_SUFFIX}"
POLICY_NAME="msk-tutorial-policy-${RANDOM_SUFFIX}"
ROLE_NAME="msk-tutorial-role-${RANDOM_SUFFIX}"
INSTANCE_PROFILE_NAME="msk-tutorial-profile-${RANDOM_SUFFIX}"
SG_NAME="MSKClientSecurityGroup-${RANDOM_SUFFIX}"

echo "Using the following resource names:"
echo "- Cluster Name: $CLUSTER_NAME"
echo "- Policy Name: $POLICY_NAME"
echo "- Role Name: $ROLE_NAME"
echo "- Instance Profile Name: $INSTANCE_PROFILE_NAME"
echo "- Security Group Name: $SG_NAME"
echo "=============================================="

# Step 1: Create an MSK Provisioned cluster
echo "Step 1: Creating MSK Provisioned cluster"

# Get the default VPC ID first
echo "Getting default VPC..."
DEFAULT_VPC_ID=$(aws ec2 describe-vpcs \
    --filters "Name=is-default,Values=true" \
    --query "Vpcs[0].VpcId" \
    --output text)

if [ -z "$DEFAULT_VPC_ID" ] || [ "$DEFAULT_VPC_ID" = "None" ]; then
    handle_error "Could not find default VPC. Please ensure you have a default VPC in your region."
fi

echo "Default VPC ID: $DEFAULT_VPC_ID"

# Get available subnets in the default VPC
echo "Getting available subnets in the default VPC..."
SUBNETS=$(aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$DEFAULT_VPC_ID" "Name=default-for-az,Values=true" \
    --query "Subnets[0:3].SubnetId" \
    --output text)

# Convert space-separated subnet IDs to an array
read -r -a SUBNET_ARRAY <<< "$SUBNETS"

if [ ${#SUBNET_ARRAY[@]} -lt 3 ]; then
    handle_error "Not enough subnets available in the default VPC. Need at least 3 subnets, found ${#SUBNET_ARRAY[@]}."
fi

# Get default security group for the default VPC
echo "Getting default security group for the default VPC..."
DEFAULT_SG=$(aws ec2 describe-security-groups \
    --filters "Name=group-name,Values=default" "Name=vpc-id,Values=$DEFAULT_VPC_ID" \
    --query "SecurityGroups[0].GroupId" \
    --output text)

if [ -z "$DEFAULT_SG" ] || [ "$DEFAULT_SG" = "None" ]; then
    handle_error "Could not find default security group for VPC $DEFAULT_VPC_ID"
fi

echo "Creating MSK cluster: $CLUSTER_NAME"
echo "Using VPC: $DEFAULT_VPC_ID"
echo "Using subnets: ${SUBNET_ARRAY[0]} ${SUBNET_ARRAY[1]} ${SUBNET_ARRAY[2]}"
echo "Using security group: $DEFAULT_SG"

# Create the MSK cluster with proper error handling
CLUSTER_RESPONSE=$(aws kafka create-cluster \
    --cluster-name "$CLUSTER_NAME" \
    --broker-node-group-info "{\"InstanceType\": \"kafka.t3.small\", \"ClientSubnets\": [\"${SUBNET_ARRAY[0]}\", \"${SUBNET_ARRAY[1]}\", \"${SUBNET_ARRAY[2]}\"], \"SecurityGroups\": [\"$DEFAULT_SG\"]}" \
    --kafka-version "3.6.0" \
    --number-of-broker-nodes 3 \
    --encryption-info "{\"EncryptionInTransit\": {\"InCluster\": true, \"ClientBroker\": \"TLS\"}}" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    handle_error "Failed to create MSK cluster: $CLUSTER_RESPONSE"
fi

# Extract the cluster ARN using grep
CLUSTER_ARN=$(echo "$CLUSTER_RESPONSE" | grep -o '"ClusterArn": "[^"]*' | cut -d'"' -f4)

if [ -z "$CLUSTER_ARN" ]; then
    handle_error "Failed to extract cluster ARN from response: $CLUSTER_RESPONSE"
fi

echo "MSK cluster creation initiated. ARN: $CLUSTER_ARN"
echo "Waiting for cluster to become active (this may take 15-20 minutes)..."

# Wait for the cluster to become active
while true; do
    CLUSTER_STATUS=$(aws kafka describe-cluster --cluster-arn "$CLUSTER_ARN" --query "ClusterInfo.State" --output text 2>/dev/null)

    if [ $? -ne 0 ]; then
        echo "Failed to get cluster status. Retrying in 30 seconds..."
        sleep 30
        continue
    fi

    echo "Current cluster status: $CLUSTER_STATUS"

    if [ "$CLUSTER_STATUS" = "ACTIVE" ]; then
        echo "Cluster is now active!"
        break
    elif [ "$CLUSTER_STATUS" = "FAILED" ]; then
        handle_error "Cluster creation failed"
    fi

    echo "Still waiting for cluster to become active... (checking again in 60 seconds)"
    sleep 60
done

echo "=============================================="

# Step 2: Create an IAM role granting access to create topics on the Amazon MSK cluster
echo "Step 2: Creating IAM policy and role"

# Get account ID and region
ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
REGION=$(aws configure get region)

if [ -z "$REGION" ]; then
    REGION=$(aws ec2 describe-availability-zones --query 'AvailabilityZones[0].RegionName' --output text)
fi

if [ -z "$ACCOUNT_ID" ] || [ -z "$REGION" ]; then
    handle_error "Could not determine AWS account ID or region"
fi

echo "Account ID: $ACCOUNT_ID"
echo "Region: $REGION"

# Create IAM policy
echo "Creating IAM policy: $POLICY_NAME"
POLICY_DOCUMENT="{
    \"Version\": \"2012-10-17\",
    \"Statement\": [
        {
            \"Effect\": \"Allow\",
            \"Action\": [
                \"kafka-cluster:Connect\",
                \"kafka-cluster:AlterCluster\",
                \"kafka-cluster:DescribeCluster\"
            ],
            \"Resource\": [
                \"$CLUSTER_ARN\"
            ]
        },
        {
            \"Effect\": \"Allow\",
            \"Action\": [
                \"kafka-cluster:*Topic*\",
                \"kafka-cluster:WriteData\",
                \"kafka-cluster:ReadData\"
            ],
            \"Resource\": [
                \"arn:aws:kafka:$REGION:$ACCOUNT_ID:topic/$CLUSTER_NAME/*\"
            ]
        },
        {
            \"Effect\": \"Allow\",
            \"Action\": [
                \"kafka-cluster:AlterGroup\",
                \"kafka-cluster:DescribeGroup\"
            ],
            \"Resource\": [
                \"arn:aws:kafka:$REGION:$ACCOUNT_ID:group/$CLUSTER_NAME/*\"
            ]
        }
    ]
}"

POLICY_RESPONSE=$(aws iam create-policy \
    --policy-name "$POLICY_NAME" \
    --policy-document "$POLICY_DOCUMENT" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    handle_error "Failed to create IAM policy: $POLICY_RESPONSE"
fi

# Extract the policy ARN using grep
POLICY_ARN=$(echo "$POLICY_RESPONSE" | grep -o '"Arn": "[^"]*' | cut -d'"' -f4)

if [ -z "$POLICY_ARN" ]; then
    handle_error "Failed to extract policy ARN from response: $POLICY_RESPONSE"
fi

echo "IAM policy created. ARN: $POLICY_ARN"

# Create IAM role for EC2
echo "Creating IAM role: $ROLE_NAME"
TRUST_POLICY="{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"ec2.amazonaws.com\"},\"Action\":\"sts:AssumeRole\"}]}"

ROLE_RESPONSE=$(aws iam create-role \
    --role-name "$ROLE_NAME" \
    --assume-role-policy-document "$TRUST_POLICY" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    handle_error "Failed to create IAM role: $ROLE_RESPONSE"
fi

echo "IAM role created: $ROLE_NAME"

# Attach policy to role
echo "Attaching policy to role"
ATTACH_RESPONSE=$(aws iam attach-role-policy \
    --role-name "$ROLE_NAME" \
    --policy-arn "$POLICY_ARN" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    handle_error "Failed to attach policy to role: $ATTACH_RESPONSE"
fi

echo "Policy attached to role"

# Create instance profile and add role to it
echo "Creating instance profile: $INSTANCE_PROFILE_NAME"
PROFILE_RESPONSE=$(aws iam create-instance-profile \
    --instance-profile-name "$INSTANCE_PROFILE_NAME" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    handle_error "Failed to create instance profile: $PROFILE_RESPONSE"
fi

echo "Instance profile created"

echo "Adding role to instance profile"
ADD_ROLE_RESPONSE=$(aws iam add-role-to-instance-profile \
    --instance-profile-name "$INSTANCE_PROFILE_NAME" \
    --role-name "$ROLE_NAME" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    handle_error "Failed to add role to instance profile: $ADD_ROLE_RESPONSE"
fi

echo "Role added to instance profile"

# Wait a moment for IAM propagation
echo "Waiting 10 seconds for IAM propagation..."
sleep 10

echo "=============================================="

# Step 3: Create a client machine
echo "Step 3: Creating client machine"

# Find a suitable subnet and instance type combination
if ! find_suitable_subnet_and_instance_type "$DEFAULT_VPC_ID" SUBNET_ARRAY[@]; then
    handle_error "Could not find a suitable subnet and instance type combination"
fi

echo "Selected subnet: $SELECTED_SUBNET_ID"
echo "Selected instance type: $SELECTED_INSTANCE_TYPE"

# Verify the subnet is in the same VPC we're using
SUBNET_VPC_ID=$(aws ec2 describe-subnets \
    --subnet-ids "$SELECTED_SUBNET_ID" \
    --query 'Subnets[0].VpcId' \
    --output text)

if [ "$SUBNET_VPC_ID" != "$DEFAULT_VPC_ID" ]; then
    handle_error "Subnet VPC ($SUBNET_VPC_ID) does not match default VPC ($DEFAULT_VPC_ID)"
fi

echo "VPC ID: $SUBNET_VPC_ID"

# Get security group ID from the MSK cluster
echo "Getting security group ID from the MSK cluster"
MSK_SG_ID=$(aws kafka describe-cluster \
    --cluster-arn "$CLUSTER_ARN" \
    --query 'ClusterInfo.BrokerNodeGroupInfo.SecurityGroups[0]' \
    --output text)

if [ -z "$MSK_SG_ID" ] || [ "$MSK_SG_ID" = "None" ]; then
    handle_error "Failed to get security group ID from cluster"
fi

echo "MSK security group ID: $MSK_SG_ID"

# Create security group for client
echo "Creating security group for client: $SG_NAME"
CLIENT_SG_RESPONSE=$(aws ec2 create-security-group \
    --group-name "$SG_NAME" \
    --description "Security group for MSK client" \
    --vpc-id "$DEFAULT_VPC_ID" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    handle_error "Failed to create security group: $CLIENT_SG_RESPONSE"
fi

# Extract the security group ID using grep
CLIENT_SG_ID=$(echo "$CLIENT_SG_RESPONSE" | grep -o '"GroupId": "[^"]*' | cut -d'"' -f4)

if [ -z "$CLIENT_SG_ID" ]; then
    handle_error "Failed to extract security group ID from response: $CLIENT_SG_RESPONSE"
fi

echo "Client security group created. ID: $CLIENT_SG_ID"

# Allow SSH access to client from your IP only
echo "Getting your public IP address"
MY_IP=$(curl -s https://checkip.amazonaws.com 2>/dev/null)

if [ -z "$MY_IP" ]; then
    echo "Warning: Could not determine your IP address. Using 0.0.0.0/0 (not recommended for production)"
    MY_IP="0.0.0.0/0"
else
    MY_IP="$MY_IP/32"
    echo "Your IP address: $MY_IP"
fi

echo "Adding SSH ingress rule to client security group"
SSH_RULE_RESPONSE=$(aws ec2 authorize-security-group-ingress \
    --group-id "$CLIENT_SG_ID" \
    --protocol tcp \
    --port 22 \
    --cidr "$MY_IP" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    echo "Warning: Failed to add SSH ingress rule: $SSH_RULE_RESPONSE"
    echo "You may need to manually add SSH access to security group $CLIENT_SG_ID"
fi

echo "SSH ingress rule added"

# Update MSK security group to allow traffic from client security group
echo "Adding ingress rule to MSK security group to allow traffic from client"
MSK_RULE_RESPONSE=$(aws ec2 authorize-security-group-ingress \
    --group-id "$MSK_SG_ID" \
    --protocol all \
    --source-group "$CLIENT_SG_ID" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    echo "Warning: Failed to add ingress rule to MSK security group: $MSK_RULE_RESPONSE"
    echo "You may need to manually add ingress rule to security group $MSK_SG_ID"
fi

echo "Ingress rule added to MSK security group"

# Create key pair
KEY_NAME="MSKKeyPair-${RANDOM_SUFFIX}"
echo "Creating key pair: $KEY_NAME"
KEY_RESPONSE=$(aws ec2 create-key-pair --key-name "$KEY_NAME" --query 'KeyMaterial' --output text 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    handle_error "Failed to create key pair: $KEY_RESPONSE"
fi

# Save the private key to a file
KEY_FILE="${KEY_NAME}.pem"
echo "$KEY_RESPONSE" > "$KEY_FILE"
chmod 400 "$KEY_FILE"

echo "Key pair created and saved to $KEY_FILE"

# Get the latest Amazon Linux 2 AMI
echo "Getting latest Amazon Linux 2 AMI ID"
AMI_ID=$(aws ec2 describe-images \
    --owners amazon \
    --filters "Name=name,Values=amzn2-ami-hvm-*-x86_64-gp2" "Name=state,Values=available" \
    --query "sort_by(Images, &CreationDate)[-1].ImageId" \
    --output text 2>/dev/null)

if [ -z "$AMI_ID" ] || [ "$AMI_ID" = "None" ]; then
    handle_error "Failed to get Amazon Linux 2 AMI ID"
fi

echo "Using AMI ID: $AMI_ID"

# Launch EC2 instance with the selected subnet and instance type
echo "Launching EC2 instance"
echo "Instance type: $SELECTED_INSTANCE_TYPE"
echo "Subnet: $SELECTED_SUBNET_ID"

INSTANCE_RESPONSE=$(aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type "$SELECTED_INSTANCE_TYPE" \
    --key-name "$KEY_NAME" \
    --security-group-ids "$CLIENT_SG_ID" \
    --subnet-id "$SELECTED_SUBNET_ID" \
    --iam-instance-profile "Name=$INSTANCE_PROFILE_NAME" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=MSKTutorialClient-${RANDOM_SUFFIX}}]" 2>&1)

# Check if the command was successful
if [ $? -ne 0 ]; then
    handle_error "Failed to launch EC2 instance: $INSTANCE_RESPONSE"
fi

# Extract the instance ID using grep
INSTANCE_ID=$(echo "$INSTANCE_RESPONSE" | grep -o '"InstanceId": "[^"]*' | head -1 | cut -d'"' -f4)

if [ -z "$INSTANCE_ID" ]; then
    handle_error "Failed to extract instance ID from response: $INSTANCE_RESPONSE"
fi

echo "EC2 instance launched successfully. ID: $INSTANCE_ID"
echo "Waiting for instance to be running..."

# Wait for the instance to be running
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"

if [ $? -ne 0 ]; then
    handle_error "Instance failed to reach running state"
fi

# Wait a bit more for the instance to initialize
echo "Instance is running. Waiting 30 seconds for initialization..."
sleep 30

# Get public DNS name of instance
CLIENT_DNS=$(aws ec2 describe-instances \
    --instance-ids "$INSTANCE_ID" \
    --query 'Reservations[0].Instances[0].PublicDnsName' \
    --output text)

if [ -z "$CLIENT_DNS" ] || [ "$CLIENT_DNS" = "None" ]; then
    echo "Warning: Could not get public DNS name for instance. Trying public IP..."
    CLIENT_DNS=$(aws ec2 describe-instances \
        --instance-ids "$INSTANCE_ID" \
        --query 'Reservations[0].Instances[0].PublicIpAddress' \
        --output text)

    if [ -z "$CLIENT_DNS" ] || [ "$CLIENT_DNS" = "None" ]; then
        handle_error "Failed to get public DNS name or IP address for instance"
    fi
fi

echo "Client instance DNS/IP: $CLIENT_DNS"
echo "=============================================="

# Get bootstrap brokers with improved logic
echo "Getting bootstrap brokers"
MAX_RETRIES=10
RETRY_COUNT=0
BOOTSTRAP_BROKERS=""
AUTH_METHOD=""

while [ -z "$BOOTSTRAP_BROKERS" ] || [ "$BOOTSTRAP_BROKERS" = "None" ]; do
    # Get the full bootstrap brokers response
    BOOTSTRAP_RESPONSE=$(aws kafka get-bootstrap-brokers \
        --cluster-arn "$CLUSTER_ARN" 2>/dev/null)

    if [ $? -eq 0 ] && [ -n "$BOOTSTRAP_RESPONSE" ]; then
        # Try to get IAM authentication brokers first using grep
        BOOTSTRAP_BROKERS=$(echo "$BOOTSTRAP_RESPONSE" | grep -o '"BootstrapBrokerStringSaslIam": "[^"]*' | cut -d'"' -f4)
        if [ -n "$BOOTSTRAP_BROKERS" ]; then
            AUTH_METHOD="IAM"
        else
            # Fall back to TLS authentication
            BOOTSTRAP_BROKERS=$(echo "$BOOTSTRAP_RESPONSE" | grep -o '"BootstrapBrokerStringTls": "[^"]*' | cut -d'"' -f4)
            if [ -n "$BOOTSTRAP_BROKERS" ]; then
                AUTH_METHOD="TLS"
            fi
        fi
    fi

    RETRY_COUNT=$((RETRY_COUNT + 1))

    if [ "$RETRY_COUNT" -ge "$MAX_RETRIES" ]; then
        echo "Warning: Could not get bootstrap brokers after $MAX_RETRIES attempts."
        echo "You may need to manually retrieve them later using:"
        echo "aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN"
        BOOTSTRAP_BROKERS="BOOTSTRAP_BROKERS_NOT_AVAILABLE"
        AUTH_METHOD="UNKNOWN"
        break
    fi

    if [ -z "$BOOTSTRAP_BROKERS" ] || [ "$BOOTSTRAP_BROKERS" = "None" ]; then
        echo "Bootstrap brokers not available yet. Retrying in 30 seconds... (Attempt $RETRY_COUNT/$MAX_RETRIES)"
        sleep 30
    fi
done

echo "Bootstrap brokers: $BOOTSTRAP_BROKERS"
echo "Authentication method: $AUTH_METHOD"
echo "=============================================="

# Create setup script for the client machine
echo "Creating setup script for the client machine"
cat > setup_client.sh << 'EOF'
#!/bin/bash

# Set up logging
LOG_FILE="client_setup_$(date +%Y%m%d_%H%M%S).log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting client setup"
echo "=============================================="

# Install Java
echo "Installing Java"
sudo yum -y install java-11

# Set environment variables
echo "Setting up environment variables"
export KAFKA_VERSION="3.6.0"
echo "KAFKA_VERSION=$KAFKA_VERSION"

# Download and extract Apache Kafka
echo "Downloading Apache Kafka"
wget https://archive.apache.org/dist/kafka/$KAFKA_VERSION/kafka_2.13-$KAFKA_VERSION.tgz
if [ $? -ne 0 ]; then
    echo "Failed to download Kafka. Trying alternative mirror..."
    wget https://www.apache.org/dyn/closer.cgi?path=/kafka/$KAFKA_VERSION/kafka_2.13-$KAFKA_VERSION.tgz
fi

echo "Extracting Kafka"
tar -xzf kafka_2.13-$KAFKA_VERSION.tgz
export KAFKA_ROOT=$(pwd)/kafka_2.13-$KAFKA_VERSION
echo "KAFKA_ROOT=$KAFKA_ROOT"

# Download the MSK IAM authentication package (needed for both IAM and TLS)
echo "Downloading MSK IAM authentication package"
cd $KAFKA_ROOT/libs
wget https://github.com/aws/aws-msk-iam-auth/releases/latest/download/aws-msk-iam-auth-1.1.6-all.jar
if [ $? -ne 0 ]; then
    echo "Failed to download specific version. Trying to get latest version..."
    LATEST_VERSION=$(curl -s https://api.github.com/repos/aws/aws-msk-iam-auth/releases/latest | grep -o '"tag_name": "[^"]*' | cut -d'"' -f4)
    # Release tags may carry a leading "v" that the JAR file name does not, so strip it if present
    JAR_VERSION="${LATEST_VERSION#v}"
    wget https://github.com/aws/aws-msk-iam-auth/releases/download/$LATEST_VERSION/aws-msk-iam-auth-$JAR_VERSION-all.jar
    if [ $? -ne 0 ]; then
        echo "Failed to download IAM auth package. Please check the URL and try again."
        exit 1
    fi
    export CLASSPATH=$KAFKA_ROOT/libs/aws-msk-iam-auth-$JAR_VERSION-all.jar
else
    export CLASSPATH=$KAFKA_ROOT/libs/aws-msk-iam-auth-1.1.6-all.jar
fi
echo "CLASSPATH=$CLASSPATH"

# Create client properties file based on authentication method
echo "Creating client properties file"
cd $KAFKA_ROOT/config

# The AUTH_METHOD_PLACEHOLDER will be replaced by the script
AUTH_METHOD="AUTH_METHOD_PLACEHOLDER"

if [ "$AUTH_METHOD" = "IAM" ]; then
    echo "Configuring for IAM authentication"
    cat > client.properties << 'EOT'
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
EOT
elif [ "$AUTH_METHOD" = "TLS" ]; then
    echo "Configuring for TLS authentication"
    cat > client.properties << 'EOT'
security.protocol=SSL
EOT
else
    echo "Unknown authentication method. Creating basic TLS configuration."
    cat > client.properties << 'EOT'
security.protocol=SSL
EOT
fi

echo "Client setup completed"
echo "=============================================="

# Create a script to set up environment variables
cat > ~/setup_env.sh << 'EOT'
#!/bin/bash
export KAFKA_VERSION="3.6.0"
export KAFKA_ROOT=~/kafka_2.13-$KAFKA_VERSION
export CLASSPATH=$KAFKA_ROOT/libs/aws-msk-iam-auth-1.1.6-all.jar
export BOOTSTRAP_SERVER="BOOTSTRAP_SERVER_PLACEHOLDER"
export AUTH_METHOD="AUTH_METHOD_PLACEHOLDER"

echo "Environment variables set:"
echo "KAFKA_VERSION=$KAFKA_VERSION"
echo "KAFKA_ROOT=$KAFKA_ROOT"
echo "CLASSPATH=$CLASSPATH"
echo "BOOTSTRAP_SERVER=$BOOTSTRAP_SERVER"
echo "AUTH_METHOD=$AUTH_METHOD"
EOT

chmod +x ~/setup_env.sh
echo "Created environment setup script: ~/setup_env.sh"
echo "Run 'source ~/setup_env.sh' to set up your environment"
EOF

# Replace placeholders in the setup script
if [ -n "$BOOTSTRAP_BROKERS" ] && [ "$BOOTSTRAP_BROKERS" != "None" ] && [ "$BOOTSTRAP_BROKERS" != "BOOTSTRAP_BROKERS_NOT_AVAILABLE" ]; then
    sed -i "s|BOOTSTRAP_SERVER_PLACEHOLDER|$BOOTSTRAP_BROKERS|g" setup_client.sh
else
    # If bootstrap brokers are not available, provide instructions to get them
    sed -i "s|BOOTSTRAP_SERVER_PLACEHOLDER|\$(aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN --query 'BootstrapBrokerStringTls' --output text)|g" setup_client.sh
fi

# Replace auth method placeholder
sed -i "s|AUTH_METHOD_PLACEHOLDER|$AUTH_METHOD|g" setup_client.sh

echo "Setup script created"
echo "=============================================="

# Display summary of created resources
echo ""
echo "=============================================="
echo "RESOURCE SUMMARY"
echo "=============================================="
echo "MSK Cluster ARN: $CLUSTER_ARN"
echo "MSK Cluster Name: $CLUSTER_NAME"
echo "Authentication Method: $AUTH_METHOD"
echo "IAM Policy ARN: $POLICY_ARN"
echo "IAM Role Name: $ROLE_NAME"
echo "IAM Instance Profile: $INSTANCE_PROFILE_NAME"
echo "Client Security Group: $CLIENT_SG_ID"
echo "EC2 Instance ID: $INSTANCE_ID"
echo "EC2 Instance Type: $SELECTED_INSTANCE_TYPE"
echo "EC2 Instance DNS: $CLIENT_DNS"
echo "Key Pair: $KEY_NAME (saved to $KEY_FILE)"
echo "Bootstrap Brokers: $BOOTSTRAP_BROKERS"
echo "=============================================="
echo ""

# Instructions for connecting to the instance and setting up the client
echo "NEXT STEPS:"
echo "1. Connect to your EC2 instance:"
echo " ssh -i $KEY_FILE ec2-user@$CLIENT_DNS"
echo ""
echo "2. Upload the setup script to your instance:"
echo " scp -i $KEY_FILE setup_client.sh ec2-user@$CLIENT_DNS:~/"
echo ""
echo "3. Run the setup script on your instance:"
echo " ssh -i $KEY_FILE ec2-user@$CLIENT_DNS 'chmod +x ~/setup_client.sh && ~/setup_client.sh'"
echo ""
echo "4. Source the environment setup script:"
echo " source ~/setup_env.sh"
echo ""

# Provide different instructions based on authentication method
if [ "$AUTH_METHOD" = "IAM" ]; then
    echo "5. Create a Kafka topic (using IAM authentication):"
    echo " \$KAFKA_ROOT/bin/kafka-topics.sh --create \\"
    echo " --bootstrap-server \$BOOTSTRAP_SERVER \\"
    echo " --command-config \$KAFKA_ROOT/config/client.properties \\"
    echo " --replication-factor 3 \\"
    echo " --partitions 1 \\"
    echo " --topic MSKTutorialTopic"
    echo ""
    echo "6. Start a producer:"
    echo " \$KAFKA_ROOT/bin/kafka-console-producer.sh \\"
    echo " --broker-list \$BOOTSTRAP_SERVER \\"
    echo " --producer.config \$KAFKA_ROOT/config/client.properties \\"
    echo " --topic MSKTutorialTopic"
    echo ""
    echo "7. Start a consumer:"
    echo " \$KAFKA_ROOT/bin/kafka-console-consumer.sh \\"
    echo " --bootstrap-server \$BOOTSTRAP_SERVER \\"
    echo " --consumer.config \$KAFKA_ROOT/config/client.properties \\"
    echo " --topic MSKTutorialTopic \\"
    echo " --from-beginning"
elif [ "$AUTH_METHOD" = "TLS" ]; then
    echo "5. Create a Kafka topic (using TLS authentication):"
    echo " \$KAFKA_ROOT/bin/kafka-topics.sh --create \\"
    echo " --bootstrap-server \$BOOTSTRAP_SERVER \\"
    echo " --command-config \$KAFKA_ROOT/config/client.properties \\"
    echo " --replication-factor 3 \\"
    echo " --partitions 1 \\"
    echo " --topic MSKTutorialTopic"
    echo ""
    echo "6. Start a producer:"
    echo " \$KAFKA_ROOT/bin/kafka-console-producer.sh \\"
    echo " --broker-list \$BOOTSTRAP_SERVER \\"
    echo " --producer.config \$KAFKA_ROOT/config/client.properties \\"
    echo " --topic MSKTutorialTopic"
    echo ""
    echo "7. Start a consumer:"
    echo " \$KAFKA_ROOT/bin/kafka-console-consumer.sh \\"
    echo " --bootstrap-server \$BOOTSTRAP_SERVER \\"
    echo " --consumer.config \$KAFKA_ROOT/config/client.properties \\"
    echo " --topic MSKTutorialTopic \\"
    echo " --from-beginning"
else
    echo "5. Manually retrieve bootstrap brokers and configure authentication:"
    echo " aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN"
fi

echo ""
echo "8. Verify the topic was created:"
echo " \$KAFKA_ROOT/bin/kafka-topics.sh --list \\"
echo " --bootstrap-server \$BOOTSTRAP_SERVER \\"
echo " --command-config \$KAFKA_ROOT/config/client.properties"
echo "=============================================="
echo ""

# Ask if user wants to clean up resources
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Do you want to clean up all created resources? (y/n): "
read -r CLEANUP_CHOICE

if [[ $CLEANUP_CHOICE =~ ^[Yy]$ ]]; then
    cleanup_resources
    echo "All resources have been cleaned up."
else
    echo "Resources will not be cleaned up. You can manually clean them up later."
    echo "To clean up resources, run this script again and choose 'y' when prompted."
fi

echo "Script completed successfully!"
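
The broker-selection step in the script above (prefer the IAM broker string, fall back to the TLS one) can be tried in isolation without calling AWS. The following is a minimal sketch, not part of the original example: the function name `pick_bootstrap_brokers` and the sample response are hypothetical, and, like the script, it assumes the pretty-printed JSON that the AWS CLI emits (a space after the colon) rather than using a real JSON parser.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): given the JSON text returned
# by `aws kafka get-bootstrap-brokers`, print "<METHOD> <BROKERS>", preferring
# the IAM broker string and falling back to the TLS one.
pick_bootstrap_brokers() {
    local response=$1
    local brokers

    # Same grep/cut extraction as the script uses for the IAM broker string
    brokers=$(printf '%s\n' "$response" |
        grep -o '"BootstrapBrokerStringSaslIam": "[^"]*' | cut -d'"' -f4)
    if [ -n "$brokers" ]; then
        printf 'IAM %s\n' "$brokers"
        return 0
    fi

    # Fall back to the TLS broker string
    brokers=$(printf '%s\n' "$response" |
        grep -o '"BootstrapBrokerStringTls": "[^"]*' | cut -d'"' -f4)
    if [ -n "$brokers" ]; then
        printf 'TLS %s\n' "$brokers"
        return 0
    fi

    # Neither broker string is present yet
    return 1
}

# Made-up sample response, for illustration only
sample_response='{
    "BootstrapBrokerStringTls": "b-1.example.internal:9094,b-2.example.internal:9094"
}'

pick_bootstrap_brokers "$sample_response"
```

Returning a non-zero status when neither string is present lets a caller keep polling in a retry loop, which is what the script does while the cluster is still starting.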

The following code example shows how to:

  • Create an OpenSearch Service domain

  • Upload data to your domain

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# Amazon OpenSearch Service Getting Started Script - Version 8 Fixed
# This script creates an OpenSearch domain, uploads data, searches documents, and cleans up resources
# Based on the tested and working 4-tutorial-final.md

# FIXES IN V8-FIXED:
# 1. Fixed syntax error with regex pattern matching
# 2. Fixed access policy to be more permissive and work with fine-grained access control
# 3. Added proper resource-based policy that allows both IAM and internal user database access
# 4. Improved authentication test with better error handling
# 5. Better debugging and troubleshooting information

set -e  # Exit on any error

# Set up logging
LOG_FILE="opensearch_tutorial_v8_fixed.log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting Amazon OpenSearch Service tutorial script v8-fixed at $(date)"
echo "All commands and outputs will be logged to $LOG_FILE"

# Track if domain was successfully created
DOMAIN_CREATED=false
DOMAIN_ACTIVE=false

# Error handling function
handle_error() {
    echo "ERROR: $1"
    echo "Attempting to clean up resources..."
    cleanup_resources
    exit 1
}

# Function to clean up resources
cleanup_resources() {
    echo "Cleaning up resources..."

    if [[ "$DOMAIN_CREATED" == "true" ]]; then
        echo "Checking if domain $DOMAIN_NAME exists before attempting to delete..."

        # Check if domain exists before trying to delete
        if aws opensearch describe-domain --domain-name "$DOMAIN_NAME" > /dev/null 2>&1; then
            echo "Domain $DOMAIN_NAME exists. Proceeding with deletion."
            aws opensearch delete-domain --domain-name "$DOMAIN_NAME"
            echo "Domain deletion initiated. This may take several minutes to complete."
        else
            echo "Domain $DOMAIN_NAME does not exist or is not accessible. No deletion needed."
        fi
    else
        echo "No domain was successfully created. Nothing to clean up."
    fi
}

# Set up trap for cleanup on script exit
trap cleanup_resources EXIT

# Generate a random identifier for resource names to avoid conflicts
RANDOM_ID=$(openssl rand -hex 4)
DOMAIN_NAME="movies-${RANDOM_ID}"
MASTER_USER="master-user"
MASTER_PASSWORD='Master-Password123!'

echo "Using domain name: $DOMAIN_NAME"
echo "Using master username: $MASTER_USER"
echo "Using master password: $MASTER_PASSWORD"

# Get AWS account ID (matches tutorial)
echo "Retrieving AWS account ID..."
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
if [[ $? -ne 0 ]] || [[ -z "$ACCOUNT_ID" ]]; then
    handle_error "Failed to retrieve AWS account ID. Please check your AWS credentials."
fi
echo "AWS Account ID: $ACCOUNT_ID"

# Get current region (matches tutorial)
echo "Retrieving current AWS region..."
AWS_REGION=$(aws configure get region)
if [[ -z "$AWS_REGION" ]]; then
    AWS_REGION="us-east-1"
    echo "No region found in AWS config, defaulting to $AWS_REGION"
else
    echo "Using AWS region: $AWS_REGION"
fi

# Step 1: Create an OpenSearch Service Domain
echo "Creating OpenSearch Service domain..."
echo "This may take 15-30 minutes to complete."

# FIXED: Create a more permissive access policy that works with fine-grained access control
# This policy allows both IAM users and the internal user database to work
ACCESS_POLICY="{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"AWS\":\"*\"},\"Action\":[\"es:ESHttpGet\",\"es:ESHttpPut\",\"es:ESHttpPost\",\"es:ESHttpDelete\",\"es:ESHttpHead\"],\"Resource\":\"arn:aws:es:${AWS_REGION}:${ACCOUNT_ID}:domain/${DOMAIN_NAME}/*\"}]}"

echo "Access policy created for region: $AWS_REGION"
echo "Access policy: $ACCESS_POLICY"

# Create the domain (matches tutorial command exactly)
echo "Creating domain $DOMAIN_NAME..."
CREATE_OUTPUT=$(aws opensearch create-domain \
    --domain-name "$DOMAIN_NAME" \
    --engine-version "OpenSearch_2.11" \
    --cluster-config "InstanceType=t3.small.search,InstanceCount=1,ZoneAwarenessEnabled=false" \
    --ebs-options "EBSEnabled=true,VolumeType=gp3,VolumeSize=10" \
    --node-to-node-encryption-options "Enabled=true" \
    --encryption-at-rest-options "Enabled=true" \
    --domain-endpoint-options "EnforceHTTPS=true" \
    --advanced-security-options "Enabled=true,InternalUserDatabaseEnabled=true,MasterUserOptions={MasterUserName=$MASTER_USER,MasterUserPassword=$MASTER_PASSWORD}" \
    --access-policies "$ACCESS_POLICY" 2>&1)

# Check if domain creation was successful
if [[ $? -ne 0 ]]; then
    echo "Failed to create OpenSearch domain:"
    echo "$CREATE_OUTPUT"
    handle_error "Domain creation failed"
fi

# Verify the domain was actually created by checking the output
if echo "$CREATE_OUTPUT" | grep -q "DomainStatus"; then
    echo "Domain creation initiated successfully."
    DOMAIN_CREATED=true
else
    echo "Domain creation output:"
    echo "$CREATE_OUTPUT"
    handle_error "Domain creation may have failed - no DomainStatus in response"
fi

# Wait for domain to become active (improved logic)
echo "Waiting for domain to become active..."
RETRY_COUNT=0
MAX_RETRIES=45  # 45 minutes with 60 second intervals

while [[ $RETRY_COUNT -lt $MAX_RETRIES ]]; do
    echo "Checking domain status... (attempt $((RETRY_COUNT+1))/$MAX_RETRIES)"

    # Get domain status
    DOMAIN_STATUS=$(aws opensearch describe-domain --domain-name "$DOMAIN_NAME" 2>&1)

    if [[ $? -ne 0 ]]; then
        echo "Error checking domain status:"
        echo "$DOMAIN_STATUS"

        # If domain not found after several attempts, it likely failed to create
        if [[ $RETRY_COUNT -gt 5 ]] && echo "$DOMAIN_STATUS" | grep -q "ResourceNotFoundException"; then
            handle_error "Domain not found after multiple attempts. Domain creation likely failed."
        fi

        echo "Will retry in 60 seconds..."
    else
        # Check if domain is no longer processing
        if echo "$DOMAIN_STATUS" | grep -q '"Processing": false'; then
            DOMAIN_ACTIVE=true
            echo "Domain is now active!"
            break
        else
            echo "Domain is still being created. Checking again in 60 seconds..."
        fi
    fi

    sleep 60
    RETRY_COUNT=$((RETRY_COUNT+1))
done

# Verify domain is active
if [[ "$DOMAIN_ACTIVE" != "true" ]]; then
    echo "Domain creation is taking longer than expected ($((MAX_RETRIES)) minutes)."
    echo "You can check the status later using:"
    echo "aws opensearch describe-domain --domain-name $DOMAIN_NAME"
    handle_error "Domain did not become active within the expected time"
fi

# Get domain endpoint (matches tutorial)
echo "Retrieving domain endpoint..."
DOMAIN_ENDPOINT=$(aws opensearch describe-domain --domain-name "$DOMAIN_NAME" --query 'DomainStatus.Endpoint' --output text)

if [[ $? -ne 0 ]] || [[ -z "$DOMAIN_ENDPOINT" ]] || [[ "$DOMAIN_ENDPOINT" == "None" ]]; then
    handle_error "Failed to get domain endpoint"
fi

echo "Domain endpoint: $DOMAIN_ENDPOINT"

# Wait additional time for fine-grained access control to be fully ready
echo "Domain is active, but waiting additional time for fine-grained access control to be fully ready..."
echo "Fine-grained access control can take several minutes to initialize after domain becomes active."
echo "Waiting 8 minutes for full initialization..."
sleep 480  # Wait 8 minutes for fine-grained access control to be ready

# Verify variables are set correctly (matches tutorial)
echo "Verifying configuration..."
echo "Domain endpoint: $DOMAIN_ENDPOINT"
echo "Master user: $MASTER_USER"
echo "Password set: $(if [ -n "$MASTER_PASSWORD" ]; then echo "Yes"; else echo "No"; fi)"

# Step 2: Upload Data to the Domain
echo "Preparing to upload data to the domain..."

# Create a file for the single document (matches tutorial exactly)
echo "Creating single document JSON file..."
cat > single_movie.json << EOF
{
  "director": "Burton, Tim",
  "genre": ["Comedy","Sci-Fi"],
  "year": 1996,
  "actor": ["Jack Nicholson","Pierce Brosnan","Sarah Jessica Parker"],
  "title": "Mars Attacks!"
}
EOF

# Create a file for bulk documents (matches tutorial exactly)
echo "Creating bulk documents JSON file..."
cat > bulk_movies.json << EOF
{ "index" : { "_index": "movies", "_id" : "2" } }
{"director": "Frankenheimer, John", "genre": ["Drama", "Mystery", "Thriller", "Crime"], "year": 1962, "actor": ["Lansbury, Angela", "Sinatra, Frank", "Leigh, Janet", "Harvey, Laurence", "Silva, Henry", "Frees, Paul", "Gregory, James", "Bissell, Whit", "McGiver, John", "Parrish, Leslie", "Edwards, James", "Flowers, Bess", "Dhiegh, Khigh", "Payne, Julie", "Kleeb, Helen", "Gray, Joe", "Nalder, Reggie", "Stevens, Bert", "Masters, Michael", "Lowell, Tom"], "title": "The Manchurian Candidate"}
{ "index" : { "_index": "movies", "_id" : "3" } }
{"director": "Baird, Stuart", "genre": ["Action", "Crime", "Thriller"], "year": 1998, "actor": ["Downey Jr., Robert", "Jones, Tommy Lee", "Snipes, Wesley", "Pantoliano, Joe", "Jacob, Irène", "Nelligan, Kate", "Roebuck, Daniel", "Malahide, Patrick", "Richardson, LaTanya", "Wood, Tom", "Kosik, Thomas", "Stellate, Nick", "Minkoff, Robert", "Brown, Spitfire", "Foster, Reese", "Spielbauer, Bruce", "Mukherji, Kevin", "Cray, Ed", "Fordham, David", "Jett, Charlie"], "title": "U.S. Marshals"}
{ "index" : { "_index": "movies", "_id" : "4" } }
{"director": "Ray, Nicholas", "genre": ["Drama", "Romance"], "year": 1955, "actor": ["Hopper, Dennis", "Wood, Natalie", "Dean, James", "Mineo, Sal", "Backus, Jim", "Platt, Edward", "Ray, Nicholas", "Hopper, William", "Allen, Corey", "Birch, Paul", "Hudson, Rochelle", "Doran, Ann", "Hicks, Chuck", "Leigh, Nelson", "Williams, Robert", "Wessel, Dick", "Bryar, Paul", "Sessions, Almira", "McMahon, David", "Peters Jr., House"], "title": "Rebel Without a Cause"}
EOF

# Check if curl is installed
if ! command -v curl &> /dev/null; then
    echo "Warning: curl is not installed. Skipping data upload and search steps."
    echo "You can manually upload the data later using the commands in the tutorial."
else
    # IMPROVED: Test authentication with multiple approaches
    echo "Testing authentication with the OpenSearch domain..."
    echo "This test checks if fine-grained access control is ready for data operations."

    # Test 1: Basic authentication with root endpoint
    echo "Testing basic authentication with root endpoint..."
    AUTH_TEST_RESULT=$(curl -s -w "\nHTTP_CODE:%{http_code}" \
        --user "${MASTER_USER}:${MASTER_PASSWORD}" \
        --request GET \
        "https://${DOMAIN_ENDPOINT}/" 2>&1)

    echo "Basic auth test result:"
    echo "$AUTH_TEST_RESULT"

    # Extract HTTP status code
    HTTP_CODE=$(echo "$AUTH_TEST_RESULT" | grep "HTTP_CODE:" | cut -d: -f2)

    # Function to check if HTTP code is 2xx
    is_success_code() {
        local code=$1
        if [[ "$code" =~ ^2[0-9][0-9]$ ]]; then
            return 0
        else
            return 1
        fi
    }

    # Check if basic authentication test was successful (200 or 2xx status codes)
    if is_success_code "$HTTP_CODE"; then
        echo "✓ Basic authentication test successful! (HTTP $HTTP_CODE)"
        AUTH_SUCCESS=true
        AUTH_METHOD="basic"
    else
        echo "Basic authentication failed with HTTP code: $HTTP_CODE"

        # Test 2: Try cluster health endpoint
        echo "Testing with cluster health endpoint..."
        HEALTH_TEST_RESULT=$(curl -s -w "\nHTTP_CODE:%{http_code}" \
            --user "${MASTER_USER}:${MASTER_PASSWORD}" \
            --request GET \
            "https://${DOMAIN_ENDPOINT}/_cluster/health" 2>&1)

        echo "Cluster health test result:"
        echo "$HEALTH_TEST_RESULT"

        HEALTH_HTTP_CODE=$(echo "$HEALTH_TEST_RESULT" | grep "HTTP_CODE:" | cut -d: -f2)

        if is_success_code "$HEALTH_HTTP_CODE"; then
            echo "✓ Cluster health authentication test successful! (HTTP $HEALTH_HTTP_CODE)"
            AUTH_SUCCESS=true
            AUTH_METHOD="basic"
        else
            echo "Cluster health authentication also failed with HTTP code: $HEALTH_HTTP_CODE"

            # Check for specific error patterns
            if echo "$AUTH_TEST_RESULT" | grep -q "anonymous is not authorized"; then
                echo "Error: Request is being treated as anonymous (authentication not working)"
            elif echo "$AUTH_TEST_RESULT" | grep -q "Unauthorized"; then
                echo "Error: Authentication credentials rejected"
            elif echo "$AUTH_TEST_RESULT" | grep -q "Forbidden"; then
                echo "Error: Authentication succeeded but access is forbidden"
            fi

            echo "Waiting additional time and retrying with exponential backoff..."

            # Retry authentication test with exponential backoff
            AUTH_RETRY_COUNT=0
            MAX_AUTH_RETRIES=5
            WAIT_TIME=60
            AUTH_SUCCESS=false

            while [[ $AUTH_RETRY_COUNT -lt $MAX_AUTH_RETRIES ]]; do
                echo "Retrying authentication test (attempt $((AUTH_RETRY_COUNT+1))/$MAX_AUTH_RETRIES) after ${WAIT_TIME} seconds..."
                sleep $WAIT_TIME

                # Try both endpoints
                AUTH_TEST_RESULT=$(curl -s -w "\nHTTP_CODE:%{http_code}" \
                    --user "${MASTER_USER}:${MASTER_PASSWORD}" \
                    --request GET \
                    "https://${DOMAIN_ENDPOINT}/" 2>&1)

                HTTP_CODE=$(echo "$AUTH_TEST_RESULT" | grep "HTTP_CODE:" | cut -d: -f2)

                echo "Retry result (HTTP $HTTP_CODE):"
                echo "$AUTH_TEST_RESULT"

                if is_success_code "$HTTP_CODE"; then
                    echo "✓ Authentication test successful after retry! (HTTP $HTTP_CODE)"
                    AUTH_SUCCESS=true
                    AUTH_METHOD="basic"
                    break
                fi

                # Also try cluster health
                HEALTH_TEST_RESULT=$(curl -s -w "\nHTTP_CODE:%{http_code}" \
                    --user "${MASTER_USER}:${MASTER_PASSWORD}" \
                    --request GET \
                    "https://${DOMAIN_ENDPOINT}/_cluster/health" 2>&1)

                HEALTH_HTTP_CODE=$(echo "$HEALTH_TEST_RESULT" | grep "HTTP_CODE:" | cut -d: -f2)

                if is_success_code "$HEALTH_HTTP_CODE"; then
                    echo "✓ Cluster health authentication successful after retry! (HTTP $HEALTH_HTTP_CODE)"
                    AUTH_SUCCESS=true
                    AUTH_METHOD="basic"
                    break
                fi

                AUTH_RETRY_COUNT=$((AUTH_RETRY_COUNT+1))

                # Exponential backoff: double the wait time each retry (max 10 minutes)
                WAIT_TIME=$((WAIT_TIME * 2))
                if [[ $WAIT_TIME -gt 600 ]]; then
                    WAIT_TIME=600
                fi
            done
        fi
    fi

    # Proceed with data operations if authentication is working
    if [[ "$AUTH_SUCCESS" == "true" ]]; then
        echo "Authentication successful using $AUTH_METHOD method. Proceeding with data operations."

        # Upload single document (matches tutorial exactly)
        echo "Uploading single document..."
        UPLOAD_RESULT=$(curl -s -w "\nHTTP_CODE:%{http_code}" \
            --user "${MASTER_USER}:${MASTER_PASSWORD}" \
            --request PUT \
            --header 'Content-Type: application/json' \
            --data @single_movie.json \
            "https://${DOMAIN_ENDPOINT}/movies/_doc/1" 2>&1)

        echo "Upload response:"
        echo "$UPLOAD_RESULT"

        UPLOAD_HTTP_CODE=$(echo "$UPLOAD_RESULT" | grep "HTTP_CODE:" | cut -d: -f2)
        if is_success_code "$UPLOAD_HTTP_CODE" && echo "$UPLOAD_RESULT" | grep -q '"result"'; then
            echo "✓ Single document uploaded successfully! (HTTP $UPLOAD_HTTP_CODE)"
        else
            echo "⚠ Warning: Single document upload may have failed (HTTP $UPLOAD_HTTP_CODE)"
        fi

        # Upload bulk documents (matches tutorial exactly)
        echo "Uploading bulk documents..."
        BULK_RESULT=$(curl -s -w "\nHTTP_CODE:%{http_code}" \
            --user "${MASTER_USER}:${MASTER_PASSWORD}" \
            --request POST \
            --header 'Content-Type: application/x-ndjson' \
            --data-binary @bulk_movies.json \
            "https://${DOMAIN_ENDPOINT}/movies/_bulk" 2>&1)

        echo "Bulk upload response:"
        echo "$BULK_RESULT"

        BULK_HTTP_CODE=$(echo "$BULK_RESULT" | grep "HTTP_CODE:" | cut -d: -f2)
        # The bulk response is compact JSON ("errors":false), so allow optional whitespace around the colon
        if is_success_code "$BULK_HTTP_CODE" && echo "$BULK_RESULT" | grep -Eq '"errors"[[:space:]]*:[[:space:]]*false'; then
            echo "✓ Bulk documents uploaded successfully! (HTTP $BULK_HTTP_CODE)"
        else
            echo "⚠ Warning: Bulk document upload may have failed (HTTP $BULK_HTTP_CODE)"
        fi

        # Wait a moment for indexing
        echo "Waiting for documents to be indexed..."
        sleep 5

        # Step 3: Search Documents (matches tutorial exactly)
        echo "Searching for documents containing 'mars'..."
        SEARCH_RESULT=$(curl -s -w "\nHTTP_CODE:%{http_code}" \
            --user "${MASTER_USER}:${MASTER_PASSWORD}" \
            --request GET \
            "https://${DOMAIN_ENDPOINT}/movies/_search?q=mars&pretty=true" 2>&1)

        SEARCH_HTTP_CODE=$(echo "$SEARCH_RESULT" | grep "HTTP_CODE:" | cut -d: -f2)
        echo "Search results for 'mars' (HTTP $SEARCH_HTTP_CODE):"
        echo "$SEARCH_RESULT"

        echo "Searching for documents containing 'rebel'..."
        REBEL_SEARCH=$(curl -s -w "\nHTTP_CODE:%{http_code}" \
            --user "${MASTER_USER}:${MASTER_PASSWORD}" \
            --request GET \
            "https://${DOMAIN_ENDPOINT}/movies/_search?q=rebel&pretty=true" 2>&1)

        REBEL_HTTP_CODE=$(echo "$REBEL_SEARCH" | grep "HTTP_CODE:" | cut -d: -f2)
        echo "Search results for 'rebel' (HTTP $REBEL_HTTP_CODE):"
        echo "$REBEL_SEARCH"

        # Verify search results
        if is_success_code "$SEARCH_HTTP_CODE" && echo "$SEARCH_RESULT" | grep -q '"hits"'; then
            echo "✓ Search functionality is working!"
        else
            echo "⚠ Warning: Search may not be working properly."
        fi
    else
        echo ""
        echo "=========================================="
        echo "AUTHENTICATION TROUBLESHOOTING"
        echo "=========================================="
        echo "Authentication failed after all retries. This may be due to:"
        echo "1. Fine-grained access control not fully initialized (most common)"
        echo "2. Domain configuration issues"
        echo "3. Network connectivity issues"
        echo "4. AWS credentials or permissions issues"
        echo ""
        echo "DOMAIN CONFIGURATION DEBUG:"
        echo "Let's check the domain configuration..."
        # Debug domain configuration
        DOMAIN_CONFIG=$(aws opensearch describe-domain --domain-name "$DOMAIN_NAME" --query 'DomainStatus.{AdvancedSecurityOptions: AdvancedSecurityOptions, AccessPolicies: AccessPolicies}' --output json 2>&1)
        echo "Domain configuration:"
        echo "$DOMAIN_CONFIG"

        echo ""
        echo "MANUAL TESTING COMMANDS:"
        echo "You can try these commands manually in 10-15 minutes:"
        echo ""
        echo "# Test basic authentication:"
        echo "curl --user \"${MASTER_USER}:${MASTER_PASSWORD}\" \"https://${DOMAIN_ENDPOINT}/\""
        echo ""
        echo "# Test cluster health:"
        echo "curl --user \"${MASTER_USER}:${MASTER_PASSWORD}\" \"https://${DOMAIN_ENDPOINT}/_cluster/health\""
        echo ""
        echo "# Upload single document:"
        echo "curl --user \"${MASTER_USER}:${MASTER_PASSWORD}\" --request PUT --header 'Content-Type: application/json' --data @single_movie.json \"https://${DOMAIN_ENDPOINT}/movies/_doc/1\""
        echo ""
        echo "# Search for documents:"
        echo "curl --user \"${MASTER_USER}:${MASTER_PASSWORD}\" \"https://${DOMAIN_ENDPOINT}/movies/_search?q=mars&pretty=true\""
        echo ""
        echo "TROUBLESHOOTING TIPS:"
        echo "- Wait 10-15 more minutes and try the manual commands"
        echo "- Check AWS CloudTrail logs for authentication errors"
        echo "- Verify your AWS region is correct: $AWS_REGION"
        echo "- Ensure your AWS credentials have OpenSearch permissions"
        echo "- Try accessing OpenSearch Dashboards to verify the master user works"
        echo ""
        echo "Skipping data upload and search operations for now."
        echo "The domain is created and accessible via OpenSearch Dashboards."
    fi
fi

# Display OpenSearch Dashboards URL (matches tutorial)
echo ""
echo "==========================================="
echo "OPENSEARCH DASHBOARDS ACCESS"
echo "==========================================="
echo "OpenSearch Dashboards URL: https://${DOMAIN_ENDPOINT}/_dashboards/"
echo "Username: $MASTER_USER"
echo "Password: $MASTER_PASSWORD"
echo ""
echo "You can access OpenSearch Dashboards using these credentials."
echo "If you uploaded data successfully, you can create an index pattern for 'movies'."
echo ""

# Summary of created resources
echo ""
echo "==========================================="
echo "RESOURCES CREATED"
echo "==========================================="
echo "OpenSearch Domain Name: $DOMAIN_NAME"
echo "OpenSearch Domain Endpoint: $DOMAIN_ENDPOINT"
echo "AWS Region: $AWS_REGION"
echo "Master Username: $MASTER_USER"
echo "Master Password: $MASTER_PASSWORD"
echo ""
# Escape the dollar signs so that $0 is not expanded to the script name
echo "ESTIMATED COST: ~\$0.038/hour (~\$0.91/day) until deleted"
echo ""
echo "Make sure to save these details for future reference."
echo ""

# Ask user if they want to clean up resources
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Do you want to clean up all created resources now? (y/n): "
read -r CLEANUP_CHOICE

if [[ "${CLEANUP_CHOICE,,}" == "y" ]]; then
    echo "Cleaning up resources..."
    aws opensearch delete-domain --domain-name "$DOMAIN_NAME"
    echo "✓ Cleanup initiated. Domain deletion may take several minutes to complete."
    echo ""
    echo "You can check the deletion status using:"
    echo "aws opensearch describe-domain --domain-name $DOMAIN_NAME"
    echo ""
    echo "When deletion is complete, you'll see a 'Domain not found' error."
else
    echo "Resources will NOT be deleted automatically."
    echo ""
    echo "To delete the domain later, use:"
    echo "aws opensearch delete-domain --domain-name $DOMAIN_NAME"
    echo ""
    echo "⚠ IMPORTANT: Keeping these resources will incur ongoing AWS charges!"
    echo " Estimated cost: ~\$0.038/hour (~\$0.91/day)"
fi

# Clean up temporary files
echo "Cleaning up temporary files..."
rm -f single_movie.json bulk_movies.json

# Disable the trap since we're handling cleanup manually
trap - EXIT

echo ""
echo "==========================================="
echo "SCRIPT COMPLETED SUCCESSFULLY"
echo "==========================================="
echo "Script completed at $(date)"
echo "All output has been logged to: $LOG_FILE"
echo ""
echo "Next steps:"
echo "1. Access OpenSearch Dashboards at: https://${DOMAIN_ENDPOINT}/_dashboards/"
echo "2. Create visualizations and dashboards"
echo "3. Explore the OpenSearch API"
echo "4. Remember to delete resources when done to avoid charges"
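
The status-code check and the capped exponential backoff used in the authentication retry loop above can be exercised on their own, without a domain or network access. The following is a minimal sketch under stated assumptions: `is_success_code` mirrors the function in the script, while `extract_http_code`, `next_wait_time`, and the sample curl output are hypothetical names and data introduced only for illustration.

```shell
#!/bin/bash

# Same check as is_success_code in the script above: succeed only for 2xx codes.
is_success_code() {
    [[ "$1" =~ ^2[0-9][0-9]$ ]]
}

# Hypothetical helper: extract the status code from curl output that was
# captured with -w "\nHTTP_CODE:%{http_code}".
extract_http_code() {
    printf '%s\n' "$1" | grep "HTTP_CODE:" | cut -d: -f2
}

# Hypothetical helper: double the wait time, capped at a maximum
# (600 seconds by default, matching the 10-minute cap in the script).
next_wait_time() {
    local current=$1
    local max=${2:-600}
    local next=$((current * 2))
    if (( next > max )); then
        next=$max
    fi
    echo "$next"
}

# Made-up curl output for illustration; no network call is made.
sample_output=$'{"status":"ok"}\nHTTP_CODE:200'
code=$(extract_http_code "$sample_output")
if is_success_code "$code"; then
    echo "HTTP $code is a success code"
fi

# Print the wait schedule for five retries starting at 60 seconds.
wait_time=60
for attempt in 1 2 3 4 5; do
    echo "attempt $attempt: wait ${wait_time}s"
    wait_time=$(next_wait_time "$wait_time")
done
```

With a 60-second start and a 600-second cap, the five waits are 60, 120, 240, 480, and 600 seconds, so the retry phase alone can take 25 minutes before the script gives up.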

The following code example shows how to:

  • Set up IAM permissions

  • Create a SageMaker execution role

  • Create feature groups

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# Amazon SageMaker Feature Store Tutorial Script - Version 3
# This script demonstrates how to use Amazon SageMaker Feature Store with AWS CLI

# Setup logging
LOG_FILE="sagemaker-featurestore-tutorial.log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting SageMaker Feature Store tutorial script at $(date)"
echo "All commands and outputs will be logged to $LOG_FILE"
echo ""

# Track created resources for cleanup
CREATED_RESOURCES=()

# Function to handle errors
handle_error() {
    echo "ERROR: $1"
    echo "Attempting to clean up resources..."
    cleanup_resources
    exit 1
}

# Function to check command status
check_status() {
    if echo "$1" | grep -i "error" > /dev/null; then
        handle_error "$1"
    fi
}

# Function to wait for feature group to be created
wait_for_feature_group() {
    local feature_group_name=$1
    local status="Creating"

    echo "Waiting for feature group ${feature_group_name} to be created..."

    while [ "$status" = "Creating" ]; do
        sleep 5
        status=$(aws sagemaker describe-feature-group \
            --feature-group-name "${feature_group_name}" \
            --query 'FeatureGroupStatus' \
            --output text)
        echo "Current status: ${status}"

        if [ "$status" = "Failed" ]; then
            handle_error "Feature group ${feature_group_name} creation failed"
        fi
    done

    echo "Feature group ${feature_group_name} is now ${status}"
}

# Function to clean up resources
cleanup_resources() {
    echo "Cleaning up resources..."
    # Clean up in reverse order
    for ((i=${#CREATED_RESOURCES[@]}-1; i>=0; i--)); do
        resource="${CREATED_RESOURCES[$i]}"
        resource_type=$(echo "$resource" | cut -d: -f1)
        resource_name=$(echo "$resource" | cut -d: -f2)

        echo "Deleting $resource_type: $resource_name"

        case "$resource_type" in
            "FeatureGroup")
                aws sagemaker delete-feature-group --feature-group-name "$resource_name"
                ;;
            "S3Bucket")
                echo "Emptying S3 bucket: $resource_name"
                aws s3 rm "s3://$resource_name" --recursive 2>/dev/null
                echo "Deleting S3 bucket: $resource_name"
                aws s3api delete-bucket --bucket "$resource_name" 2>/dev/null
                ;;
            "IAMRole")
                echo "Detaching policies from role: $resource_name"
                aws iam detach-role-policy --role-name "$resource_name" --policy-arn "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess" 2>/dev/null
                aws iam detach-role-policy --role-name "$resource_name" --policy-arn "arn:aws:iam::aws:policy/AmazonS3FullAccess" 2>/dev/null
                echo "Deleting IAM role: $resource_name"
                aws iam delete-role --role-name "$resource_name" 2>/dev/null
                ;;
            *)
                echo "Unknown resource type: $resource_type"
                ;;
        esac
    done
}

# Function to create SageMaker execution role
create_sagemaker_role() {
    local role_name="SageMakerFeatureStoreRole-$(openssl rand -hex 4)"

    echo "Creating SageMaker execution role: $role_name" >&2

    # Create trust policy document
    local trust_policy='{
        "Version":"2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "sagemaker.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }'

    # Create the role
    local role_result=$(aws iam create-role \
        --role-name "$role_name" \
        --assume-role-policy-document "$trust_policy" \
        --description "SageMaker execution role for Feature Store tutorial" 2>&1)

    if echo "$role_result" | grep -i "error" > /dev/null; then
        handle_error "Failed to create IAM role: $role_result"
    fi

    echo "Role created successfully" >&2
    CREATED_RESOURCES+=("IAMRole:$role_name")

    # Attach necessary policies
    echo "Attaching policies to role..." >&2

    # SageMaker execution policy
    local policy1_result=$(aws iam attach-role-policy \
        --role-name "$role_name" \
        --policy-arn "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess" 2>&1)

    if echo "$policy1_result" | grep -i "error" > /dev/null; then
        handle_error "Failed to attach SageMaker policy: $policy1_result"
    fi

    # S3 access policy
    local policy2_result=$(aws iam attach-role-policy \
        --role-name "$role_name" \
        --policy-arn "arn:aws:iam::aws:policy/AmazonS3FullAccess" 2>&1)

    if echo "$policy2_result" | grep -i "error" > /dev/null; then
        handle_error "Failed to attach S3 policy: $policy2_result"
    fi

    # Get account ID for role ARN
    local account_id=$(aws sts get-caller-identity --query Account --output text)
    local role_arn="arn:aws:iam::${account_id}:role/${role_name}"

    echo "Role ARN: $role_arn" >&2
    echo "Waiting 10 seconds for role to propagate..." >&2
    sleep 10

    # Return only the role ARN to stdout
    echo "$role_arn"
}

# Handle SageMaker execution role
ROLE_ARN=""
if [ -z "$1" ]; then
    echo "Creating SageMaker execution role automatically..."
    ROLE_ARN=$(create_sagemaker_role)
    if [ -z "$ROLE_ARN" ]; then
        handle_error "Failed to create SageMaker execution role"
    fi
    # create_sagemaker_role ran in a subshell, so register the role for cleanup here
    CREATED_RESOURCES+=("IAMRole:${ROLE_ARN##*/}")
else
    ROLE_ARN="$1"

    # Validate the role ARN
    ROLE_NAME=$(echo "$ROLE_ARN" | sed 's/.*role\///')
    ROLE_CHECK=$(aws iam get-role --role-name "$ROLE_NAME" 2>&1)
    if echo "$ROLE_CHECK" | grep -i "error" > /dev/null; then
        echo "Creating a new role automatically..."
        ROLE_ARN=$(create_sagemaker_role)
        if [ -z "$ROLE_ARN" ]; then
            handle_error "Failed to create SageMaker execution role"
        fi
    fi
fi

# Handle cleanup option
AUTO_CLEANUP=""
if [ -n "$2" ]; then
    AUTO_CLEANUP="$2"
fi

# Generate a random identifier for resource names
RANDOM_ID=$(openssl rand -hex 4)
echo "Using random identifier: $RANDOM_ID"

# Set variables
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
check_status "$ACCOUNT_ID"
echo "Account ID: $ACCOUNT_ID"

# Get current region
REGION=$(aws configure get region)
if [ -z "$REGION" ]; then
    REGION="us-east-1"
    echo "No default region configured, using: $REGION"
else
    echo "Using region: $REGION"
fi

S3_BUCKET_NAME="sagemaker-featurestore-${RANDOM_ID}-${ACCOUNT_ID}"
PREFIX="featurestore-tutorial"
CURRENT_TIME=$(date +%s)

echo "Creating S3 bucket: $S3_BUCKET_NAME"

# Create bucket in current region
if [ "$REGION" = "us-east-1" ]; then
    BUCKET_RESULT=$(aws s3api create-bucket --bucket "$S3_BUCKET_NAME" \
        --region "$REGION" 2>&1)
else
    BUCKET_RESULT=$(aws s3api create-bucket --bucket "$S3_BUCKET_NAME" \
        --region "$REGION" \
        --create-bucket-configuration LocationConstraint="$REGION" 2>&1)
fi

if echo "$BUCKET_RESULT" | grep -i "error" > /dev/null; then
    echo "Failed to create S3 bucket: $BUCKET_RESULT"
    exit 1
fi

echo "$BUCKET_RESULT"
CREATED_RESOURCES+=("S3Bucket:$S3_BUCKET_NAME")

# Block public access to the bucket
BLOCK_RESULT=$(aws s3api put-public-access-block \
    --bucket "$S3_BUCKET_NAME" \
    --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" 2>&1)

if echo "$BLOCK_RESULT" | grep -i "error" > /dev/null; then
    echo "Failed to block public access to S3 bucket: $BLOCK_RESULT"
    cleanup_resources
    exit 1
fi

# Create feature groups
echo "Creating feature groups..."

# Create customers feature group
CUSTOMERS_FEATURE_GROUP_NAME="customers-feature-group-${RANDOM_ID}"
echo "Creating customers feature group: $CUSTOMERS_FEATURE_GROUP_NAME"

CUSTOMERS_RESPONSE=$(aws sagemaker create-feature-group \
    --feature-group-name "$CUSTOMERS_FEATURE_GROUP_NAME" \
    --record-identifier-feature-name "customer_id" \
    --event-time-feature-name "EventTime" \
    --feature-definitions '[
        {"FeatureName": "customer_id", "FeatureType": "Integral"},
        {"FeatureName": "name", "FeatureType": "String"},
        {"FeatureName": "age", "FeatureType": "Integral"},
        {"FeatureName": "address", "FeatureType": "String"},
        {"FeatureName": "membership_type", "FeatureType": "String"},
        {"FeatureName": "EventTime", "FeatureType": "Fractional"}
    ]' \
    --online-store-config '{"EnableOnlineStore": true}' \
    --offline-store-config '{
        "S3StorageConfig": {
            "S3Uri": "s3://'${S3_BUCKET_NAME}'/'${PREFIX}'"
        },
        "DisableGlueTableCreation": false
    }' \
    --role-arn "$ROLE_ARN" 2>&1)

if echo "$CUSTOMERS_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to create customers feature group: $CUSTOMERS_RESPONSE"
    cleanup_resources
    exit 1
fi

echo "$CUSTOMERS_RESPONSE"
CREATED_RESOURCES+=("FeatureGroup:$CUSTOMERS_FEATURE_GROUP_NAME")

# Create orders feature group
ORDERS_FEATURE_GROUP_NAME="orders-feature-group-${RANDOM_ID}"
echo "Creating orders feature group: $ORDERS_FEATURE_GROUP_NAME"

ORDERS_RESPONSE=$(aws sagemaker create-feature-group \
    --feature-group-name "$ORDERS_FEATURE_GROUP_NAME" \
    --record-identifier-feature-name "customer_id" \
    --event-time-feature-name "EventTime" \
    --feature-definitions '[
        {"FeatureName": "customer_id", "FeatureType": "Integral"},
        {"FeatureName": "order_id", "FeatureType": "String"},
        {"FeatureName": "order_date", "FeatureType": "String"},
        {"FeatureName": "product", "FeatureType": "String"},
        {"FeatureName": "quantity", "FeatureType": "Integral"},
        {"FeatureName": "amount", "FeatureType": "Fractional"},
        {"FeatureName": "EventTime", "FeatureType": "Fractional"}
    ]' \
    --online-store-config '{"EnableOnlineStore": true}' \
    --offline-store-config '{
        "S3StorageConfig": {
            "S3Uri": "s3://'${S3_BUCKET_NAME}'/'${PREFIX}'"
        },
        "DisableGlueTableCreation": false
    }' \
    --role-arn "$ROLE_ARN" 2>&1)

if echo "$ORDERS_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to create orders feature group: $ORDERS_RESPONSE"
    cleanup_resources
    exit 1
fi

echo "$ORDERS_RESPONSE"
CREATED_RESOURCES+=("FeatureGroup:$ORDERS_FEATURE_GROUP_NAME")

# Wait for feature groups to be created
wait_for_feature_group "$CUSTOMERS_FEATURE_GROUP_NAME"
wait_for_feature_group "$ORDERS_FEATURE_GROUP_NAME"

# Ingest data into feature groups
echo "Ingesting data into feature groups..."

# Ingest customer data
echo "Ingesting customer data..."
CUSTOMER1_RESPONSE=$(aws sagemaker-featurestore-runtime put-record \
    --feature-group-name "$CUSTOMERS_FEATURE_GROUP_NAME" \
    --record '[
        {"FeatureName": "customer_id", "ValueAsString": "573291"},
        {"FeatureName": "name", "ValueAsString": "John Doe"},
        {"FeatureName": "age", "ValueAsString": "35"},
        {"FeatureName": "address", "ValueAsString": "123 Main St"},
        {"FeatureName": "membership_type", "ValueAsString": "premium"},
        {"FeatureName": "EventTime", "ValueAsString": "'${CURRENT_TIME}'"}
    ]' 2>&1)

if echo "$CUSTOMER1_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to ingest customer 1 data: $CUSTOMER1_RESPONSE"
    cleanup_resources
    exit 1
fi

echo "$CUSTOMER1_RESPONSE"

CUSTOMER2_RESPONSE=$(aws sagemaker-featurestore-runtime put-record \
    --feature-group-name "$CUSTOMERS_FEATURE_GROUP_NAME" \
    --record '[
        {"FeatureName": "customer_id", "ValueAsString": "109382"},
        {"FeatureName": "name", "ValueAsString": "Jane Smith"},
        {"FeatureName": "age", "ValueAsString": "28"},
        {"FeatureName": "address", "ValueAsString": "456 Oak Ave"},
        {"FeatureName": "membership_type", "ValueAsString": "standard"},
        {"FeatureName": "EventTime", "ValueAsString": "'${CURRENT_TIME}'"}
    ]' 2>&1)

if echo "$CUSTOMER2_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to ingest customer 2 data: $CUSTOMER2_RESPONSE"
    cleanup_resources
    exit 1
fi

echo "$CUSTOMER2_RESPONSE"

# Ingest order data
echo "Ingesting order data..."
ORDER1_RESPONSE=$(aws sagemaker-featurestore-runtime put-record \
    --feature-group-name "$ORDERS_FEATURE_GROUP_NAME" \
    --record '[
        {"FeatureName": "customer_id", "ValueAsString": "573291"},
        {"FeatureName": "order_id", "ValueAsString": "ORD-001"},
        {"FeatureName": "order_date", "ValueAsString": "2023-01-15"},
        {"FeatureName": "product", "ValueAsString": "Laptop"},
        {"FeatureName": "quantity", "ValueAsString": "1"},
        {"FeatureName": "amount", "ValueAsString": "1299.99"},
        {"FeatureName": "EventTime", "ValueAsString": "'${CURRENT_TIME}'"}
    ]' 2>&1)

if echo "$ORDER1_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to ingest order 1 data: $ORDER1_RESPONSE"
    cleanup_resources
    exit 1
fi

echo "$ORDER1_RESPONSE"

ORDER2_RESPONSE=$(aws sagemaker-featurestore-runtime put-record \
    --feature-group-name "$ORDERS_FEATURE_GROUP_NAME" \
    --record '[
        {"FeatureName": "customer_id", "ValueAsString": "109382"},
        {"FeatureName": "order_id", "ValueAsString": "ORD-002"},
        {"FeatureName": "order_date", "ValueAsString": "2023-01-20"},
        {"FeatureName": "product", "ValueAsString": "Smartphone"},
        {"FeatureName": "quantity", "ValueAsString": "1"},
        {"FeatureName": "amount", "ValueAsString": "899.99"},
        {"FeatureName": "EventTime", "ValueAsString": "'${CURRENT_TIME}'"}
    ]' 2>&1)

if echo "$ORDER2_RESPONSE" | grep -i "error" > /dev/null; then
    echo "Failed to ingest order 2 data: $ORDER2_RESPONSE"
    cleanup_resources
    exit 1
fi

echo "$ORDER2_RESPONSE"

# Retrieve records from feature groups
echo "Retrieving records from feature groups..."

# Get a single customer record
echo "Getting customer record with ID 573291:"
CUSTOMER_RECORD=$(aws sagemaker-featurestore-runtime get-record \
    --feature-group-name "$CUSTOMERS_FEATURE_GROUP_NAME" \
    --record-identifier-value-as-string "573291" 2>&1)

if echo "$CUSTOMER_RECORD" | grep -i "error" > /dev/null; then
    echo "Failed to get customer record: $CUSTOMER_RECORD"
    cleanup_resources
    exit 1
fi

echo "$CUSTOMER_RECORD"

# Get multiple records using batch-get-record
echo "Getting multiple records using batch-get-record:"
BATCH_RECORDS=$(aws sagemaker-featurestore-runtime batch-get-record \
    --identifiers '[
        {
            "FeatureGroupName": "'${CUSTOMERS_FEATURE_GROUP_NAME}'",
            "RecordIdentifiersValueAsString": ["573291", "109382"]
        },
        {
            "FeatureGroupName": "'${ORDERS_FEATURE_GROUP_NAME}'",
            "RecordIdentifiersValueAsString": ["573291", "109382"]
        }
    ]' 2>&1)

if echo "$BATCH_RECORDS" | grep -i "error" > /dev/null && ! echo "$BATCH_RECORDS" | grep -i "Records" > /dev/null; then
    echo "Failed to get batch records: $BATCH_RECORDS"
    cleanup_resources
    exit 1
fi

echo "$BATCH_RECORDS"

# List feature groups
echo "Listing feature groups:"
FEATURE_GROUPS=$(aws sagemaker list-feature-groups 2>&1)

if echo "$FEATURE_GROUPS" | grep -i "error" > /dev/null; then
    echo "Failed to list feature groups: $FEATURE_GROUPS"
    cleanup_resources
    exit 1
fi

echo "$FEATURE_GROUPS"

# Display summary of created resources
echo ""
echo "==========================================="
echo "TUTORIAL COMPLETED SUCCESSFULLY!"
echo "==========================================="
echo "Resources created:"
echo "- S3 Bucket: $S3_BUCKET_NAME"
echo "- Customers Feature Group: $CUSTOMERS_FEATURE_GROUP_NAME"
echo "- Orders Feature Group: $ORDERS_FEATURE_GROUP_NAME"
if [[ " ${CREATED_RESOURCES[@]} " =~ " IAMRole:" ]]; then
    echo "- IAM Role: $(echo "${CREATED_RESOURCES[@]}" | grep -o 'IAMRole:[^[:space:]]*' | cut -d: -f2)"
fi
echo ""
echo "You can now:"
echo "1. View your feature groups in the SageMaker console"
echo "2. Query the offline store using Amazon Athena"
echo "3. Use the feature groups in your ML workflows"
echo "==========================================="
echo ""

# Handle cleanup
if [ "$AUTO_CLEANUP" = "y" ]; then
    echo "Auto-cleanup enabled. Starting cleanup..."
    cleanup_resources
    echo "Cleanup completed."
elif [ "$AUTO_CLEANUP" = "n" ]; then
    echo "Auto-cleanup disabled. Resources will remain in your account."
    echo "To clean up later, run this script again with cleanup option 'y'"
else
    echo "==========================================="
    echo "CLEANUP CONFIRMATION"
    echo "==========================================="
    echo "Do you want to clean up all created resources? (y/n): "
    read -r CLEANUP_CHOICE

    if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
        echo "Starting cleanup..."
        cleanup_resources
        echo "Cleanup completed."
    else
        echo "Skipping cleanup. Resources will remain in your account."
        echo "To clean up later, delete the following resources:"
        echo "- Feature Groups: $CUSTOMERS_FEATURE_GROUP_NAME, $ORDERS_FEATURE_GROUP_NAME"
        echo "- S3 Bucket: $S3_BUCKET_NAME"
        if [[ " ${CREATED_RESOURCES[@]} " =~ " IAMRole:" ]]; then
            echo "- IAM Role: $(echo "${CREATED_RESOURCES[@]}" | grep -o 'IAMRole:[^[:space:]]*' | cut -d: -f2)"
        fi
        echo ""
        # The dollar sign is escaped so that $0 is not expanded to the script name
        echo "Estimated ongoing cost: ~\$0.01 per month for online store"
    fi
fi

echo "Script completed at $(date)"
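
The script above decides whether each AWS CLI call failed by searching its output for the word "error". A possible alternative is to test the command's exit status, which does not depend on the wording of the output. The following is a minimal sketch of that idea; `run_checked` is a hypothetical helper that is not part of the original example, and the demo uses `echo` and `false` as stand-ins so that it runs without AWS credentials.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): runs a command,
# captures stdout and stderr together, and reports failure based on the
# exit status instead of searching the output for the word "error".
run_checked() {
    local description=$1
    shift
    local output

    # The assignment's exit status is that of the command substitution.
    if ! output=$("$@" 2>&1); then
        echo "ERROR: $description failed: $output" >&2
        return 1
    fi

    printf '%s\n' "$output"
}

# Output that merely contains the word "error" is still treated as success.
run_checked "echo demo" echo "no error here"

# A non-zero exit status is reported as a failure.
run_checked "false demo" false || echo "failure detected"
```

In the tutorial script, a call might then look like `BUCKET_RESULT=$(run_checked "create bucket" aws s3api create-bucket --bucket "$S3_BUCKET_NAME" --region "$REGION") || exit 1`, assuming the same variables as above.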

The following code example shows how to:

  • Create an Amazon S3 bucket

  • Create an Amazon SNS topic

  • Create an IAM role for Config

  • Set up the Config configuration recorder

  • Set up the Config delivery channel

  • Start the configuration recorder

  • Verify the Config setup

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# AWS Config Setup Script (v2)
# This script sets up AWS Config with the AWS CLI

# Error handling
set -e
LOGFILE="aws-config-setup-v2.log"
touch $LOGFILE
exec > >(tee -a $LOGFILE)
exec 2>&1

# Function to handle errors
handle_error() {
    echo "ERROR: An error occurred at line $1"
    echo "Attempting to clean up resources..."
    cleanup_resources
    exit 1
}

# Set trap for error handling
trap 'handle_error $LINENO' ERR

# Function to generate random identifier
generate_random_id() {
    echo $(openssl rand -hex 6)
}

# Function to check if command was successful
check_command() {
    if echo "$1" | grep -i "error" > /dev/null; then
        echo "ERROR: $1"
        return 1
    fi
    return 0
}

# Function to clean up resources
cleanup_resources() {
    if [ -n "$CONFIG_RECORDER_NAME" ]; then
        echo "Stopping configuration recorder..."
        aws configservice stop-configuration-recorder --configuration-recorder-name "$CONFIG_RECORDER_NAME" 2>/dev/null || true
    fi

    # Check if we created a new delivery channel before trying to delete it
    if [ -n "$DELIVERY_CHANNEL_NAME" ] && [ "$CREATED_NEW_DELIVERY_CHANNEL" = "true" ]; then
        echo "Deleting delivery channel..."
        aws configservice delete-delivery-channel --delivery-channel-name "$DELIVERY_CHANNEL_NAME" 2>/dev/null || true
    fi

    if [ -n "$CONFIG_RECORDER_NAME" ] && [ "$CREATED_NEW_CONFIG_RECORDER" = "true" ]; then
        echo "Deleting configuration recorder..."
        aws configservice delete-configuration-recorder --configuration-recorder-name "$CONFIG_RECORDER_NAME" 2>/dev/null || true
    fi

    if [ -n "$ROLE_NAME" ]; then
        if [ -n "$POLICY_NAME" ]; then
            echo "Detaching custom policy from role..."
            aws iam delete-role-policy --role-name "$ROLE_NAME" --policy-name "$POLICY_NAME" 2>/dev/null || true
        fi

        if [ -n "$MANAGED_POLICY_ARN" ]; then
            echo "Detaching managed policy from role..."
            aws iam detach-role-policy --role-name "$ROLE_NAME" --policy-arn "$MANAGED_POLICY_ARN" 2>/dev/null || true
        fi

        echo "Deleting IAM role..."
        aws iam delete-role --role-name "$ROLE_NAME" 2>/dev/null || true
    fi

    if [ -n "$SNS_TOPIC_ARN" ]; then
        echo "Deleting SNS topic..."
        aws sns delete-topic --topic-arn "$SNS_TOPIC_ARN" 2>/dev/null || true
    fi

    if [ -n "$S3_BUCKET_NAME" ]; then
        echo "Emptying S3 bucket..."
        aws s3 rm "s3://$S3_BUCKET_NAME" --recursive 2>/dev/null || true

        echo "Deleting S3 bucket..."
        aws s3api delete-bucket --bucket "$S3_BUCKET_NAME" 2>/dev/null || true
    fi
}

# Function to display created resources
display_resources() {
    echo ""
    echo "==========================================="
    echo "CREATED RESOURCES"
    echo "==========================================="
    echo "S3 Bucket: $S3_BUCKET_NAME"
    echo "SNS Topic ARN: $SNS_TOPIC_ARN"
    echo "IAM Role: $ROLE_NAME"
    if [ "$CREATED_NEW_CONFIG_RECORDER" = "true" ]; then
        echo "Configuration Recorder: $CONFIG_RECORDER_NAME (newly created)"
    else
        echo "Configuration Recorder: $CONFIG_RECORDER_NAME (existing)"
    fi
    if [ "$CREATED_NEW_DELIVERY_CHANNEL" = "true" ]; then
        echo "Delivery Channel: $DELIVERY_CHANNEL_NAME (newly created)"
    else
        echo "Delivery Channel: $DELIVERY_CHANNEL_NAME (existing)"
    fi
    echo "==========================================="
}

# Get AWS account ID
echo "Getting AWS account ID..."
ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
if [ -z "$ACCOUNT_ID" ]; then
    echo "ERROR: Failed to get AWS account ID"
    exit 1
fi
echo "AWS Account ID: $ACCOUNT_ID"

# Generate random identifier for resources
RANDOM_ID=$(generate_random_id)
echo "Generated random identifier: $RANDOM_ID"

# Step 1: Create an S3 bucket
S3_BUCKET_NAME="configservice-${RANDOM_ID}"
echo "Creating S3 bucket: $S3_BUCKET_NAME"

# Get the current region
AWS_REGION=$(aws configure get region)
if [ -z "$AWS_REGION" ]; then
    AWS_REGION="us-east-1"  # Default to us-east-1 if no region is configured
fi
echo "Using AWS Region: $AWS_REGION"

# Create bucket with appropriate command based on region
if [ "$AWS_REGION" = "us-east-1" ]; then
    BUCKET_RESULT=$(aws s3api create-bucket --bucket "$S3_BUCKET_NAME")
else
    BUCKET_RESULT=$(aws s3api create-bucket --bucket "$S3_BUCKET_NAME" --create-bucket-configuration LocationConstraint="$AWS_REGION")
fi
check_command "$BUCKET_RESULT"
echo "S3 bucket created: $S3_BUCKET_NAME"

# Block public access for the bucket
aws s3api put-public-access-block \
    --bucket "$S3_BUCKET_NAME" \
    --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
echo "Public access blocked for bucket"

# Step 2: Create an SNS topic
TOPIC_NAME="config-topic-${RANDOM_ID}"
echo "Creating SNS topic: $TOPIC_NAME"
SNS_RESULT=$(aws sns create-topic --name "$TOPIC_NAME")
check_command "$SNS_RESULT"
SNS_TOPIC_ARN=$(echo "$SNS_RESULT" | grep -o 'arn:aws:sns:[^"]*')
echo "SNS topic created: $SNS_TOPIC_ARN"

# Step 3: Create an IAM role for AWS Config
ROLE_NAME="config-role-${RANDOM_ID}"
POLICY_NAME="config-delivery-permissions"
MANAGED_POLICY_ARN="arn:aws:iam::aws:policy/service-role/AWS_ConfigRole"

echo "Creating trust policy document..."
cat > config-trust-policy.json << EOF
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "config.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

echo "Creating IAM role: $ROLE_NAME"
ROLE_RESULT=$(aws iam create-role --role-name "$ROLE_NAME" --assume-role-policy-document file://config-trust-policy.json)
check_command "$ROLE_RESULT"
ROLE_ARN=$(echo "$ROLE_RESULT" | grep -o 'arn:aws:iam::[^"]*' | head -1)
echo "IAM role created: $ROLE_ARN"

echo "Attaching AWS managed policy to role..."
ATTACH_RESULT=$(aws iam attach-role-policy --role-name "$ROLE_NAME" --policy-arn "$MANAGED_POLICY_ARN")
check_command "$ATTACH_RESULT"
echo "AWS managed policy attached"

echo "Creating custom policy document for S3 and SNS access..."
cat > config-delivery-permissions.json << EOF
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::${S3_BUCKET_NAME}/AWSLogs/${ACCOUNT_ID}/*",
            "Condition": {
                "StringLike": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketAcl"
            ],
            "Resource": "arn:aws:s3:::${S3_BUCKET_NAME}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sns:Publish"
            ],
            "Resource": "${SNS_TOPIC_ARN}"
        }
    ]
}
EOF

echo "Attaching custom policy to role..."
POLICY_RESULT=$(aws iam put-role-policy --role-name "$ROLE_NAME" --policy-name "$POLICY_NAME" --policy-document file://config-delivery-permissions.json)
check_command "$POLICY_RESULT"
echo "Custom policy attached"

# Wait for IAM role to propagate
echo "Waiting for IAM role to propagate (15 seconds)..."
sleep 15

# Step 4: Check if configuration recorder already exists
CONFIG_RECORDER_NAME="default"
CREATED_NEW_CONFIG_RECORDER="false"

echo "Checking for existing configuration recorder..."
EXISTING_RECORDERS=$(aws configservice describe-configuration-recorders 2>/dev/null || echo "")
if echo "$EXISTING_RECORDERS" | grep -q "name"; then
    echo "Configuration recorder already exists. Will update it."
    # Get the name of the existing recorder
    CONFIG_RECORDER_NAME=$(echo "$EXISTING_RECORDERS" | grep -o '"name": "[^"]*"' | head -1 | cut -d'"' -f4)
    echo "Using existing configuration recorder: $CONFIG_RECORDER_NAME"
else
    echo "No existing configuration recorder found. Will create a new one."
    CREATED_NEW_CONFIG_RECORDER="true"
fi

echo "Creating configuration recorder configuration..."
cat > configurationRecorder.json << EOF
{
    "name": "${CONFIG_RECORDER_NAME}",
    "roleARN": "${ROLE_ARN}",
    "recordingMode": {
        "recordingFrequency": "CONTINUOUS"
    }
}
EOF

echo "Creating recording group configuration..."
cat > recordingGroup.json << EOF
{
    "allSupported": true,
    "includeGlobalResourceTypes": true
}
EOF

echo "Setting up configuration recorder..."
RECORDER_RESULT=$(aws configservice put-configuration-recorder --configuration-recorder file://configurationRecorder.json --recording-group file://recordingGroup.json)
check_command "$RECORDER_RESULT"
echo "Configuration recorder set up"

# Step 5: Check if delivery channel already exists
DELIVERY_CHANNEL_NAME="default"
CREATED_NEW_DELIVERY_CHANNEL="false"

echo "Checking for existing delivery channel..."
EXISTING_CHANNELS=$(aws configservice describe-delivery-channels 2>/dev/null || echo "")
if echo "$EXISTING_CHANNELS" | grep -q "name"; then
    echo "Delivery channel already exists."
    # Get the name of the existing channel
    DELIVERY_CHANNEL_NAME=$(echo "$EXISTING_CHANNELS" | grep -o '"name": "[^"]*"' | head -1 | cut -d'"' -f4)
    echo "Using existing delivery channel: $DELIVERY_CHANNEL_NAME"

    # Update the existing delivery channel
    echo "Creating delivery channel configuration for update..."
    cat > deliveryChannel.json << EOF
{
    "name": "${DELIVERY_CHANNEL_NAME}",
    "s3BucketName": "${S3_BUCKET_NAME}",
    "snsTopicARN": "${SNS_TOPIC_ARN}",
    "configSnapshotDeliveryProperties": {
        "deliveryFrequency": "Six_Hours"
    }
}
EOF

    echo "Updating delivery channel..."
    CHANNEL_RESULT=$(aws configservice put-delivery-channel --delivery-channel file://deliveryChannel.json)
    check_command "$CHANNEL_RESULT"
    echo "Delivery channel updated"
else
    echo "No existing delivery channel found. Will create a new one."
    CREATED_NEW_DELIVERY_CHANNEL="true"

    echo "Creating delivery channel configuration..."
    cat > deliveryChannel.json << EOF
{
    "name": "${DELIVERY_CHANNEL_NAME}",
    "s3BucketName": "${S3_BUCKET_NAME}",
    "snsTopicARN": "${SNS_TOPIC_ARN}",
    "configSnapshotDeliveryProperties": {
        "deliveryFrequency": "Six_Hours"
    }
}
EOF

    echo "Creating delivery channel..."
    CHANNEL_RESULT=$(aws configservice put-delivery-channel --delivery-channel file://deliveryChannel.json)
    check_command "$CHANNEL_RESULT"
    echo "Delivery channel created"
fi

# Step 6: Start the configuration recorder
echo "Checking configuration recorder status..."
RECORDER_STATUS=$(aws configservice describe-configuration-recorder-status 2>/dev/null || echo "")
if echo "$RECORDER_STATUS" | grep -q '"recording": true'; then
    echo "Configuration recorder is already running."
else
    echo "Starting configuration recorder..."
    START_RESULT=$(aws configservice start-configuration-recorder --configuration-recorder-name "$CONFIG_RECORDER_NAME")
    check_command "$START_RESULT"
    echo "Configuration recorder started"
fi

# Step 7: Verify the AWS Config setup
echo "Verifying delivery channel..."
VERIFY_CHANNEL=$(aws configservice describe-delivery-channels)
check_command "$VERIFY_CHANNEL"
echo "$VERIFY_CHANNEL"

echo "Verifying configuration recorder..."
VERIFY_RECORDER=$(aws configservice describe-configuration-recorders)
check_command "$VERIFY_RECORDER"
echo "$VERIFY_RECORDER"

echo "Verifying configuration recorder status..."
VERIFY_STATUS=$(aws configservice describe-configuration-recorder-status)
check_command "$VERIFY_STATUS"
echo "$VERIFY_STATUS"

# Display created resources
display_resources

# Ask if user wants to clean up resources
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Do you want to clean up all created resources? (y/n): "
read -r CLEANUP_CHOICE

if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
    echo "Cleaning up resources..."
    cleanup_resources
    echo "Cleanup completed."
else
    echo "Resources will not be cleaned up. You can manually clean them up later."
fi

echo "Script completed successfully!"
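
The script above waits a fixed 15 seconds for the new IAM role to propagate before configuring the recorder. One alternative, sketched below, is to retry the dependent command a bounded number of times. `retry_until` is a hypothetical helper that is not part of the original example, and whether retrying fully covers IAM propagation delays in a given account is an assumption. The demo uses `true` and `false` so that it runs without AWS credentials.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): runs a command up to
# max_attempts times, sleeping delay_seconds between failed attempts.
# Returns 0 as soon as the command succeeds, or 1 if every attempt fails.
retry_until() {
    local max_attempts=$1
    local delay_seconds=$2
    shift 2
    local attempt

    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if ((attempt < max_attempts)); then
            sleep "$delay_seconds"
        fi
    done
    return 1
}

# Demo with stand-in commands.
retry_until 3 0 true && echo "command succeeded"
retry_until 2 0 false || echo "command failed after 2 attempts"
```

In the Config script, a call might look like `retry_until 5 5 aws configservice put-configuration-recorder --configuration-recorder file://configurationRecorder.json --recording-group file://recordingGroup.json`, in place of the fixed `sleep 15`. Note that the helper discards the command's output, so it suits commands whose result is not needed afterward.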

The following code example shows how to:

  • Create an application

  • Enable push notification channels

  • Send a push notification

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# AWS End User Messaging Push Getting Started Script
# This script creates an AWS End User Messaging Push application and demonstrates
# how to enable push notification channels and send a test message.
#
# Prerequisites:
# - AWS CLI installed and configured
# - Appropriate IAM permissions for Pinpoint operations
#
# Usage: ./2-cli-script-final-working.sh [--auto-cleanup]

# Check for auto-cleanup flag
AUTO_CLEANUP=false
if [[ "${1:-}" == "--auto-cleanup" ]]; then
    AUTO_CLEANUP=true
fi

# Set up logging
LOG_FILE="aws-end-user-messaging-push-script-$(date +%Y%m%d-%H%M%S).log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting AWS End User Messaging Push setup script..."
echo "Logging to $LOG_FILE"
echo "Timestamp: $(date)"

# Function to check for errors in command output
check_error() {
    local output=$1
    local cmd=$2
    local ignore_error=${3:-false}

    if echo "$output" | grep -qi "error\|exception\|fail"; then
        echo "ERROR: Command failed: $cmd"
        echo "Error details: $output"

        if [ "$ignore_error" = "true" ]; then
            echo "Ignoring error and continuing..."
            return 1
        else
            cleanup_on_error
            exit 1
        fi
    fi

    return 0
}

# Function to clean up resources on error
cleanup_on_error() {
    echo "Error encountered. Cleaning up resources..."

    if [ -n "${APP_ID:-}" ]; then
        echo "Deleting application with ID: $APP_ID"
        aws pinpoint delete-app --application-id "$APP_ID" 2>/dev/null || echo "Failed to delete application"
    fi

    # Clean up any created files
    rm -f gcm-message.json apns-message.json

    echo "Cleanup completed."
}

# Function to validate AWS CLI is configured
validate_aws_cli() {
    echo "Validating AWS CLI configuration..."

    # Check if AWS CLI is installed
    if ! command -v aws &> /dev/null; then
        echo "ERROR: AWS CLI is not installed. Please install it first."
        echo "Visit: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html"
        exit 1
    fi

    # Check AWS CLI version
    AWS_VERSION=$(aws --version 2>&1 | head -n1)
    echo "AWS CLI version: $AWS_VERSION"

    # Check if AWS CLI is configured
    if ! aws sts get-caller-identity &> /dev/null; then
        echo "ERROR: AWS CLI is not configured or credentials are invalid."
        echo "Please run 'aws configure' to set up your credentials."
        exit 1
    fi

    # Get current AWS identity and region
    CALLER_IDENTITY=$(aws sts get-caller-identity)
    CURRENT_REGION=$(aws configure get region 2>/dev/null || echo "us-east-1")
    echo "AWS CLI configured for:"
    echo "$CALLER_IDENTITY"
    echo "Current region: $CURRENT_REGION"
    echo ""
}

# Function to check if jq is available for JSON parsing
check_json_tools() {
    if command -v jq &> /dev/null; then
        USE_JQ=true
        echo "jq is available for JSON parsing"
    else
        USE_JQ=false
        echo "jq is not available, using grep for JSON parsing"
        echo "Consider installing jq for better JSON handling: https://stedolan.github.io/jq/"
    fi
}

# Function to extract JSON values
extract_json_value() {
    local json=$1
    local key=$2

    if [ "$USE_JQ" = "true" ]; then
        echo "$json" | jq -r ".$key"
    else
        # Fallback to grep method
        echo "$json" | grep -o "\"$key\": \"[^\"]*" | cut -d'"' -f4 | head -n1
    fi
}

# Function to validate required IAM permissions
validate_permissions() {
    echo "Validating IAM permissions..."

    # Test basic Pinpoint permissions
    if ! aws pinpoint get-apps &> /dev/null; then
        echo "WARNING: Unable to list Pinpoint applications. Please ensure you have the following IAM permissions:"
        echo "- mobiletargeting:GetApps"
        echo "- mobiletargeting:CreateApp"
        echo "- mobiletargeting:DeleteApp"
        echo "- mobiletargeting:UpdateGcmChannel"
        echo "- mobiletargeting:UpdateApnsChannel"
        echo "- mobiletargeting:SendMessages"
        echo ""
        echo "Continuing anyway..."
    else
        echo "Basic Pinpoint permissions validated."
    fi
}

# Validate prerequisites
validate_aws_cli
check_json_tools
validate_permissions

# Generate a random suffix for resource names to avoid conflicts
RANDOM_SUFFIX=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | fold -w 8 | head -n1)
APP_NAME="PushNotificationApp-${RANDOM_SUFFIX}"

echo "Creating application with name: $APP_NAME"

# Step 1: Create an application
echo "Executing: aws pinpoint create-app --create-application-request Name=${APP_NAME}"
CREATE_APP_OUTPUT=$(aws pinpoint create-app --create-application-request "Name=${APP_NAME}" 2>&1)
check_error "$CREATE_APP_OUTPUT" "create-app"

echo "Application created successfully:"
echo "$CREATE_APP_OUTPUT"

# Extract the application ID from the output
if [ "$USE_JQ" = "true" ]; then
    APP_ID=$(echo "$CREATE_APP_OUTPUT" | jq -r '.ApplicationResponse.Id')
else
    APP_ID=$(echo "$CREATE_APP_OUTPUT" | grep -o '"Id": "[^"]*' | cut -d'"' -f4 | head -n1)
fi

if [ -z "$APP_ID" ] || [ "$APP_ID" = "null" ]; then
    echo "ERROR: Failed to extract application ID from output"
    echo "Output was: $CREATE_APP_OUTPUT"
    exit 1
fi

echo "Application ID: $APP_ID"

# Create a resources list to track what we've created
RESOURCES=("Application: $APP_ID")

# Step 2: Enable FCM (GCM) channel with a sample API key
echo ""
echo "==========================================="
echo "ENABLING FCM (GCM) CHANNEL"
echo "==========================================="
echo "Note: This is using a placeholder API key for demonstration purposes only."
echo "In a production environment, you should use your actual FCM API key from Firebase Console."
echo ""
echo "IMPORTANT: The following command will likely fail because we're using a placeholder API key."
echo "This is expected behavior for this demonstration script."

echo "Executing: aws pinpoint update-gcm-channel --application-id $APP_ID --gcm-channel-request ..."
UPDATE_GCM_OUTPUT=$(aws pinpoint update-gcm-channel \
    --application-id "$APP_ID" \
    --gcm-channel-request '{"Enabled": true, "ApiKey": "sample-fcm-api-key-for-demo-only"}' 2>&1)

# We'll ignore this specific error since we're using a placeholder API key
if check_error "$UPDATE_GCM_OUTPUT" "update-gcm-channel" "true"; then
    echo "FCM channel enabled successfully:"
    echo "$UPDATE_GCM_OUTPUT"
    RESOURCES+=("GCM Channel for application: $APP_ID")
else
    echo "As expected, FCM channel update failed with the placeholder API key."
    echo "Error details: $UPDATE_GCM_OUTPUT"
    echo ""
    echo "To enable FCM in production:"
    echo "1. Go to Firebase Console (https://console.firebase.google.com/)"
    echo "2. Create or select your project"
    echo "3. Go to Project Settings > Cloud Messaging"
    echo "4. Copy the Server Key"
    echo "5. Replace 'sample-fcm-api-key-for-demo-only' with your actual Server Key"
fi

# Step 3: Try to enable APNS channel (this will also fail without real certificates)
echo ""
echo "==========================================="
echo "ENABLING APNS CHANNEL (OPTIONAL)"
echo "==========================================="
echo "Attempting to enable APNS channel with placeholder certificate..."
echo "This will also fail without real APNS certificates, which is expected."

# Create a placeholder APNS configuration
echo "Executing: aws pinpoint update-apns-channel --application-id $APP_ID --apns-channel-request ..."
UPDATE_APNS_OUTPUT=$(aws pinpoint update-apns-channel \
    --application-id "$APP_ID" \
    --apns-channel-request '{"Enabled": true, "Certificate": "placeholder-certificate", "PrivateKey": "placeholder-private-key"}' 2>&1)

if check_error "$UPDATE_APNS_OUTPUT" "update-apns-channel" "true"; then
    echo "APNS channel enabled successfully:"
    echo "$UPDATE_APNS_OUTPUT"
    RESOURCES+=("APNS Channel for application: $APP_ID")
else
    echo "As expected, APNS channel update failed with placeholder certificates."
    echo "Error details: $UPDATE_APNS_OUTPUT"
    echo ""
    echo "To enable APNS in production:"
    echo "1. Generate APNS certificates from Apple Developer Console"
    echo "2. Convert certificates to PEM format"
    echo "3. Use the actual certificate and private key in the update-apns-channel command"
fi

# Step 4: Create message files for different platforms
echo ""
echo "==========================================="
echo "CREATING MESSAGE FILES"
echo "==========================================="

# Create FCM message file
echo "Creating FCM message file..."
cat > gcm-message.json << 'EOF'
{
    "Addresses": {
        "SAMPLE-DEVICE-TOKEN-FCM": {
            "ChannelType": "GCM"
        }
    },
    "MessageConfiguration": {
        "GCMMessage": {
            "Action": "OPEN_APP",
            "Body": "Hello from AWS End User Messaging Push! This is an FCM notification.",
            "Priority": "normal",
            "SilentPush": false,
            "Title": "My First FCM Push Notification",
            "TimeToLive": 30,
            "Data": {
                "key1": "value1",
                "key2": "value2"
            }
        }
    }
}
EOF

# Create APNS message file
echo "Creating APNS message file..."
cat > apns-message.json << 'EOF'
{
    "Addresses": {
        "SAMPLE-DEVICE-TOKEN-APNS": {
            "ChannelType": "APNS"
        }
    },
    "MessageConfiguration": {
        "APNSMessage": {
            "Action": "OPEN_APP",
            "Body": "Hello from AWS End User Messaging Push! This is an APNS notification.",
            "Priority": "normal",
            "SilentPush": false,
            "Title": "My First APNS Push Notification",
            "TimeToLive": 30,
            "Badge": 1,
            "Sound": "default"
        }
    }
}
EOF

echo "Message files created:"
echo "- gcm-message.json (for FCM/Android)"
echo "- apns-message.json (for APNS/iOS)"
echo ""
echo "Note: These messages use placeholder device tokens and will not actually be delivered."
echo "To send real messages, you would need to replace the sample device tokens with actual ones."

# Step 5: Demonstrate how to send messages (this will fail with placeholder tokens)
echo ""
echo "==========================================="
echo "DEMONSTRATING MESSAGE SENDING"
echo "==========================================="
echo "Attempting to send FCM message (will fail with placeholder token)..."

echo "Executing: aws pinpoint send-messages --application-id $APP_ID --message-request file://gcm-message.json"
SEND_FCM_OUTPUT=$(aws pinpoint send-messages \
    --application-id "$APP_ID" \
    --message-request file://gcm-message.json 2>&1)

if check_error "$SEND_FCM_OUTPUT" "send-messages (FCM)" "true"; then
    echo "FCM message sent successfully:"
    echo "$SEND_FCM_OUTPUT"
else
    echo "As expected, FCM message sending failed with placeholder token."
    echo "Error details: $SEND_FCM_OUTPUT"
fi

echo ""
echo "Attempting to send APNS message (will fail with placeholder token)..."

echo "Executing: aws pinpoint send-messages --application-id $APP_ID --message-request file://apns-message.json"
SEND_APNS_OUTPUT=$(aws pinpoint send-messages \
    --application-id "$APP_ID" \
    --message-request file://apns-message.json 2>&1)

if check_error "$SEND_APNS_OUTPUT" "send-messages (APNS)" "true"; then
    echo "APNS message sent successfully:"
    echo "$SEND_APNS_OUTPUT"
else
    echo "As expected, APNS message sending failed with placeholder token."
    echo "Error details: $SEND_APNS_OUTPUT"
fi

# Step 6: Show application details
echo ""
echo "==========================================="
echo "APPLICATION DETAILS"
echo "==========================================="
echo "Retrieving application details..."
echo "Executing: aws pinpoint get-app --application-id $APP_ID"

GET_APP_OUTPUT=$(aws pinpoint get-app --application-id "$APP_ID" 2>&1)
if check_error "$GET_APP_OUTPUT" "get-app"; then
    echo "Application details:"
    echo "$GET_APP_OUTPUT"
fi

# Display summary of created resources
echo ""
echo "==========================================="
echo "RESOURCES CREATED"
echo "==========================================="
for resource in "${RESOURCES[@]}"; do
    echo "- $resource"
done

echo ""
echo "Files created:"
echo "- gcm-message.json"
echo "- apns-message.json"
echo "- $LOG_FILE"

# Cleanup prompt with proper input handling
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "This script created AWS resources that may incur charges."

if [ "$AUTO_CLEANUP" = "true" ]; then
    echo "Auto-cleanup enabled. Cleaning up resources..."
    CLEANUP_CHOICE="y"
else
    echo "Do you want to clean up all created resources? (y/n): "
    read -r CLEANUP_CHOICE
fi

if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
    echo ""
    echo "Cleaning up resources..."

    echo "Deleting application with ID: $APP_ID"
    echo "Executing: aws pinpoint delete-app --application-id $APP_ID"
    DELETE_APP_OUTPUT=$(aws pinpoint delete-app --application-id "$APP_ID" 2>&1)
    if check_error "$DELETE_APP_OUTPUT" "delete-app" "true"; then
        echo "Application deleted successfully."
    else
        echo "Failed to delete application. You may need to delete it manually:"
        echo "aws pinpoint delete-app --application-id $APP_ID"
    fi

    echo "Deleting message files..."
    rm -f gcm-message.json apns-message.json

    echo "Cleanup completed successfully."
    echo "Log file ($LOG_FILE) has been preserved for reference."
else
    echo ""
    echo "Skipping cleanup. Resources will remain in your AWS account."
    echo ""
    echo "To manually delete the application later, run:"
    echo "aws pinpoint delete-app --application-id $APP_ID"
    echo ""
    echo "To delete the message files, run:"
    echo "rm -f gcm-message.json apns-message.json"
fi

echo ""
echo "==========================================="
echo "SCRIPT COMPLETED SUCCESSFULLY"
echo "==========================================="
echo "This script demonstrated:"
echo "1. Creating an AWS End User Messaging Push application"
echo "2. Attempting to enable FCM and APNS channels (with placeholder credentials)"
echo "3. Creating message templates for different platforms"
echo "4. Demonstrating message sending commands (with placeholder tokens)"
echo "5. Retrieving application details"
echo "6. Proper cleanup of resources"
echo ""
echo "For production use:"
echo "- Replace placeholder API keys with real FCM server keys"
echo "- Replace placeholder certificates with real APNS certificates"
echo "- Replace placeholder device tokens with real device tokens"
echo "- Implement proper error handling for your use case"
echo "- Consider using AWS IAM roles instead of long-term credentials"
echo ""
echo "Log file: $LOG_FILE"
echo "Script completed at: $(date)"
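
The script above writes its message files with fixed placeholder tokens and notes that real messages need actual device tokens. As a rough sketch of one way to parameterize that step, the helper below generates the FCM message request for a token passed as an argument. The function name `write_gcm_message` and its arguments are hypothetical and not part of the original example. It assumes the token contains no characters that need JSON escaping.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): writes an FCM
# message request file for a single device token.
# Assumption: the token contains no quotes or backslashes.
write_gcm_message() {
    local device_token=$1
    local out_file=$2

    # The heredoc delimiter is unquoted so that $device_token is expanded.
    cat > "$out_file" << EOF
{
  "Addresses": {
    "$device_token": {
      "ChannelType": "GCM"
    }
  },
  "MessageConfiguration": {
    "GCMMessage": {
      "Action": "OPEN_APP",
      "Body": "Hello from AWS End User Messaging Push! This is an FCM notification.",
      "Priority": "normal",
      "SilentPush": false,
      "Title": "My First FCM Push Notification",
      "TimeToLive": 30
    }
  }
}
EOF
}
```

A call such as `write_gcm_message "$DEVICE_TOKEN" gcm-message.json` would then produce a file that can be passed to `aws pinpoint send-messages` with `--message-request file://gcm-message.json`, as in the script. `DEVICE_TOKEN` is a placeholder variable here.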

The following code example shows how to:

  • Create IoT resources

  • Configure your device

  • Run the sample application

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash # AWS IoT Core Getting Started Script # This script creates AWS IoT resources, configures a device, and runs a sample application # Set up logging LOG_FILE="iot-core-setup.log" echo "Starting AWS IoT Core setup at $(date)" > $LOG_FILE # Function to log commands and their outputs log_cmd() { echo "$(date): Running command: $1" >> $LOG_FILE eval "$1" 2>&1 | tee -a $LOG_FILE return ${PIPESTATUS[0]} } # Function to check for errors check_error() { if [ $1 -ne 0 ]; then echo "ERROR: Command failed with exit code $1" | tee -a $LOG_FILE echo "Please check the log file $LOG_FILE for details" | tee -a $LOG_FILE cleanup_on_error exit $1 fi } # Function to cleanup resources on error cleanup_on_error() { echo "Error encountered. Attempting to clean up resources..." | tee -a $LOG_FILE echo "Resources created:" | tee -a $LOG_FILE if [ ! -z "$CERTIFICATE_ARN" ]; then echo "Certificate ARN: $CERTIFICATE_ARN" | tee -a $LOG_FILE if [ ! -z "$POLICY_NAME" ]; then log_cmd "aws iot detach-policy --policy-name $POLICY_NAME --target $CERTIFICATE_ARN" fi if [ ! -z "$THING_NAME" ]; then log_cmd "aws iot detach-thing-principal --thing-name $THING_NAME --principal $CERTIFICATE_ARN" fi if [ ! -z "$CERTIFICATE_ID" ]; then log_cmd "aws iot update-certificate --certificate-id $CERTIFICATE_ID --new-status INACTIVE" log_cmd "aws iot delete-certificate --certificate-id $CERTIFICATE_ID" fi fi if [ ! -z "$THING_NAME" ]; then echo "Thing Name: $THING_NAME" | tee -a $LOG_FILE log_cmd "aws iot delete-thing --thing-name $THING_NAME" fi if [ ! -z "$POLICY_NAME" ]; then echo "Policy Name: $POLICY_NAME" | tee -a $LOG_FILE log_cmd "aws iot delete-policy --policy-name $POLICY_NAME" fi if [ ! 
-z "$SHARED_POLICY_NAME" ]; then echo "Shared Policy Name: $SHARED_POLICY_NAME" | tee -a $LOG_FILE log_cmd "aws iot delete-policy --policy-name $SHARED_POLICY_NAME" fi } # Generate unique identifiers RANDOM_SUFFIX=$(openssl rand -hex 4) THING_NAME="MyIoTThing-${RANDOM_SUFFIX}" POLICY_NAME="MyIoTPolicy-${RANDOM_SUFFIX}" SHARED_POLICY_NAME="SharedSubPolicy-${RANDOM_SUFFIX}" CERTS_DIR="$HOME/certs" echo "==================================================" | tee -a $LOG_FILE echo "AWS IoT Core Getting Started" | tee -a $LOG_FILE echo "==================================================" | tee -a $LOG_FILE echo "This script will:" | tee -a $LOG_FILE echo "1. Create AWS IoT resources (policy, thing, certificate)" | tee -a $LOG_FILE echo "2. Configure your device" | tee -a $LOG_FILE echo "3. Set up for running the sample application" | tee -a $LOG_FILE echo "" | tee -a $LOG_FILE echo "Thing Name: $THING_NAME" | tee -a $LOG_FILE echo "Policy Name: $POLICY_NAME" | tee -a $LOG_FILE echo "Certificates Directory: $CERTS_DIR" | tee -a $LOG_FILE echo "==================================================" | tee -a $LOG_FILE echo "" | tee -a $LOG_FILE # Get AWS account ID echo "Getting AWS account ID..." | tee -a $LOG_FILE ACCOUNT_ID=$(log_cmd "aws sts get-caller-identity --query Account --output text") check_error $? # Get AWS region echo "Getting AWS region..." | tee -a $LOG_FILE REGION=$(log_cmd "aws configure get region") check_error $? if [ -z "$REGION" ]; then echo "AWS region not configured. Please run 'aws configure' to set your region." | tee -a $LOG_FILE exit 1 fi echo "Using AWS Account ID: $ACCOUNT_ID and Region: $REGION" | tee -a $LOG_FILE # Step 1: Create AWS IoT Resources echo "" | tee -a $LOG_FILE echo "Step 1: Creating AWS IoT Resources..." | tee -a $LOG_FILE # Create IoT policy echo "Creating IoT policy document..." 
| tee -a $LOG_FILE cat > iot-policy.json << EOF { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "iot:Connect" ], "Resource": [ "arn:aws:iot:$REGION:$ACCOUNT_ID:client/test-*" ] }, { "Effect": "Allow", "Action": [ "iot:Publish", "iot:Receive" ], "Resource": [ "arn:aws:iot:$REGION:$ACCOUNT_ID:topic/test/topic" ] }, { "Effect": "Allow", "Action": [ "iot:Subscribe" ], "Resource": [ "arn:aws:iot:$REGION:$ACCOUNT_ID:topicfilter/test/topic" ] } ] } EOF echo "Creating IoT policy: $POLICY_NAME..." | tee -a $LOG_FILE log_cmd "aws iot create-policy --policy-name $POLICY_NAME --policy-document file://iot-policy.json" check_error $? # Create IoT thing echo "Creating IoT thing: $THING_NAME..." | tee -a $LOG_FILE log_cmd "aws iot create-thing --thing-name $THING_NAME" check_error $? # Create directory for certificates echo "Creating certificates directory..." | tee -a $LOG_FILE log_cmd "mkdir -p $CERTS_DIR" check_error $? # Create keys and certificate echo "Creating keys and certificate..." | tee -a $LOG_FILE CERT_OUTPUT=$(log_cmd "aws iot create-keys-and-certificate --set-as-active --certificate-pem-outfile $CERTS_DIR/device.pem.crt --public-key-outfile $CERTS_DIR/public.pem.key --private-key-outfile $CERTS_DIR/private.pem.key") check_error $? # Extract certificate ARN and ID CERTIFICATE_ARN=$(echo "$CERT_OUTPUT" | grep "certificateArn" | cut -d'"' -f4) CERTIFICATE_ID=$(echo "$CERTIFICATE_ARN" | cut -d/ -f2) if [ -z "$CERTIFICATE_ARN" ] || [ -z "$CERTIFICATE_ID" ]; then echo "Failed to extract certificate ARN or ID" | tee -a $LOG_FILE cleanup_on_error exit 1 fi echo "Certificate ARN: $CERTIFICATE_ARN" | tee -a $LOG_FILE echo "Certificate ID: $CERTIFICATE_ID" | tee -a $LOG_FILE # Attach policy to certificate echo "Attaching policy to certificate..." | tee -a $LOG_FILE log_cmd "aws iot attach-policy --policy-name $POLICY_NAME --target $CERTIFICATE_ARN" check_error $? # Attach certificate to thing echo "Attaching certificate to thing..." 
| tee -a $LOG_FILE log_cmd "aws iot attach-thing-principal --thing-name $THING_NAME --principal $CERTIFICATE_ARN" check_error $? # Download Amazon Root CA certificate echo "Downloading Amazon Root CA certificate..." | tee -a $LOG_FILE log_cmd "curl -s -o $CERTS_DIR/Amazon-root-CA-1.pem https://www.amazontrust.com/repository/AmazonRootCA1.pem" check_error $? # Step 2: Configure Your Device echo "" | tee -a $LOG_FILE echo "Step 2: Configuring Your Device..." | tee -a $LOG_FILE # Check if Git is installed echo "Checking if Git is installed..." | tee -a $LOG_FILE if ! command -v git &> /dev/null; then echo "Git is not installed. Please install Git and run this script again." | tee -a $LOG_FILE cleanup_on_error exit 1 fi # Check if Python is installed echo "Checking if Python is installed..." | tee -a $LOG_FILE if ! command -v python3 &> /dev/null; then echo "Python 3 is not installed. Please install Python 3 and run this script again." | tee -a $LOG_FILE cleanup_on_error exit 1 fi # Install AWS IoT Device SDK for Python echo "Installing AWS IoT Device SDK for Python..." | tee -a $LOG_FILE log_cmd "python3 -m pip install awsiotsdk" check_error $? # Clone the AWS IoT Device SDK for Python repository echo "Cloning AWS IoT Device SDK for Python repository..." | tee -a $LOG_FILE if [ ! -d "$HOME/aws-iot-device-sdk-python-v2" ]; then log_cmd "cd $HOME && git clone https://github.com/aws/aws-iot-device-sdk-python-v2.git" check_error $? else echo "AWS IoT Device SDK for Python repository already exists." | tee -a $LOG_FILE fi # Step 3: Get AWS IoT Endpoint echo "" | tee -a $LOG_FILE echo "Step 3: Getting AWS IoT Endpoint..." | tee -a $LOG_FILE IOT_ENDPOINT=$(log_cmd "aws iot describe-endpoint --endpoint-type iot:Data-ATS --query endpointAddress --output text") check_error $? echo "AWS IoT Endpoint: $IOT_ENDPOINT" | tee -a $LOG_FILE # Create a shared subscription policy (optional) echo "" | tee -a $LOG_FILE echo "Creating shared subscription policy (optional)..." 
| tee -a $LOG_FILE cat > shared-sub-policy.json << EOF { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "iot:Connect" ], "Resource": [ "arn:aws:iot:$REGION:$ACCOUNT_ID:client/*" ] }, { "Effect": "Allow", "Action": [ "iot:Publish", "iot:Receive" ], "Resource": [ "arn:aws:iot:$REGION:$ACCOUNT_ID:topic/test/topic" ] }, { "Effect": "Allow", "Action": [ "iot:Subscribe" ], "Resource": [ "arn:aws:iot:$REGION:$ACCOUNT_ID:topicfilter/test/topic", "arn:aws:iot:$REGION:$ACCOUNT_ID:topicfilter/\$share/*/test/topic" ] } ] } EOF log_cmd "aws iot create-policy --policy-name $SHARED_POLICY_NAME --policy-document file://shared-sub-policy.json" check_error $? log_cmd "aws iot attach-policy --policy-name $SHARED_POLICY_NAME --target $CERTIFICATE_ARN" check_error $? # Summary of created resources echo "" | tee -a $LOG_FILE echo "==================================================" | tee -a $LOG_FILE echo "Setup Complete! Resources Created:" | tee -a $LOG_FILE echo "==================================================" | tee -a $LOG_FILE echo "Thing Name: $THING_NAME" | tee -a $LOG_FILE echo "Policy Name: $POLICY_NAME" | tee -a $LOG_FILE echo "Shared Subscription Policy Name: $SHARED_POLICY_NAME" | tee -a $LOG_FILE echo "Certificate ID: $CERTIFICATE_ID" | tee -a $LOG_FILE echo "Certificate ARN: $CERTIFICATE_ARN" | tee -a $LOG_FILE echo "Certificate Files Location: $CERTS_DIR" | tee -a $LOG_FILE echo "AWS IoT Endpoint: $IOT_ENDPOINT" | tee -a $LOG_FILE echo "==================================================" | tee -a $LOG_FILE # Instructions for running the sample application echo "" | tee -a $LOG_FILE echo "To run the sample application, execute:" | tee -a $LOG_FILE echo "cd $HOME/aws-iot-device-sdk-python-v2/samples" | tee -a $LOG_FILE echo "python3 pubsub.py \\" | tee -a $LOG_FILE echo " --endpoint $IOT_ENDPOINT \\" | tee -a $LOG_FILE echo " --ca_file $CERTS_DIR/Amazon-root-CA-1.pem \\" | tee -a $LOG_FILE echo " --cert $CERTS_DIR/device.pem.crt \\" | tee -a 
$LOG_FILE echo " --key $CERTS_DIR/private.pem.key" | tee -a $LOG_FILE echo "" | tee -a $LOG_FILE echo "To run the shared subscription example, execute:" | tee -a $LOG_FILE echo "cd $HOME/aws-iot-device-sdk-python-v2/samples" | tee -a $LOG_FILE echo "python3 mqtt5_shared_subscription.py \\" | tee -a $LOG_FILE echo " --endpoint $IOT_ENDPOINT \\" | tee -a $LOG_FILE echo " --ca_file $CERTS_DIR/Amazon-root-CA-1.pem \\" | tee -a $LOG_FILE echo " --cert $CERTS_DIR/device.pem.crt \\" | tee -a $LOG_FILE echo " --key $CERTS_DIR/private.pem.key \\" | tee -a $LOG_FILE echo " --group_identifier consumer" | tee -a $LOG_FILE # Ask if user wants to clean up resources echo "" | tee -a $LOG_FILE echo "==================================================" | tee -a $LOG_FILE echo "CLEANUP CONFIRMATION" | tee -a $LOG_FILE echo "==================================================" | tee -a $LOG_FILE echo "Do you want to clean up all created resources? (y/n): " | tee -a $LOG_FILE read -r CLEANUP_CHOICE if [[ $CLEANUP_CHOICE =~ ^[Yy]$ ]]; then echo "Cleaning up resources..." | tee -a $LOG_FILE # Detach policies from certificate echo "Detaching policies from certificate..." | tee -a $LOG_FILE log_cmd "aws iot detach-policy --policy-name $POLICY_NAME --target $CERTIFICATE_ARN" log_cmd "aws iot detach-policy --policy-name $SHARED_POLICY_NAME --target $CERTIFICATE_ARN" # Detach certificate from thing echo "Detaching certificate from thing..." | tee -a $LOG_FILE log_cmd "aws iot detach-thing-principal --thing-name $THING_NAME --principal $CERTIFICATE_ARN" # Update certificate status to INACTIVE echo "Setting certificate to inactive..." | tee -a $LOG_FILE log_cmd "aws iot update-certificate --certificate-id $CERTIFICATE_ID --new-status INACTIVE" # Delete certificate echo "Deleting certificate..." | tee -a $LOG_FILE log_cmd "aws iot delete-certificate --certificate-id $CERTIFICATE_ID" # Delete thing echo "Deleting thing..." 
| tee -a $LOG_FILE log_cmd "aws iot delete-thing --thing-name $THING_NAME" # Delete policies echo "Deleting policies..." | tee -a $LOG_FILE log_cmd "aws iot delete-policy --policy-name $POLICY_NAME" log_cmd "aws iot delete-policy --policy-name $SHARED_POLICY_NAME" echo "Cleanup complete!" | tee -a $LOG_FILE else echo "Resources were not cleaned up. You can manually clean them up later." | tee -a $LOG_FILE echo "To clean up resources, run the following commands:" | tee -a $LOG_FILE echo "aws iot detach-policy --policy-name $POLICY_NAME --target $CERTIFICATE_ARN" | tee -a $LOG_FILE echo "aws iot detach-policy --policy-name $SHARED_POLICY_NAME --target $CERTIFICATE_ARN" | tee -a $LOG_FILE echo "aws iot detach-thing-principal --thing-name $THING_NAME --principal $CERTIFICATE_ARN" | tee -a $LOG_FILE echo "aws iot update-certificate --certificate-id $CERTIFICATE_ID --new-status INACTIVE" | tee -a $LOG_FILE echo "aws iot delete-certificate --certificate-id $CERTIFICATE_ID" | tee -a $LOG_FILE echo "aws iot delete-thing --thing-name $THING_NAME" | tee -a $LOG_FILE echo "aws iot delete-policy --policy-name $POLICY_NAME" | tee -a $LOG_FILE echo "aws iot delete-policy --policy-name $SHARED_POLICY_NAME" | tee -a $LOG_FILE fi echo "" | tee -a $LOG_FILE echo "Script execution completed. See $LOG_FILE for details." | tee -a $LOG_FILE
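
The script above derives the certificate ID from the certificate ARN with `cut -d/ -f2`. A minimal sketch of the same step using Bash parameter expansion, which avoids spawning extra processes, is shown below. The function name `cert_id_from_arn` is hypothetical, and the ARN in the usage note is a made-up example value.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): returns the
# certificate ID, which is the part of the ARN after the last "/".
cert_id_from_arn() {
    local arn=$1
    # "${arn##*/}" removes the longest prefix that ends in "/".
    printf '%s\n' "${arn##*/}"
}
```

For example, `cert_id_from_arn "arn:aws:iot:us-east-1:123456789012:cert/0123abcd"` prints `0123abcd`. The ARN itself could likely be read more robustly by adding `--query certificateArn --output text` to the `create-keys-and-certificate` call, in the same way the script already uses `--query` for the account ID and the endpoint address.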

The following code example shows how to:

  • Create a web ACL

  • Add a string match rule

  • Add managed rules

  • Configure logging

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash # AWS WAF Getting Started Script # This script creates a Web ACL with a string match rule and AWS Managed Rules, # associates it with a CloudFront distribution, and then cleans up all resources. # Set up logging LOG_FILE="waf-tutorial.log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "===================================================" echo "AWS WAF Getting Started Tutorial" echo "===================================================" echo "This script will create AWS WAF resources and associate" echo "them with a CloudFront distribution." echo "" # Maximum number of retries for operations MAX_RETRIES=3 # Function to handle errors handle_error() { echo "ERROR: $1" echo "Check the log file for details: $LOG_FILE" cleanup_resources exit 1 } # Function to check command success check_command() { if echo "$1" | grep -i "error" > /dev/null; then handle_error "$2: $1" fi } # Function to clean up resources cleanup_resources() { echo "" echo "===================================================" echo "CLEANING UP RESOURCES" echo "===================================================" if [ -n "$DISTRIBUTION_ID" ] && [ -n "$WEB_ACL_ARN" ]; then echo "Disassociating Web ACL from CloudFront distribution..." DISASSOCIATE_RESULT=$(aws wafv2 disassociate-web-acl \ --resource-arn "arn:aws:cloudfront::$(aws sts get-caller-identity --query Account --output text):distribution/$DISTRIBUTION_ID" \ --region us-east-1 2>&1) if echo "$DISASSOCIATE_RESULT" | grep -i "error" > /dev/null; then echo "Warning: Failed to disassociate Web ACL: $DISASSOCIATE_RESULT" else echo "Web ACL disassociated successfully." fi fi if [ -n "$WEB_ACL_ID" ] && [ -n "$WEB_ACL_NAME" ]; then echo "Deleting Web ACL..." 
# Get the latest lock token before deletion GET_RESULT=$(aws wafv2 get-web-acl \ --name "$WEB_ACL_NAME" \ --scope CLOUDFRONT \ --id "$WEB_ACL_ID" \ --region us-east-1 2>&1) if echo "$GET_RESULT" | grep -i "error" > /dev/null; then echo "Warning: Failed to get Web ACL for deletion: $GET_RESULT" echo "You may need to manually delete the Web ACL using the AWS Console." else LATEST_TOKEN=$(echo "$GET_RESULT" | grep -o '"LockToken": "[^"]*' | cut -d'"' -f4) if [ -n "$LATEST_TOKEN" ]; then DELETE_RESULT=$(aws wafv2 delete-web-acl \ --name "$WEB_ACL_NAME" \ --scope CLOUDFRONT \ --id "$WEB_ACL_ID" \ --lock-token "$LATEST_TOKEN" \ --region us-east-1 2>&1) if echo "$DELETE_RESULT" | grep -i "error" > /dev/null; then echo "Warning: Failed to delete Web ACL: $DELETE_RESULT" echo "You may need to manually delete the Web ACL using the AWS Console." else echo "Web ACL deleted successfully." fi else echo "Warning: Could not extract lock token for deletion. You may need to manually delete the Web ACL." fi fi fi echo "Cleanup process completed." 
} # Generate a random identifier for resource names RANDOM_ID=$(openssl rand -hex 4) WEB_ACL_NAME="MyWebACL-${RANDOM_ID}" METRIC_NAME="MyWebACLMetrics-${RANDOM_ID}" echo "Using Web ACL name: $WEB_ACL_NAME" # Step 1: Create a Web ACL echo "" echo "===================================================" echo "STEP 1: Creating Web ACL" echo "===================================================" CREATE_RESULT=$(aws wafv2 create-web-acl \ --name "$WEB_ACL_NAME" \ --scope "CLOUDFRONT" \ --default-action Allow={} \ --visibility-config "SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=$METRIC_NAME" \ --region us-east-1 2>&1) check_command "$CREATE_RESULT" "Failed to create Web ACL" # Extract Web ACL ID, ARN, and Lock Token from the Summary object WEB_ACL_ID=$(echo "$CREATE_RESULT" | grep -o '"Id": "[^"]*' | cut -d'"' -f4) WEB_ACL_ARN=$(echo "$CREATE_RESULT" | grep -o '"ARN": "[^"]*' | cut -d'"' -f4) LOCK_TOKEN=$(echo "$CREATE_RESULT" | grep -o '"LockToken": "[^"]*' | cut -d'"' -f4) if [ -z "$WEB_ACL_ID" ]; then handle_error "Failed to extract Web ACL ID" fi if [ -z "$LOCK_TOKEN" ]; then handle_error "Failed to extract Lock Token" fi echo "Web ACL created successfully with ID: $WEB_ACL_ID" echo "Lock Token: $LOCK_TOKEN" # Step 2: Add a String Match Rule echo "" echo "===================================================" echo "STEP 2: Adding String Match Rule" echo "===================================================" # Try to update with retries for ((i=1; i<=MAX_RETRIES; i++)); do echo "Attempt $i to add string match rule..." 
# Get the latest lock token before updating GET_RESULT=$(aws wafv2 get-web-acl \ --name "$WEB_ACL_NAME" \ --scope CLOUDFRONT \ --id "$WEB_ACL_ID" \ --region us-east-1 2>&1) if echo "$GET_RESULT" | grep -i "error" > /dev/null; then echo "Warning: Failed to get Web ACL for update: $GET_RESULT" if [ "$i" -eq "$MAX_RETRIES" ]; then handle_error "Failed to get Web ACL after $MAX_RETRIES attempts" fi sleep 2 continue fi LATEST_TOKEN=$(echo "$GET_RESULT" | grep -o '"LockToken": "[^"]*' | cut -d'"' -f4) if [ -z "$LATEST_TOKEN" ]; then echo "Warning: Could not extract lock token for update" if [ "$i" -eq "$MAX_RETRIES" ]; then handle_error "Failed to extract lock token after $MAX_RETRIES attempts" fi sleep 2 continue fi echo "Using lock token: $LATEST_TOKEN" UPDATE_RESULT=$(aws wafv2 update-web-acl \ --name "$WEB_ACL_NAME" \ --scope "CLOUDFRONT" \ --id "$WEB_ACL_ID" \ --lock-token "$LATEST_TOKEN" \ --default-action Allow={} \ --rules '[{ "Name": "UserAgentRule", "Priority": 0, "Statement": { "ByteMatchStatement": { "SearchString": "MyAgent", "FieldToMatch": { "SingleHeader": { "Name": "user-agent" } }, "TextTransformations": [ { "Priority": 0, "Type": "NONE" } ], "PositionalConstraint": "EXACTLY" } }, "Action": { "Count": {} }, "VisibilityConfig": { "SampledRequestsEnabled": true, "CloudWatchMetricsEnabled": true, "MetricName": "UserAgentRuleMetric" } }]' \ --visibility-config "SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=$METRIC_NAME" \ --region us-east-1 2>&1) if echo "$UPDATE_RESULT" | grep -i "WAFOptimisticLockException" > /dev/null; then echo "Optimistic lock exception encountered. Will retry with new lock token." if [ "$i" -eq "$MAX_RETRIES" ]; then handle_error "Failed to add string match rule after $MAX_RETRIES attempts: $UPDATE_RESULT" fi sleep 2 continue elif echo "$UPDATE_RESULT" | grep -i "error" > /dev/null; then handle_error "Failed to add string match rule: $UPDATE_RESULT" else # Success echo "String match rule added successfully." 
break fi done # Step 3: Add AWS Managed Rules echo "" echo "===================================================" echo "STEP 3: Adding AWS Managed Rules" echo "===================================================" # Try to update with retries for ((i=1; i<=MAX_RETRIES; i++)); do echo "Attempt $i to add AWS Managed Rules..." # Get the latest lock token before updating GET_RESULT=$(aws wafv2 get-web-acl \ --name "$WEB_ACL_NAME" \ --scope CLOUDFRONT \ --id "$WEB_ACL_ID" \ --region us-east-1 2>&1) if echo "$GET_RESULT" | grep -i "error" > /dev/null; then echo "Warning: Failed to get Web ACL for update: $GET_RESULT" if [ "$i" -eq "$MAX_RETRIES" ]; then handle_error "Failed to get Web ACL after $MAX_RETRIES attempts" fi sleep 2 continue fi LATEST_TOKEN=$(echo "$GET_RESULT" | grep -o '"LockToken": "[^"]*' | cut -d'"' -f4) if [ -z "$LATEST_TOKEN" ]; then echo "Warning: Could not extract lock token for update" if [ "$i" -eq "$MAX_RETRIES" ]; then handle_error "Failed to extract lock token after $MAX_RETRIES attempts" fi sleep 2 continue fi echo "Using lock token: $LATEST_TOKEN" UPDATE_RESULT=$(aws wafv2 update-web-acl \ --name "$WEB_ACL_NAME" \ --scope "CLOUDFRONT" \ --id "$WEB_ACL_ID" \ --lock-token "$LATEST_TOKEN" \ --default-action Allow={} \ --rules '[{ "Name": "UserAgentRule", "Priority": 0, "Statement": { "ByteMatchStatement": { "SearchString": "MyAgent", "FieldToMatch": { "SingleHeader": { "Name": "user-agent" } }, "TextTransformations": [ { "Priority": 0, "Type": "NONE" } ], "PositionalConstraint": "EXACTLY" } }, "Action": { "Count": {} }, "VisibilityConfig": { "SampledRequestsEnabled": true, "CloudWatchMetricsEnabled": true, "MetricName": "UserAgentRuleMetric" } }, { "Name": "AWS-AWSManagedRulesCommonRuleSet", "Priority": 1, "Statement": { "ManagedRuleGroupStatement": { "VendorName": "AWS", "Name": "AWSManagedRulesCommonRuleSet", "ExcludedRules": [] } }, "OverrideAction": { "Count": {} }, "VisibilityConfig": { "SampledRequestsEnabled": true, 
"CloudWatchMetricsEnabled": true, "MetricName": "AWS-AWSManagedRulesCommonRuleSet" } }]' \ --visibility-config "SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=$METRIC_NAME" \ --region us-east-1 2>&1) if echo "$UPDATE_RESULT" | grep -i "WAFOptimisticLockException" > /dev/null; then echo "Optimistic lock exception encountered. Will retry with new lock token." if [ "$i" -eq "$MAX_RETRIES" ]; then handle_error "Failed to add AWS Managed Rules after $MAX_RETRIES attempts: $UPDATE_RESULT" fi sleep 2 continue elif echo "$UPDATE_RESULT" | grep -i "error" > /dev/null; then handle_error "Failed to add AWS Managed Rules: $UPDATE_RESULT" else # Success echo "AWS Managed Rules added successfully." break fi done # Step 4: List CloudFront distributions echo "" echo "===================================================" echo "STEP 4: Listing CloudFront Distributions" echo "===================================================" CF_RESULT=$(aws cloudfront list-distributions --query "DistributionList.Items[*].{Id:Id,DomainName:DomainName}" --output table 2>&1) if echo "$CF_RESULT" | grep -i "error" > /dev/null; then echo "Warning: Failed to list CloudFront distributions: $CF_RESULT" echo "Continuing without CloudFront association." 
else
    echo "$CF_RESULT"

    # Ask user to select a CloudFront distribution
    echo ""
    echo "==================================================="
    echo "STEP 5: Associate Web ACL with CloudFront Distribution"
    echo "==================================================="
    echo "Enter the ID of the CloudFront distribution to associate with the Web ACL:"
    echo "(If you don't have a CloudFront distribution, press Enter to skip this step)"
    read -r DISTRIBUTION_ID

    if [ -n "$DISTRIBUTION_ID" ]; then
        ASSOCIATE_RESULT=$(aws wafv2 associate-web-acl \
            --web-acl-arn "$WEB_ACL_ARN" \
            --resource-arn "arn:aws:cloudfront::$(aws sts get-caller-identity --query Account --output text):distribution/$DISTRIBUTION_ID" \
            --region us-east-1 2>&1)

        if echo "$ASSOCIATE_RESULT" | grep -i "error" > /dev/null; then
            echo "Warning: Failed to associate Web ACL with CloudFront distribution: $ASSOCIATE_RESULT"
            echo "Continuing without CloudFront association."
            DISTRIBUTION_ID=""
        else
            echo "Web ACL associated with CloudFront distribution successfully."
        fi
    else
        echo "Skipping association with CloudFront distribution."
    fi
fi

# Display summary of created resources
echo ""
echo "==================================================="
echo "RESOURCE SUMMARY"
echo "==================================================="
echo "Web ACL Name: $WEB_ACL_NAME"
echo "Web ACL ID: $WEB_ACL_ID"
echo "Web ACL ARN: $WEB_ACL_ARN"
if [ -n "$DISTRIBUTION_ID" ]; then
    echo "Associated CloudFront Distribution: $DISTRIBUTION_ID"
fi
echo ""

# Ask user if they want to clean up resources
echo "==================================================="
echo "CLEANUP CONFIRMATION"
echo "==================================================="
echo "Do you want to clean up all created resources? (y/n): "
read -r CLEANUP_CHOICE

if [[ "$CLEANUP_CHOICE" =~ ^[Yy] ]]; then
    cleanup_resources
else
    echo ""
    echo "Resources have NOT been cleaned up. You can manually clean them up later."
    echo "To clean up resources manually, run the following commands:"
    if [ -n "$DISTRIBUTION_ID" ]; then
        echo "aws wafv2 disassociate-web-acl --resource-arn \"arn:aws:cloudfront::$(aws sts get-caller-identity --query Account --output text):distribution/$DISTRIBUTION_ID\" --region us-east-1"
    fi
    echo "aws wafv2 delete-web-acl --name \"$WEB_ACL_NAME\" --scope CLOUDFRONT --id \"$WEB_ACL_ID\" --lock-token \"<get-latest-token>\" --region us-east-1"
    echo ""
    echo "To get the latest lock token, run:"
    echo "aws wafv2 get-web-acl --name \"$WEB_ACL_NAME\" --scope CLOUDFRONT --id \"$WEB_ACL_ID\" --region us-east-1"
fi

echo ""
echo "==================================================="
echo "Tutorial completed!"
echo "==================================================="
echo "Log file: $LOG_FILE"
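
The script above repeats the same retry loop in Step 2 and Step 3: try an update, and on failure wait two seconds and try again, up to `MAX_RETRIES` times. As a sketch of how that pattern could be factored out, the generic helper below retries any command a fixed number of times. The name `retry` and its argument order are hypothetical. Unlike the original loops, it retries on any non-zero exit status and does not inspect the output for `WAFOptimisticLockException`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script).
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or max_attempts is reached.
retry() {
    local max_attempts=$1
    local delay=$2
    shift 2

    local attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        if "$@"; then
            return 0
        fi
        # Wait before the next attempt, but not after the last one.
        if ((attempt < max_attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}
```

Under these assumptions, a step that fetches the latest lock token and then calls `aws wafv2 update-web-acl` could be wrapped in its own function and invoked as `retry "$MAX_RETRIES" 2 add_string_match_rule`, where `add_string_match_rule` is a placeholder name for that function.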

The following code example shows how to:

  • Create IAM roles

  • Create a secret in Secrets Manager

  • Update your application code

  • Update the secret

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash # Script to move hardcoded secrets to AWS Secrets Manager # This script demonstrates how to create IAM roles, store a secret in AWS Secrets Manager, # and set up appropriate permissions # Set up logging LOG_FILE="secrets_manager_tutorial.log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "Starting AWS Secrets Manager tutorial script at $(date)" echo "======================================================" # Function to check for errors in command output check_error() { local output=$1 local cmd=$2 if echo "$output" | grep -i "error" > /dev/null; then echo "ERROR: Command failed: $cmd" echo "$output" cleanup_resources exit 1 fi } # Function to generate a random identifier generate_random_id() { echo "sm$(cat /dev/urandom | tr -dc 'a-z0-9' | fold -w 8 | head -n 1)" } # Function to clean up resources cleanup_resources() { echo "" echo "===========================================" echo "RESOURCES CREATED" echo "===========================================" if [ -n "$SECRET_NAME" ]; then echo "Secret: $SECRET_NAME" fi if [ -n "$RUNTIME_ROLE_NAME" ]; then echo "IAM Role: $RUNTIME_ROLE_NAME" fi if [ -n "$ADMIN_ROLE_NAME" ]; then echo "IAM Role: $ADMIN_ROLE_NAME" fi echo "" echo "===========================================" echo "CLEANUP CONFIRMATION" echo "===========================================" echo "Do you want to clean up all created resources? (y/n): " read -r CLEANUP_CHOICE if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then echo "Cleaning up resources..." 
# Delete secret if it exists if [ -n "$SECRET_NAME" ]; then echo "Deleting secret: $SECRET_NAME" aws secretsmanager delete-secret --secret-id "$SECRET_NAME" --force-delete-without-recovery fi # Detach policies and delete runtime role if it exists if [ -n "$RUNTIME_ROLE_NAME" ]; then echo "Deleting IAM role: $RUNTIME_ROLE_NAME" aws iam delete-role --role-name "$RUNTIME_ROLE_NAME" fi # Detach policies and delete admin role if it exists if [ -n "$ADMIN_ROLE_NAME" ]; then echo "Detaching policy from role: $ADMIN_ROLE_NAME" aws iam detach-role-policy --role-name "$ADMIN_ROLE_NAME" --policy-arn "arn:aws:iam::aws:policy/SecretsManagerReadWrite" echo "Deleting IAM role: $ADMIN_ROLE_NAME" aws iam delete-role --role-name "$ADMIN_ROLE_NAME" fi echo "Cleanup completed." else echo "Resources will not be deleted." fi } # Trap to ensure cleanup on script exit trap 'echo "Script interrupted. Running cleanup..."; cleanup_resources' INT TERM # Generate random identifiers for resources ADMIN_ROLE_NAME="SecretsManagerAdmin-$(generate_random_id)" RUNTIME_ROLE_NAME="RoleToRetrieveSecretAtRuntime-$(generate_random_id)" SECRET_NAME="MyAPIKey-$(generate_random_id)" echo "Using the following resource names:" echo "Admin Role: $ADMIN_ROLE_NAME" echo "Runtime Role: $RUNTIME_ROLE_NAME" echo "Secret Name: $SECRET_NAME" echo "" # Step 1: Create IAM roles echo "Creating IAM roles..." 
# Create the SecretsManagerAdmin role echo "Creating admin role: $ADMIN_ROLE_NAME" ADMIN_ROLE_OUTPUT=$(aws iam create-role \ --role-name "$ADMIN_ROLE_NAME" \ --assume-role-policy-document '{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }') check_error "$ADMIN_ROLE_OUTPUT" "create-role for admin" echo "$ADMIN_ROLE_OUTPUT" # Attach the SecretsManagerReadWrite policy to the admin role echo "Attaching SecretsManagerReadWrite policy to admin role" ATTACH_POLICY_OUTPUT=$(aws iam attach-role-policy \ --role-name "$ADMIN_ROLE_NAME" \ --policy-arn "arn:aws:iam::aws:policy/SecretsManagerReadWrite") check_error "$ATTACH_POLICY_OUTPUT" "attach-role-policy for admin" echo "$ATTACH_POLICY_OUTPUT" # Create the RoleToRetrieveSecretAtRuntime role echo "Creating runtime role: $RUNTIME_ROLE_NAME" RUNTIME_ROLE_OUTPUT=$(aws iam create-role \ --role-name "$RUNTIME_ROLE_NAME" \ --assume-role-policy-document '{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }') check_error "$RUNTIME_ROLE_OUTPUT" "create-role for runtime" echo "$RUNTIME_ROLE_OUTPUT" # Wait for roles to be fully created echo "Waiting for IAM roles to be fully created..." sleep 10 # Step 2: Create a secret in AWS Secrets Manager echo "Creating secret in AWS Secrets Manager..." CREATE_SECRET_OUTPUT=$(aws secretsmanager create-secret \ --name "$SECRET_NAME" \ --description "API key for my application" \ --secret-string '{"ClientID":"my_client_id","ClientSecret":"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"}') check_error "$CREATE_SECRET_OUTPUT" "create-secret" echo "$CREATE_SECRET_OUTPUT" # Get AWS account ID echo "Getting AWS account ID..." 
ACCOUNT_ID_OUTPUT=$(aws sts get-caller-identity --query "Account" --output text)
check_error "$ACCOUNT_ID_OUTPUT" "get-caller-identity"
ACCOUNT_ID=$ACCOUNT_ID_OUTPUT
echo "Account ID: $ACCOUNT_ID"

# Add resource policy to the secret
echo "Adding resource policy to secret..."
RESOURCE_POLICY=$(cat <<EOF
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::$ACCOUNT_ID:role/$RUNTIME_ROLE_NAME"
            },
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "*"
        }
    ]
}
EOF
)

PUT_POLICY_OUTPUT=$(aws secretsmanager put-resource-policy \
    --secret-id "$SECRET_NAME" \
    --resource-policy "$RESOURCE_POLICY" \
    --block-public-policy)

check_error "$PUT_POLICY_OUTPUT" "put-resource-policy"
echo "$PUT_POLICY_OUTPUT"

# Step 3: Demonstrate retrieving the secret
echo "Retrieving the secret value (for demonstration purposes)..."
GET_SECRET_OUTPUT=$(aws secretsmanager get-secret-value \
    --secret-id "$SECRET_NAME")

check_error "$GET_SECRET_OUTPUT" "get-secret-value"
echo "Secret retrieved successfully. Secret metadata:"
echo "$GET_SECRET_OUTPUT" | grep -v "SecretString"

# Step 4: Update the secret with new values
echo "Updating the secret with new values..."
UPDATE_SECRET_OUTPUT=$(aws secretsmanager update-secret \
    --secret-id "$SECRET_NAME" \
    --secret-string '{"ClientID":"my_new_client_id","ClientSecret":"bPxRfiCYEXAMPLEKEY/wJalrXUtnFEMI/K7MDENG"}')

check_error "$UPDATE_SECRET_OUTPUT" "update-secret"
echo "$UPDATE_SECRET_OUTPUT"

# Step 5: Verify the updated secret
echo "Verifying the updated secret..."
VERIFY_SECRET_OUTPUT=$(aws secretsmanager get-secret-value \
    --secret-id "$SECRET_NAME")

check_error "$VERIFY_SECRET_OUTPUT" "get-secret-value for verification"
echo "Updated secret retrieved successfully. Secret metadata:"
echo "$VERIFY_SECRET_OUTPUT" | grep -v "SecretString"

echo ""
echo "======================================================"
echo "Tutorial completed successfully!"
echo ""
echo "Summary of what we did:"
echo "1. Created IAM roles for managing and retrieving secrets"
echo "2. Created a secret in AWS Secrets Manager"
echo "3. Added a resource policy to control access to the secret"
echo "4. Retrieved the secret value (simulating application access)"
echo "5. Updated the secret with new values"
echo ""
echo "Next steps you might want to consider:"
echo "- Implement secret caching in your application"
echo "- Set up automatic rotation for your secrets"
echo "- Use AWS CodeGuru Reviewer to find hardcoded secrets in your code"
echo "- For multi-region applications, replicate your secrets across regions"
echo ""

# Clean up resources
cleanup_resources

echo "Script completed at $(date)"
exit 0
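
The preceding script waits a fixed 10 seconds for the new IAM roles to become usable. As an alternative, the following minimal sketch retries a command until it succeeds or an attempt limit is reached. The function name `retry_until_success` and its parameters are hypothetical and are not part of the original example. The commented usage line assumes that a successful `aws iam get-role` call is an acceptable readiness signal, which might not hold for every downstream service.

```bash
#!/bin/bash

# Hypothetical helper (not part of the original script): retry a command
# until it exits with status 0 or the attempt limit is reached.
# Usage: retry_until_success MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry_until_success() {
    local max_attempts=$1
    local delay=$2
    shift 2

    local attempt=1
    while true; do
        # Run the command exactly as passed, preserving argument boundaries.
        if "$@"; then
            return 0
        fi

        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts: $*" >&2
            return 1
        fi

        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (assumption: the role name variable comes from the script above):
# retry_until_success 10 3 aws iam get-role --role-name "$RUNTIME_ROLE_NAME" > /dev/null
```

A bounded retry keeps the script from hanging indefinitely, and it usually finishes sooner than a fixed sleep when the resource is ready early.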

The following code example shows how to:

  • Create IAM roles

  • Create a CloudWatch alarm

  • Create an experiment template

  • Run the experiment

  • Verify the results

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# AWS FIS CPU Stress Test Tutorial Script
# This script automates the steps in the AWS FIS CPU stress test tutorial.
# It uses epoch time calculations that work across Linux distributions.

# Set up logging
LOG_FILE="fis-tutorial-$(date +%Y%m%d-%H%M%S).log"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting AWS FIS CPU Stress Test Tutorial Script"
echo "Logging to $LOG_FILE"
echo "=============================================="

# Function to check for errors in command output
check_error() {
    local output=$1
    local cmd=$2

    if echo "$output" | grep -i "error" > /dev/null; then
        # Ignore specific expected errors
        if [[ "$cmd" == *"aws fis get-experiment"* ]] && [[ "$output" == *"ConfigurationFailure"* ]]; then
            echo "Note: Experiment failed due to configuration issue. This is expected in some cases."
            return 0
        fi

        echo "ERROR: Command failed: $cmd"
        echo "Output: $output"
        cleanup_on_error
        exit 1
    fi
}

# Function to clean up resources on error
cleanup_on_error() {
    echo "Error encountered. Cleaning up resources..."

    if [ -n "$EXPERIMENT_ID" ]; then
        echo "Stopping experiment $EXPERIMENT_ID if running..."
        aws fis stop-experiment --id "$EXPERIMENT_ID" 2>/dev/null || true
    fi

    if [ -n "$TEMPLATE_ID" ]; then
        echo "Deleting experiment template $TEMPLATE_ID..."
        aws fis delete-experiment-template --id "$TEMPLATE_ID" || true
    fi

    if [ -n "$INSTANCE_ID" ]; then
        echo "Terminating EC2 instance $INSTANCE_ID..."
        aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" || true
    fi

    if [ -n "$ALARM_NAME" ]; then
        echo "Deleting CloudWatch alarm $ALARM_NAME..."
        aws cloudwatch delete-alarms --alarm-names "$ALARM_NAME" || true
    fi

    if [ -n "$INSTANCE_PROFILE_NAME" ]; then
        echo "Removing role from instance profile..."
        aws iam remove-role-from-instance-profile --instance-profile-name "$INSTANCE_PROFILE_NAME" --role-name "$EC2_ROLE_NAME" || true

        echo "Deleting instance profile..."
        aws iam delete-instance-profile --instance-profile-name "$INSTANCE_PROFILE_NAME" || true
    fi

    if [ -n "$FIS_ROLE_NAME" ]; then
        echo "Deleting FIS role policy..."
        aws iam delete-role-policy --role-name "$FIS_ROLE_NAME" --policy-name "$FIS_POLICY_NAME" || true

        echo "Deleting FIS role..."
        aws iam delete-role --role-name "$FIS_ROLE_NAME" || true
    fi

    if [ -n "$EC2_ROLE_NAME" ]; then
        echo "Detaching policy from EC2 role..."
        aws iam detach-role-policy --role-name "$EC2_ROLE_NAME" --policy-arn "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" || true

        echo "Deleting EC2 role..."
        aws iam delete-role --role-name "$EC2_ROLE_NAME" || true
    fi

    echo "Cleanup completed."
}

# Generate unique identifiers for resources
TIMESTAMP=$(date +%Y%m%d%H%M%S)
FIS_ROLE_NAME="FISRole-${TIMESTAMP}"
FIS_POLICY_NAME="FISPolicy-${TIMESTAMP}"
EC2_ROLE_NAME="EC2SSMRole-${TIMESTAMP}"
INSTANCE_PROFILE_NAME="EC2SSMProfile-${TIMESTAMP}"
ALARM_NAME="FIS-CPU-Alarm-${TIMESTAMP}"

# Track created resources
CREATED_RESOURCES=()

echo "Step 1: Creating IAM role for AWS FIS"

# Create trust policy file for AWS FIS
cat > fis-trust-policy.json << 'EOF'
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "fis.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

# Create IAM role for FIS
echo "Creating IAM role $FIS_ROLE_NAME for AWS FIS..."
FIS_ROLE_OUTPUT=$(aws iam create-role \
    --role-name "$FIS_ROLE_NAME" \
    --assume-role-policy-document file://fis-trust-policy.json)
check_error "$FIS_ROLE_OUTPUT" "aws iam create-role"
CREATED_RESOURCES+=("IAM Role: $FIS_ROLE_NAME")

# Create policy document for SSM actions
cat > fis-ssm-policy.json << 'EOF'
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:SendCommand",
                "ssm:ListCommands",
                "ssm:ListCommandInvocations"
            ],
            "Resource": "*"
        }
    ]
}
EOF

# Attach policy to the role
echo "Attaching policy $FIS_POLICY_NAME to role $FIS_ROLE_NAME..."
FIS_POLICY_OUTPUT=$(aws iam put-role-policy \
    --role-name "$FIS_ROLE_NAME" \
    --policy-name "$FIS_POLICY_NAME" \
    --policy-document file://fis-ssm-policy.json)
check_error "$FIS_POLICY_OUTPUT" "aws iam put-role-policy"
CREATED_RESOURCES+=("IAM Policy: $FIS_POLICY_NAME attached to $FIS_ROLE_NAME")

echo "Step 2: Creating IAM role for EC2 instance with SSM permissions"

# Create trust policy file for EC2
cat > ec2-trust-policy.json << 'EOF'
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

# Create IAM role for EC2
echo "Creating IAM role $EC2_ROLE_NAME for EC2 instance..."
EC2_ROLE_OUTPUT=$(aws iam create-role \
    --role-name "$EC2_ROLE_NAME" \
    --assume-role-policy-document file://ec2-trust-policy.json)
check_error "$EC2_ROLE_OUTPUT" "aws iam create-role"
CREATED_RESOURCES+=("IAM Role: $EC2_ROLE_NAME")

# Attach SSM policy to the EC2 role
echo "Attaching AmazonSSMManagedInstanceCore policy to role $EC2_ROLE_NAME..."
EC2_POLICY_OUTPUT=$(aws iam attach-role-policy \
    --role-name "$EC2_ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore)
check_error "$EC2_POLICY_OUTPUT" "aws iam attach-role-policy"
CREATED_RESOURCES+=("IAM Policy: AmazonSSMManagedInstanceCore attached to $EC2_ROLE_NAME")

# Create instance profile
echo "Creating instance profile $INSTANCE_PROFILE_NAME..."
PROFILE_OUTPUT=$(aws iam create-instance-profile \
    --instance-profile-name "$INSTANCE_PROFILE_NAME")
check_error "$PROFILE_OUTPUT" "aws iam create-instance-profile"
CREATED_RESOURCES+=("IAM Instance Profile: $INSTANCE_PROFILE_NAME")

# Add role to instance profile
echo "Adding role $EC2_ROLE_NAME to instance profile $INSTANCE_PROFILE_NAME..."
ADD_ROLE_OUTPUT=$(aws iam add-role-to-instance-profile \
    --instance-profile-name "$INSTANCE_PROFILE_NAME" \
    --role-name "$EC2_ROLE_NAME")
check_error "$ADD_ROLE_OUTPUT" "aws iam add-role-to-instance-profile"

# Wait for role to propagate
echo "Waiting for IAM role to propagate..."
sleep 10

echo "Step 3: Launching EC2 instance"

# Get the latest Amazon Linux 2 AMI ID
echo "Finding latest Amazon Linux 2 AMI..."
AMI_ID=$(aws ec2 describe-images \
    --owners amazon \
    --filters "Name=name,Values=amzn2-ami-hvm-*-x86_64-gp2" "Name=state,Values=available" \
    --query "sort_by(Images, &CreationDate)[-1].ImageId" \
    --output text)
check_error "$AMI_ID" "aws ec2 describe-images"
echo "Using AMI: $AMI_ID"

# Launch EC2 instance
echo "Launching EC2 instance with AMI $AMI_ID..."
INSTANCE_OUTPUT=$(aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type t2.micro \
    --iam-instance-profile Name="$INSTANCE_PROFILE_NAME" \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=FIS-Test-Instance}]')
check_error "$INSTANCE_OUTPUT" "aws ec2 run-instances"

# Get instance ID
INSTANCE_ID=$(echo "$INSTANCE_OUTPUT" | grep -i "InstanceId" | head -1 | awk -F'"' '{print $4}')
if [ -z "$INSTANCE_ID" ]; then
    echo "Failed to get instance ID"
    cleanup_on_error
    exit 1
fi
echo "Launched instance: $INSTANCE_ID"
CREATED_RESOURCES+=("EC2 Instance: $INSTANCE_ID")

# Enable detailed monitoring
echo "Enabling detailed monitoring for instance $INSTANCE_ID..."
MONITOR_OUTPUT=$(aws ec2 monitor-instances --instance-ids "$INSTANCE_ID")
check_error "$MONITOR_OUTPUT" "aws ec2 monitor-instances"

# Wait for instance to be running and status checks to pass
echo "Waiting for instance to be ready..."
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-status-ok --instance-ids "$INSTANCE_ID"
echo "Instance is ready"

echo "Step 4: Creating CloudWatch alarm for CPU utilization"

# Create CloudWatch alarm
echo "Creating CloudWatch alarm $ALARM_NAME..."
ALARM_OUTPUT=$(aws cloudwatch put-metric-alarm \
    --alarm-name "$ALARM_NAME" \
    --alarm-description "Alarm when CPU exceeds 50%" \
    --metric-name CPUUtilization \
    --namespace AWS/EC2 \
    --statistic Maximum \
    --period 60 \
    --threshold 50 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --dimensions "Name=InstanceId,Value=$INSTANCE_ID" \
    --evaluation-periods 1)
check_error "$ALARM_OUTPUT" "aws cloudwatch put-metric-alarm"
CREATED_RESOURCES+=("CloudWatch Alarm: $ALARM_NAME")

# Get the alarm ARN
echo "Getting CloudWatch alarm ARN..."
ALARM_ARN_OUTPUT=$(aws cloudwatch describe-alarms \
    --alarm-names "$ALARM_NAME")
check_error "$ALARM_ARN_OUTPUT" "aws cloudwatch describe-alarms"
ALARM_ARN=$(echo "$ALARM_ARN_OUTPUT" | grep -i "AlarmArn" | head -1 | awk -F'"' '{print $4}')
if [ -z "$ALARM_ARN" ]; then
    echo "Failed to get alarm ARN"
    cleanup_on_error
    exit 1
fi
echo "Alarm ARN: $ALARM_ARN"

# Wait for the alarm to initialize and reach OK state
echo "Waiting for CloudWatch alarm to initialize (60 seconds)..."
sleep 60

# Check alarm state
echo "Checking alarm state..."
ALARM_STATE_OUTPUT=$(aws cloudwatch describe-alarms \
    --alarm-names "$ALARM_NAME")
ALARM_STATE=$(echo "$ALARM_STATE_OUTPUT" | grep -i "StateValue" | head -1 | awk -F'"' '{print $4}')
echo "Current alarm state: $ALARM_STATE"

# If alarm is not in OK state, wait longer or generate some baseline metrics
if [ "$ALARM_STATE" != "OK" ]; then
    echo "Alarm not in OK state. Waiting for alarm to stabilize (additional 60 seconds)..."
    sleep 60

    # Check alarm state again
    ALARM_STATE_OUTPUT=$(aws cloudwatch describe-alarms \
        --alarm-names "$ALARM_NAME")
    ALARM_STATE=$(echo "$ALARM_STATE_OUTPUT" | grep -i "StateValue" | head -1 | awk -F'"' '{print $4}')
    echo "Updated alarm state: $ALARM_STATE"

    if [ "$ALARM_STATE" != "OK" ]; then
        echo "Warning: Alarm still not in OK state. Experiment may fail to start."
    fi
fi

echo "Step 5: Creating AWS FIS experiment template"

# Get the IAM role ARN
echo "Getting IAM role ARN for $FIS_ROLE_NAME..."
ROLE_ARN_OUTPUT=$(aws iam get-role \
    --role-name "$FIS_ROLE_NAME")
check_error "$ROLE_ARN_OUTPUT" "aws iam get-role"
ROLE_ARN=$(echo "$ROLE_ARN_OUTPUT" | grep -i "Arn" | head -1 | awk -F'"' '{print $4}')
if [ -z "$ROLE_ARN" ]; then
    echo "Failed to get role ARN"
    cleanup_on_error
    exit 1
fi
echo "Role ARN: $ROLE_ARN"

# Get account ID and region
ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
REGION=$(aws configure get region)
if [ -z "$REGION" ]; then
    REGION="us-east-1" # Default to us-east-1 if region not set
fi
INSTANCE_ARN="arn:aws:ec2:${REGION}:${ACCOUNT_ID}:instance/${INSTANCE_ID}"
echo "Instance ARN: $INSTANCE_ARN"

# Create experiment template - Fixed JSON escaping issue
cat > experiment-template.json << EOF
{
    "description": "Test CPU stress predefined SSM document",
    "targets": {
        "testInstance": {
            "resourceType": "aws:ec2:instance",
            "resourceArns": ["$INSTANCE_ARN"],
            "selectionMode": "ALL"
        }
    },
    "actions": {
        "runCpuStress": {
            "actionId": "aws:ssm:send-command",
            "parameters": {
                "documentArn": "arn:aws:ssm:$REGION::document/AWSFIS-Run-CPU-Stress",
                "documentParameters": "{\"DurationSeconds\":\"120\"}",
                "duration": "PT5M"
            },
            "targets": {
                "Instances": "testInstance"
            }
        }
    },
    "stopConditions": [
        {
            "source": "aws:cloudwatch:alarm",
            "value": "$ALARM_ARN"
        }
    ],
    "roleArn": "$ROLE_ARN",
    "tags": {
        "Name": "FIS-CPU-Stress-Experiment"
    }
}
EOF

# Create experiment template
echo "Creating AWS FIS experiment template..."
TEMPLATE_OUTPUT=$(aws fis create-experiment-template --cli-input-json file://experiment-template.json)
check_error "$TEMPLATE_OUTPUT" "aws fis create-experiment-template"
TEMPLATE_ID=$(echo "$TEMPLATE_OUTPUT" | grep -i "id" | head -1 | awk -F'"' '{print $4}')
if [ -z "$TEMPLATE_ID" ]; then
    echo "Failed to get template ID"
    cleanup_on_error
    exit 1
fi
echo "Experiment template created with ID: $TEMPLATE_ID"
CREATED_RESOURCES+=("FIS Experiment Template: $TEMPLATE_ID")

echo "Step 6: Starting the experiment"

# Start the experiment
echo "Starting AWS FIS experiment using template $TEMPLATE_ID..."
EXPERIMENT_OUTPUT=$(aws fis start-experiment \
    --experiment-template-id "$TEMPLATE_ID" \
    --tags '{"Name": "FIS-CPU-Stress-Run"}')
check_error "$EXPERIMENT_OUTPUT" "aws fis start-experiment"
EXPERIMENT_ID=$(echo "$EXPERIMENT_OUTPUT" | grep -i "id" | head -1 | awk -F'"' '{print $4}')
if [ -z "$EXPERIMENT_ID" ]; then
    echo "Failed to get experiment ID"
    cleanup_on_error
    exit 1
fi
echo "Experiment started with ID: $EXPERIMENT_ID"
CREATED_RESOURCES+=("FIS Experiment: $EXPERIMENT_ID")

echo "Step 7: Tracking experiment progress"

# Track experiment progress
echo "Tracking experiment progress..."
MAX_CHECKS=30
CHECK_COUNT=0
EXPERIMENT_STATE=""

while [ $CHECK_COUNT -lt $MAX_CHECKS ]; do
    EXPERIMENT_INFO=$(aws fis get-experiment --id "$EXPERIMENT_ID")
    # Don't check for errors here, as we expect some experiments to fail

    EXPERIMENT_STATE=$(echo "$EXPERIMENT_INFO" | grep -i "status" | head -1 | awk -F'"' '{print $4}')
    echo "Experiment state: $EXPERIMENT_STATE"

    if [ "$EXPERIMENT_STATE" == "completed" ] || [ "$EXPERIMENT_STATE" == "stopped" ] || [ "$EXPERIMENT_STATE" == "failed" ]; then
        # Show the reason for the state
        REASON=$(echo "$EXPERIMENT_INFO" | grep -i "reason" | head -1 | awk -F'"' '{print $4}')
        if [ -n "$REASON" ]; then
            echo "Reason: $REASON"
        fi
        break
    fi

    echo "Waiting 10 seconds before checking again..."
    sleep 10
    CHECK_COUNT=$((CHECK_COUNT + 1))
done

if [ $CHECK_COUNT -eq $MAX_CHECKS ]; then
    echo "Experiment is taking longer than expected. You can check its status later using:"
    echo "aws fis get-experiment --id $EXPERIMENT_ID"
fi

echo "Step 8: Verifying experiment results"

# Check CloudWatch alarm state
echo "Checking CloudWatch alarm state..."
ALARM_STATE_OUTPUT=$(aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME")
check_error "$ALARM_STATE_OUTPUT" "aws cloudwatch describe-alarms"
echo "$ALARM_STATE_OUTPUT"

# Get CPU utilization metrics
echo "Getting CPU utilization metrics..."
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

# FIXED: Cross-platform compatible way to calculate time 10 minutes ago
# This approach uses epoch seconds and basic arithmetic which works on all Linux distributions
CURRENT_EPOCH=$(date +%s)
TEN_MINUTES_AGO_EPOCH=$((CURRENT_EPOCH - 600))
START_TIME=$(date -u -d "@$TEN_MINUTES_AGO_EPOCH" +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u -r "$TEN_MINUTES_AGO_EPOCH" +"%Y-%m-%dT%H:%M:%SZ")

# Create metric query file
cat > metric-query.json << EOF
[
    {
        "Id": "cpu",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",
                "MetricName": "CPUUtilization",
                "Dimensions": [
                    {
                        "Name": "InstanceId",
                        "Value": "$INSTANCE_ID"
                    }
                ]
            },
            "Period": 60,
            "Stat": "Maximum"
        }
    }
]
EOF

METRICS_OUTPUT=$(aws cloudwatch get-metric-data \
    --start-time "$START_TIME" \
    --end-time "$END_TIME" \
    --metric-data-queries file://metric-query.json)
check_error "$METRICS_OUTPUT" "aws cloudwatch get-metric-data"
echo "CPU Utilization Metrics:"
echo "$METRICS_OUTPUT"

# Display summary of created resources
echo ""
echo "==========================================="
echo "RESOURCES CREATED"
echo "==========================================="
for resource in "${CREATED_RESOURCES[@]}"; do
    echo "- $resource"
done
echo "==========================================="

# Prompt for cleanup
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Do you want to clean up all created resources? (y/n): "
read -r CLEANUP_CHOICE

if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
    echo "Starting cleanup process..."

    # Stop experiment if still running
    if [ "$EXPERIMENT_STATE" != "completed" ] && [ "$EXPERIMENT_STATE" != "stopped" ] && [ "$EXPERIMENT_STATE" != "failed" ]; then
        echo "Stopping experiment $EXPERIMENT_ID..."
        STOP_OUTPUT=$(aws fis stop-experiment --id "$EXPERIMENT_ID")
        check_error "$STOP_OUTPUT" "aws fis stop-experiment"
        echo "Waiting for experiment to stop..."
        sleep 10
    fi

    # Delete experiment template
    echo "Deleting experiment template $TEMPLATE_ID..."
    DELETE_TEMPLATE_OUTPUT=$(aws fis delete-experiment-template --id "$TEMPLATE_ID")
    check_error "$DELETE_TEMPLATE_OUTPUT" "aws fis delete-experiment-template"

    # Delete CloudWatch alarm
    echo "Deleting CloudWatch alarm $ALARM_NAME..."
    DELETE_ALARM_OUTPUT=$(aws cloudwatch delete-alarms --alarm-names "$ALARM_NAME")
    check_error "$DELETE_ALARM_OUTPUT" "aws cloudwatch delete-alarms"

    # Terminate EC2 instance
    echo "Terminating EC2 instance $INSTANCE_ID..."
    TERMINATE_OUTPUT=$(aws ec2 terminate-instances --instance-ids "$INSTANCE_ID")
    check_error "$TERMINATE_OUTPUT" "aws ec2 terminate-instances"
    echo "Waiting for instance to terminate..."
    aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID"

    # Clean up IAM resources
    echo "Removing role from instance profile..."
    REMOVE_ROLE_OUTPUT=$(aws iam remove-role-from-instance-profile \
        --instance-profile-name "$INSTANCE_PROFILE_NAME" \
        --role-name "$EC2_ROLE_NAME")
    check_error "$REMOVE_ROLE_OUTPUT" "aws iam remove-role-from-instance-profile"

    echo "Deleting instance profile..."
    DELETE_PROFILE_OUTPUT=$(aws iam delete-instance-profile \
        --instance-profile-name "$INSTANCE_PROFILE_NAME")
    check_error "$DELETE_PROFILE_OUTPUT" "aws iam delete-instance-profile"

    echo "Deleting FIS role policy..."
    DELETE_POLICY_OUTPUT=$(aws iam delete-role-policy \
        --role-name "$FIS_ROLE_NAME" \
        --policy-name "$FIS_POLICY_NAME")
    check_error "$DELETE_POLICY_OUTPUT" "aws iam delete-role-policy"

    echo "Detaching policy from EC2 role..."
    DETACH_POLICY_OUTPUT=$(aws iam detach-role-policy \
        --role-name "$EC2_ROLE_NAME" \
        --policy-arn "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore")
    check_error "$DETACH_POLICY_OUTPUT" "aws iam detach-role-policy"

    echo "Deleting FIS role..."
    DELETE_FIS_ROLE_OUTPUT=$(aws iam delete-role \
        --role-name "$FIS_ROLE_NAME")
    check_error "$DELETE_FIS_ROLE_OUTPUT" "aws iam delete-role"

    echo "Deleting EC2 role..."
    DELETE_EC2_ROLE_OUTPUT=$(aws iam delete-role \
        --role-name "$EC2_ROLE_NAME")
    check_error "$DELETE_EC2_ROLE_OUTPUT" "aws iam delete-role"

    # Clean up temporary files
    echo "Cleaning up temporary files..."
    rm -f fis-trust-policy.json ec2-trust-policy.json fis-ssm-policy.json experiment-template.json metric-query.json

    echo "Cleanup completed successfully."
else
    echo "Cleanup skipped. Resources will remain in your AWS account."
    echo "You can manually clean up the resources listed above."
fi

echo ""
echo "Script execution completed."
echo "Log file: $LOG_FILE"
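
The `check_error` function in the preceding script searches the captured standard output for the word "error". The AWS CLI reports failures through a nonzero exit status and writes its error messages to standard error, so a text search on standard output can miss failures or flag harmless text. The following minimal sketch checks the exit status instead. The function name `run_checked` and the global variable `CMD_OUTPUT` are hypothetical and are not part of the original example.

```bash
#!/bin/bash

# Hypothetical helper (not in the original script): run a command, capture
# stdout and stderr together in the global variable CMD_OUTPUT, and decide
# success or failure from the exit status rather than from the output text.
# Usage: run_checked "description" command [args...]
run_checked() {
    local description=$1
    shift

    local status=0
    CMD_OUTPUT=$("$@" 2>&1) || status=$?

    if [ "$status" -ne 0 ]; then
        echo "ERROR: $description failed with exit status $status" >&2
        echo "Output: $CMD_OUTPUT" >&2
        return "$status"
    fi
    return 0
}

# Example (assumption: called from a script that defines cleanup_on_error):
# if ! run_checked "describe alarm" aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME"; then
#     cleanup_on_error
#     exit 1
# fi
# echo "$CMD_OUTPUT"
```

With this approach, output that merely contains the word "error" does not stop the script, and a failed command is reported even when it prints nothing to standard output.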

The following code example shows how to:

  • Create IAM permissions for Systems Manager

  • Create an IAM role for Systems Manager

  • Configure Systems Manager

  • Verify the setup

  • Clean up resources

AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash

# AWS Systems Manager Setup Script
# This script sets up AWS Systems Manager for a single account and region
#
# Version 17 fixes:
# 1. Added cloudformation.amazonaws.com to the IAM role trust policy
# 2. Systems Manager Quick Setup uses CloudFormation for deployments, so the role must trust CloudFormation service

# Initialize log file
LOG_FILE="ssm_setup_$(date +%Y%m%d_%H%M%S).log"
echo "Starting AWS Systems Manager setup at $(date)" > "$LOG_FILE"

# Function to log commands and their outputs with immediate terminal display
log_cmd() {
    echo "$(date): Running command: $1" | tee -a "$LOG_FILE"
    local output
    output=$(eval "$1" 2>&1)
    local status=$?
    echo "$output" | tee -a "$LOG_FILE"
    return $status
}

# Function to check for errors in command output
check_error() {
    local cmd_output="$1"
    local cmd_status="$2"
    local error_msg="$3"

    if [[ $cmd_status -ne 0 || "$cmd_output" =~ [Ee][Rr][Rr][Oo][Rr] ]]; then
        echo "ERROR: $error_msg" | tee -a "$LOG_FILE"
        echo "Command output: $cmd_output" | tee -a "$LOG_FILE"
        cleanup_on_error
        exit 1
    fi
}

# Array to track created resources for cleanup
declare -a CREATED_RESOURCES

# Function to add a resource to the tracking array
track_resource() {
    local resource_type="$1"
    local resource_id="$2"
    CREATED_RESOURCES+=("$resource_type:$resource_id")
    echo "Tracked resource: $resource_type:$resource_id" | tee -a "$LOG_FILE"
}

# Function to clean up resources on error
cleanup_on_error() {
    echo "" | tee -a "$LOG_FILE"
    echo "==========================================" | tee -a "$LOG_FILE"
    echo "ERROR OCCURRED - CLEANING UP RESOURCES" | tee -a "$LOG_FILE"
    echo "==========================================" | tee -a "$LOG_FILE"
    echo "The following resources were created:" | tee -a "$LOG_FILE"

    # Display resources in reverse order
    for ((i=${#CREATED_RESOURCES[@]}-1; i>=0; i--)); do
        echo "${CREATED_RESOURCES[$i]}" | tee -a "$LOG_FILE"
    done

    echo "" | tee -a "$LOG_FILE"
    echo "Attempting to clean up resources..." | tee -a "$LOG_FILE"

    # Clean up resources in reverse order
    cleanup_resources
}

# Function to clean up all created resources
cleanup_resources() {
    # Process resources in reverse order (last created, first deleted)
    for ((i=${#CREATED_RESOURCES[@]}-1; i>=0; i--)); do
        IFS=':' read -r resource_type resource_id <<< "${CREATED_RESOURCES[$i]}"

        echo "Deleting $resource_type: $resource_id" | tee -a "$LOG_FILE"

        case "$resource_type" in
            "IAM_POLICY")
                # Delete the policy (detachment should have been handled when the role was deleted)
                log_cmd "aws iam delete-policy --policy-arn $resource_id" || true
                ;;
            "IAM_ROLE")
                # Detach all policies from the role first
                if [[ -n "$POLICY_ARN" ]]; then
                    log_cmd "aws iam detach-role-policy --role-name $resource_id --policy-arn $POLICY_ARN" || true
                fi

                # Delete the role
                log_cmd "aws iam delete-role --role-name $resource_id" || true
                ;;
            "SSM_CONFIG_MANAGER")
                log_cmd "aws ssm-quicksetup delete-configuration-manager --manager-arn $resource_id" || true
                ;;
            *)
                echo "Unknown resource type: $resource_type, cannot delete automatically" | tee -a "$LOG_FILE"
                ;;
        esac
    done

    echo "Cleanup completed" | tee -a "$LOG_FILE"

    # Clean up temporary files
    rm -f ssm-onboarding-policy.json trust-policy.json ssm-config.json 2>/dev/null || true
}

# Main script execution
echo "AWS Systems Manager Setup Script"
echo "================================"
echo "This script will set up AWS Systems Manager for a single account and region."
echo "It will create IAM policies and roles, then enable Systems Manager features."
echo ""

# Get the current AWS region
CURRENT_REGION=$(aws configure get region)
if [[ -z "$CURRENT_REGION" ]]; then
    echo "No AWS region configured. Please specify a region:"
    read -r CURRENT_REGION
    if [[ -z "$CURRENT_REGION" ]]; then
        echo "ERROR: A region must be specified" | tee -a "$LOG_FILE"
        exit 1
    fi
fi

echo "Using AWS region: $CURRENT_REGION" | tee -a "$LOG_FILE"

# Step 1: Create IAM policy for Systems Manager onboarding
echo "Step 1: Creating IAM policy for Systems Manager onboarding..."

# Create policy document
cat > ssm-onboarding-policy.json << 'EOF'
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "QuickSetupActions",
            "Effect": "Allow",
            "Action": [
                "ssm-quicksetup:*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "SsmReadOnly",
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeAutomationExecutions",
                "ssm:GetAutomationExecution",
                "ssm:ListAssociations",
                "ssm:DescribeAssociation",
                "ssm:ListDocuments",
                "ssm:ListResourceDataSync",
                "ssm:DescribePatchBaselines",
                "ssm:GetPatchBaseline",
                "ssm:DescribeMaintenanceWindows",
                "ssm:DescribeMaintenanceWindowTasks"
            ],
            "Resource": "*"
        },
        {
            "Sid": "SsmDocument",
            "Effect": "Allow",
            "Action": [
                "ssm:GetDocument",
                "ssm:DescribeDocument"
            ],
            "Resource": [
                "arn:aws:ssm:*:*:document/AWSQuickSetupType-*",
                "arn:aws:ssm:*:*:document/AWS-EnableExplorer"
            ]
        },
        {
            "Sid": "SsmEnableExplorer",
            "Effect": "Allow",
            "Action": "ssm:StartAutomationExecution",
            "Resource": "arn:aws:ssm:*:*:automation-definition/AWS-EnableExplorer:*"
        },
        {
            "Sid": "SsmExplorerRds",
            "Effect": "Allow",
            "Action": [
                "ssm:GetOpsSummary",
                "ssm:CreateResourceDataSync",
                "ssm:UpdateResourceDataSync"
            ],
            "Resource": "arn:aws:ssm:*:*:resource-data-sync/AWS-QuickSetup-*"
        },
        {
            "Sid": "OrgsReadOnly",
            "Effect": "Allow",
            "Action": [
                "organizations:DescribeAccount",
                "organizations:DescribeOrganization",
                "organizations:ListDelegatedAdministrators",
                "organizations:ListRoots",
                "organizations:ListParents",
                "organizations:ListOrganizationalUnitsForParent",
                "organizations:DescribeOrganizationalUnit",
                "organizations:ListAWSServiceAccessForOrganization"
            ],
            "Resource": "*"
        },
        {
            "Sid": "OrgsAdministration",
            "Effect": "Allow",
            "Action": [
                "organizations:EnableAWSServiceAccess",
                "organizations:RegisterDelegatedAdministrator",
                "organizations:DeregisterDelegatedAdministrator"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "organizations:ServicePrincipal": [
                        "ssm.amazonaws.com",
                        "ssm-quicksetup.amazonaws.com",
                        "member.org.stacksets.cloudformation.amazonaws.com",
                        "resource-explorer-2.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Sid": "CfnReadOnly",
            "Effect": "Allow",
            "Action": [
                "cloudformation:ListStacks",
                "cloudformation:DescribeStacks",
                "cloudformation:ListStackSets",
                "cloudformation:DescribeOrganizationsAccess"
            ],
            "Resource": "*"
        },
        {
            "Sid": "OrgCfnAccess",
            "Effect": "Allow",
            "Action": [
                "cloudformation:ActivateOrganizationsAccess"
            ],
            "Resource": "*"
        },
        {
            "Sid": "CfnStackActions",
            "Effect": "Allow",
            "Action": [
                "cloudformation:CreateStack",
                "cloudformation:DeleteStack",
                "cloudformation:DescribeStackResources",
                "cloudformation:DescribeStackEvents",
                "cloudformation:GetTemplate",
                "cloudformation:RollbackStack",
                "cloudformation:TagResource",
                "cloudformation:UntagResource",
                "cloudformation:UpdateStack"
            ],
            "Resource": [
                "arn:aws:cloudformation:*:*:stack/StackSet-AWS-QuickSetup-*",
                "arn:aws:cloudformation:*:*:stack/AWS-QuickSetup-*",
                "arn:aws:cloudformation:*:*:type/resource/*"
            ]
        },
        {
            "Sid": "CfnStackSetActions",
            "Effect": "Allow",
            "Action": [
                "cloudformation:CreateStackInstances",
                "cloudformation:CreateStackSet",
                "cloudformation:DeleteStackInstances",
                "cloudformation:DeleteStackSet",
                "cloudformation:DescribeStackInstance",
                "cloudformation:DetectStackSetDrift",
                "cloudformation:ListStackInstanceResourceDrifts",
                "cloudformation:DescribeStackSet",
                "cloudformation:DescribeStackSetOperation",
                "cloudformation:ListStackInstances",
                "cloudformation:ListStackSetOperations",
                "cloudformation:ListStackSetOperationResults",
                "cloudformation:TagResource",
                "cloudformation:UntagResource",
                "cloudformation:UpdateStackSet"
            ],
            "Resource": [
                "arn:aws:cloudformation:*:*:stackset/AWS-QuickSetup-*",
                "arn:aws:cloudformation:*:*:type/resource/*",
                "arn:aws:cloudformation:*:*:stackset-target/AWS-QuickSetup-*:*"
            ]
        },
        {
            "Sid": "ValidationReadonlyActions",
            "Effect": "Allow",
            "Action": [
                "iam:ListRoles",
                "iam:GetRole"
            ],
            "Resource": "*"
        },
        {
            "Sid": "IamRolesMgmt",
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:GetRole",
                "iam:Att&#97;chRolePolicy",
                "iam:DetachRolePolicy",
                "iam:GetRolePolicy",
                "iam:ListRolePolicies"
            ],
            "Resource": [
                "arn:aws:iam::*:role/AWS-QuickSetup-*",
                "arn:aws:iam::*:role/service-role/AWS-QuickSetup-*"
            ]
        },
        {
            "Sid": "IamPassRole",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "arn:aws:iam::*:role/AWS-QuickSetup-*",
                "arn:aws:iam::*:role/service-role/AWS-QuickSetup-*"
            ],
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": [
                        "ssm.amazonaws.com",
                        "ssm-quicksetup.amazonaws.com",
                        "cloudformation.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Sid": "IamRolesPoliciesMgmt",
            "Effect": "Allow",
            "Action": [
                "iam:AttachRolePolicy",
                "iam:DetachRolePolicy"
            ],
            "Resource": [
                "arn:aws:iam::*:role/AWS-QuickSetup-*",
                "arn:aws:iam::*:role/service-role/AWS-QuickSetup-*"
            ],
            "Condition": {
                "ArnEquals": {
                    "iam:PolicyARN": [
                        "arn:aws:iam::aws:policy/AWSSystemsManagerEnableExplorerExecutionPolicy",
                        "arn:aws:iam::aws:policy/AWSQuickSetupSSMDeploymentRolePolicy"
                    ]
                }
            }
        },
        {
            "Sid": "CfnStackSetsSLR",
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": [
                "arn:aws:iam::*:role/aws-service-role/stacksets.cloudformation.amazonaws.com/AWSServiceRoleForCloudFormationStackSetsOrgAdmin",
                "arn:aws:iam::*:role/aws-service-role/ssm.amazonaws.com/AWSServiceRoleForAmazonSSM",
                "arn:aws:iam::*:role/aws-service-role/accountdiscovery.ssm.amazonaws.com/AWSServiceRoleForAmazonSSM_AccountDiscovery",
                "arn:aws:iam::*:role/aws-service-role/ssm-quicksetup.amazonaws.com/AWSServiceRoleForSSMQuickSetup",
                "arn:aws:iam::*:role/aws-service-role/resource-explorer-2.amazonaws.com/AWSServiceRoleForResourceExplorer"
            ]
        }
    ]
}
EOF

# Create the IAM policy
POLICY_OUTPUT=$(log_cmd "aws iam create-policy --policy-name SSMOnboardingPolicy --policy-document file://ssm-onboarding-policy.json --output json")
POLICY_STATUS=$?
check_error "$POLICY_OUTPUT" $POLICY_STATUS "Failed to create IAM policy"

# Extract the policy ARN
POLICY_ARN=$(echo "$POLICY_OUTPUT" | grep -o 'arn:aws:iam::[0-9]*:policy/SSMOnboardingPolicy')
if [[ -z "$POLICY_ARN" ]]; then
    echo "ERROR: Failed to extract policy ARN" | tee -a "$LOG_FILE"
    exit 1
fi

# Track the created policy
track_resource "IAM_POLICY" "$POLICY_ARN"

echo "Created policy: $POLICY_ARN" | tee -a "$LOG_FILE"

# Step 2: Create and configure IAM role for Systems Manager
echo ""
echo "Step 2: Creating IAM role for Systems Manager..."

# Get the caller identity (used to look up the account ID)
USER_OUTPUT=$(log_cmd "aws sts get-caller-identity --output json")
USER_STATUS=$?
check_error "$USER_OUTPUT" $USER_STATUS "Failed to get caller identity"

# Extract account ID
ACCOUNT_ID=$(echo "$USER_OUTPUT" | grep -o '"Account": "[0-9]*"' | cut -d'"' -f4)
if [[ -z "$ACCOUNT_ID" ]]; then
    echo "ERROR: Failed to extract account ID" | tee -a "$LOG_FILE"
    exit 1
fi

# Generate a unique role name
ROLE_NAME="SSMTutorialRole-$(openssl rand -hex 4)"

# Create trust policy for the role - FIXED: Added cloudformation.amazonaws.com
cat > trust-policy.json << 'EOF'
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ssm.amazonaws.com",
                    "ssm-quicksetup.amazonaws.com",
                    "cloudformation.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::ACCOUNT_ID:root"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

# Replace ACCOUNT_ID placeholder in trust policy
# (sed -i without a backup suffix is GNU sed syntax; BSD/macOS sed expects an argument after -i)
sed -i "s/ACCOUNT_ID/$ACCOUNT_ID/g" trust-policy.json

# Create the IAM role
ROLE_OUTPUT=$(log_cmd "aws iam create-role --role-name $ROLE_NAME --assume-role-policy-document file://trust-policy.json --description 'Role for Systems Manager tutorial' --output json")
ROLE_STATUS=$?
check_error "$ROLE_OUTPUT" $ROLE_STATUS "Failed to create IAM role"

# Extract the role ARN
ROLE_ARN=$(echo "$ROLE_OUTPUT" | grep -o 'arn:aws:iam::[0-9]*:role/[^"]*')
if [[ -z "$ROLE_ARN" ]]; then
    echo "ERROR: Failed to extract role ARN" | tee -a "$LOG_FILE"
    cleanup_on_error
    exit 1
fi

# Track the created role
track_resource "IAM_ROLE" "$ROLE_NAME"
echo "Created IAM role: $ROLE_NAME" | tee -a "$LOG_FILE"
echo "Role ARN: $ROLE_ARN" | tee -a "$LOG_FILE"

# Set identity variables for cleanup
IDENTITY_TYPE="role"
IDENTITY_NAME="$ROLE_NAME"

# Attach the policy to the role
ATTACH_OUTPUT=$(log_cmd "aws iam attach-role-policy --role-name $ROLE_NAME --policy-arn $POLICY_ARN")
ATTACH_STATUS=$?
check_error "$ATTACH_OUTPUT" $ATTACH_STATUS "Failed to attach policy to role $ROLE_NAME"
echo "Policy attached to role: $ROLE_NAME" | tee -a "$LOG_FILE"

# Step 3: Create Systems Manager configuration using Host Management
echo ""
echo "Step 3: Creating Systems Manager configuration..."

# Generate a random identifier for the configuration name
CONFIG_NAME="SSMSetup-$(openssl rand -hex 4)"

# Create configuration file for Systems Manager setup using Host Management
# Added both required parameters for single account deployment based on CloudFormation documentation
cat > ssm-config.json << EOF
[
    {
        "Type": "AWSQuickSetupType-SSMHostMgmt",
        "LocalDeploymentAdministrationRoleArn": "$ROLE_ARN",
        "LocalDeploymentExecutionRoleName": "$ROLE_NAME",
        "Parameters": {
            "TargetAccounts": "$ACCOUNT_ID",
            "TargetRegions": "$CURRENT_REGION"
        }
    }
]
EOF

echo "Configuration file created:" | tee -a "$LOG_FILE"
tee -a "$LOG_FILE" < ssm-config.json

# Create the configuration manager
CONFIG_OUTPUT=$(log_cmd "aws ssm-quicksetup create-configuration-manager --name \"$CONFIG_NAME\" --configuration-definitions file://ssm-config.json --region $CURRENT_REGION")
CONFIG_STATUS=$?
check_error "$CONFIG_OUTPUT" $CONFIG_STATUS "Failed to create Systems Manager configuration"

# Extract the manager ARN
MANAGER_ARN=$(echo "$CONFIG_OUTPUT" | grep -o 'arn:aws:ssm-quicksetup:[^"]*')
if [[ -z "$MANAGER_ARN" ]]; then
    echo "ERROR: Failed to extract manager ARN" | tee -a "$LOG_FILE"
    exit 1
fi

# Track the created configuration manager
track_resource "SSM_CONFIG_MANAGER" "$MANAGER_ARN"
echo "Created Systems Manager configuration: $MANAGER_ARN" | tee -a "$LOG_FILE"

# Step 4: Verify the setup
echo ""
echo "Step 4: Verifying the setup..."

# Wait for the configuration to be fully deployed
echo "Waiting for the configuration to be deployed (this may take a few minutes)..."
sleep 30

# Check the configuration manager status
VERIFY_OUTPUT=$(log_cmd "aws ssm-quicksetup get-configuration-manager --manager-arn $MANAGER_ARN --region $CURRENT_REGION")
VERIFY_STATUS=$?
check_error "$VERIFY_OUTPUT" $VERIFY_STATUS "Failed to verify configuration manager"

echo "Systems Manager setup completed successfully!" | tee -a "$LOG_FILE"

# List the created resources
echo ""
echo "==========================================="
echo "CREATED RESOURCES"
echo "==========================================="
for resource in "${CREATED_RESOURCES[@]}"; do
    echo "$resource"
done

# Prompt for cleanup
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Do you want to clean up all created resources? (y/n): "
read -r CLEANUP_CHOICE

if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
    echo "Cleaning up resources..." | tee -a "$LOG_FILE"
    cleanup_resources
    echo "Cleanup completed." | tee -a "$LOG_FILE"
else
    echo "Resources will not be cleaned up. You can manually clean them up later." | tee -a "$LOG_FILE"
fi

echo ""
echo "Script execution completed. See $LOG_FILE for details."

# Clean up temporary files
rm -f ssm-onboarding-policy.json trust-policy.json ssm-config.json 2>/dev/null || true
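
Step 4 of the script pauses for a fixed 30 seconds before it checks the configuration manager. If the deployment can take longer than that, one alternative is to poll until a check succeeds or an attempt limit is reached. The following is a minimal sketch of that idea and is not part of the original example. The helper name `wait_for` and its parameters are hypothetical, and the command you pass to it is your own check, for example a function that inspects the output of `aws ssm-quicksetup get-configuration-manager`.

```bash
#!/usr/bin/env bash

###############################################################################
# function wait_for (hypothetical helper, not part of the original script)
#
# Runs a command repeatedly until it succeeds or the attempt limit is reached.
#
# Parameters:
#       $1 - Maximum number of attempts.
#       $2 - Seconds to sleep between attempts.
#       $@ - The command (and its arguments) to run as the check.
#
# Returns:
#       0 - If the command succeeded within the attempt limit.
#       1 - If every attempt failed.
###############################################################################
function wait_for() {
  local max_attempts=$1
  local delay=$2
  local attempt=1
  shift 2

  while ((attempt <= max_attempts)); do
    # Stop as soon as the check command reports success.
    if "$@"; then
      return 0
    fi

    # Sleep only if another attempt will follow.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done

  return 1
}

# Example use (assumes you define deployment_ready yourself):
#   if ! wait_for 10 30 deployment_ready; then
#     echo "ERROR: Timed out waiting for the configuration to deploy"
#   fi
```

With 10 attempts and a 30-second delay, the sketch waits up to roughly four and a half minutes plus the time spent in each check, and returns as soon as the check passes.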