There are more AWS SDK examples available in the AWS Doc SDK Examples
CloudWatch examples using AWS CLI with Bash script
The following code examples show you how to perform actions and implement common scenarios by using the AWS Command Line Interface with Bash script with CloudWatch.
Scenarios are code examples that show you how to accomplish specific tasks by calling multiple functions within a service or combined with other AWS services.
Each example includes a link to the complete source code, where you can find instructions on how to set up and run the code in context.
Topics
Scenarios
The following code example shows how to:
Create a CloudWatch dashboard
Add Lambda metrics widgets with a function name variable
Verify the dashboard
Clean up resources
- AWS CLI with Bash script
-
Note
There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials
repository. #!/bin/bash # Script to create a CloudWatch dashboard with Lambda function name as a variable # This script creates a CloudWatch dashboard that allows you to switch between different Lambda functions # Set up logging LOG_FILE="cloudwatch-dashboard-script.log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "$(date): Starting CloudWatch dashboard creation script" # Function to handle errors handle_error() { echo "ERROR: $1" echo "Resources created:" echo "- CloudWatch Dashboard: LambdaMetricsDashboard" echo "" echo "===========================================" echo "CLEANUP CONFIRMATION" echo "===========================================" echo "An error occurred. Do you want to clean up the created resources? (y/n): " read -r CLEANUP_CHOICE if [[ "${CLEANUP_CHOICE,,}" == "y" ]]; then echo "Cleaning up resources..." aws cloudwatch delete-dashboards --dashboard-names LambdaMetricsDashboard echo "Cleanup complete." else echo "Resources were not cleaned up. You can manually delete them later." fi exit 1 } # Check if AWS CLI is installed and configured echo "Checking AWS CLI configuration..." aws sts get-caller-identity > /dev/null 2>&1 if [ $? -ne 0 ]; then handle_error "AWS CLI is not properly configured. Please configure it with 'aws configure' and try again." fi # Get the current region REGION=$(aws configure get region) if [ -z "$REGION" ]; then REGION="us-east-1" echo "No region found in AWS config, defaulting to $REGION" fi echo "Using region: $REGION" # Check if there are any Lambda functions in the account echo "Checking for Lambda functions..." LAMBDA_FUNCTIONS=$(aws lambda list-functions --query "Functions[*].FunctionName" --output text) if [ -z "$LAMBDA_FUNCTIONS" ]; then echo "No Lambda functions found in your account. Creating a simple test function..." # Create a temporary directory for Lambda function code TEMP_DIR=$(mktemp -d) # Create a simple Lambda function cat > "$TEMP_DIR/index.js" << EOF exports.handler = async (event) => { console.log('Event:', JSON.stringify(event, null, 2)); return { statusCode: 200, body: JSON.stringify('Hello from Lambda!'), }; }; EOF # Zip the function code cd "$TEMP_DIR" || handle_error "Failed to change to temporary directory" zip -q function.zip index.js # Create a role for the Lambda function ROLE_NAME="LambdaDashboardTestRole" ROLE_ARN=$(aws iam create-role \ --role-name "$ROLE_NAME" \ --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}' \ --query "Role.Arn" \ --output text) if [ $? -ne 0 ]; then handle_error "Failed to create IAM role for Lambda function" fi echo "Waiting for role to be available..." sleep 10 # Attach basic Lambda execution policy aws iam attach-role-policy \ --role-name "$ROLE_NAME" \ --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" if [ $? -ne 0 ]; then aws iam delete-role --role-name "$ROLE_NAME" handle_error "Failed to attach policy to IAM role" fi # Create the Lambda function FUNCTION_NAME="DashboardTestFunction" aws lambda create-function \ --function-name "$FUNCTION_NAME" \ --runtime nodejs18.x \ --role "$ROLE_ARN" \ --handler index.handler \ --zip-file fileb://function.zip if [ $? -ne 0 ]; then aws iam detach-role-policy \ --role-name "$ROLE_NAME" \ --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" aws iam delete-role --role-name "$ROLE_NAME" handle_error "Failed to create Lambda function" fi # Invoke the function to generate some metrics echo "Invoking Lambda function to generate metrics..." for i in {1..5}; do aws lambda invoke --function-name "$FUNCTION_NAME" --payload '{}' /dev/null > /dev/null sleep 1 done # Clean up temporary directory cd - > /dev/null rm -rf "$TEMP_DIR" # Set the function name for the dashboard DEFAULT_FUNCTION="$FUNCTION_NAME" else # Use the first Lambda function as default DEFAULT_FUNCTION=$(echo "$LAMBDA_FUNCTIONS" | awk '{print $1}') echo "Found Lambda functions. Using $DEFAULT_FUNCTION as default." fi # Create a dashboard with Lambda metrics and a function name variable echo "Creating CloudWatch dashboard with Lambda function name variable..." # Create a JSON file for the dashboard body cat > dashboard-body.json << EOF { "widgets": [ { "type": "metric", "x": 0, "y": 0, "width": 12, "height": 6, "properties": { "metrics": [ [ "AWS/Lambda", "Invocations", "FunctionName", "\${FunctionName}" ], [ ".", "Errors", ".", "." ], [ ".", "Throttles", ".", "." ] ], "view": "timeSeries", "stacked": false, "region": "$REGION", "title": "Lambda Function Metrics for \${FunctionName}", "period": 300 } }, { "type": "metric", "x": 0, "y": 6, "width": 12, "height": 6, "properties": { "metrics": [ [ "AWS/Lambda", "Duration", "FunctionName", "\${FunctionName}", { "stat": "Average" } ] ], "view": "timeSeries", "stacked": false, "region": "$REGION", "title": "Duration for \${FunctionName}", "period": 300 } }, { "type": "metric", "x": 12, "y": 0, "width": 12, "height": 6, "properties": { "metrics": [ [ "AWS/Lambda", "ConcurrentExecutions", "FunctionName", "\${FunctionName}" ] ], "view": "timeSeries", "stacked": false, "region": "$REGION", "title": "Concurrent Executions for \${FunctionName}", "period": 300 } } ], "periodOverride": "auto", "variables": [ { "type": "property", "id": "FunctionName", "property": "FunctionName", "label": "Lambda Function", "inputType": "select", "values": [ { "value": "$DEFAULT_FUNCTION", "label": "$DEFAULT_FUNCTION" } ] } ] } EOF # Create the dashboard using the JSON file DASHBOARD_RESULT=$(aws cloudwatch put-dashboard --dashboard-name LambdaMetricsDashboard --dashboard-body file://dashboard-body.json) DASHBOARD_EXIT_CODE=$? # Check if there was a fatal error if [ $DASHBOARD_EXIT_CODE -ne 0 ]; then # If we created resources, clean them up if [ -n "${FUNCTION_NAME:-}" ]; then aws lambda delete-function --function-name "$FUNCTION_NAME" aws iam detach-role-policy \ --role-name "$ROLE_NAME" \ --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" aws iam delete-role --role-name "$ROLE_NAME" fi handle_error "Failed to create CloudWatch dashboard." fi # Display any validation messages but continue if [[ "$DASHBOARD_RESULT" == *"DashboardValidationMessages"* ]]; then echo "Dashboard created with validation messages:" echo "$DASHBOARD_RESULT" echo "These validation messages are warnings and the dashboard should still function." else echo "Dashboard created successfully!" fi # Verify the dashboard was created echo "Verifying dashboard creation..." DASHBOARD_INFO=$(aws cloudwatch get-dashboard --dashboard-name LambdaMetricsDashboard) DASHBOARD_INFO_EXIT_CODE=$? if [ $DASHBOARD_INFO_EXIT_CODE -ne 0 ]; then # If we created resources, clean them up if [ -n "${FUNCTION_NAME:-}" ]; then aws lambda delete-function --function-name "$FUNCTION_NAME" aws iam detach-role-policy \ --role-name "$ROLE_NAME" \ --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" aws iam delete-role --role-name "$ROLE_NAME" fi handle_error "Failed to verify dashboard creation." fi echo "Dashboard verification successful!" echo "Dashboard details:" echo "$DASHBOARD_INFO" # List all dashboards to confirm echo "Listing all dashboards:" DASHBOARDS=$(aws cloudwatch list-dashboards) DASHBOARDS_EXIT_CODE=$? if [ $DASHBOARDS_EXIT_CODE -ne 0 ]; then # If we created resources, clean them up if [ -n "${FUNCTION_NAME:-}" ]; then aws lambda delete-function --function-name "$FUNCTION_NAME" aws iam detach-role-policy \ --role-name "$ROLE_NAME" \ --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" aws iam delete-role --role-name "$ROLE_NAME" fi handle_error "Failed to list dashboards." fi echo "$DASHBOARDS" # Show instructions for accessing the dashboard echo "" echo "Dashboard created successfully! To access it:" echo "1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/" echo "2. In the navigation pane, choose Dashboards" echo "3. Select LambdaMetricsDashboard" echo "4. You should see a dropdown menu labeled 'Lambda Function' at the top of the dashboard" echo "5. Use this dropdown to select different Lambda functions and see their metrics" echo "" # Create a list of resources for cleanup RESOURCES=("- CloudWatch Dashboard: LambdaMetricsDashboard") if [ -n "${FUNCTION_NAME:-}" ]; then RESOURCES+=("- Lambda Function: $FUNCTION_NAME") RESOURCES+=("- IAM Role: $ROLE_NAME") fi # Prompt for cleanup echo "===========================================" echo "CLEANUP CONFIRMATION" echo "===========================================" echo "Resources created:" for resource in "${RESOURCES[@]}"; do echo "$resource" done echo "" echo "Do you want to clean up all created resources? (y/n): " read -r CLEANUP_CHOICE if [[ "${CLEANUP_CHOICE,,}" == "y" ]]; then echo "Cleaning up resources..." # Delete the dashboard aws cloudwatch delete-dashboards --dashboard-names LambdaMetricsDashboard if [ $? -ne 0 ]; then echo "WARNING: Failed to delete dashboard. You may need to delete it manually." else echo "Dashboard deleted successfully." fi # If we created a Lambda function, delete it and its role if [ -n "${FUNCTION_NAME:-}" ]; then echo "Deleting Lambda function..." aws lambda delete-function --function-name "$FUNCTION_NAME" if [ $? -ne 0 ]; then echo "WARNING: Failed to delete Lambda function. You may need to delete it manually." else echo "Lambda function deleted successfully." fi echo "Detaching role policy..." aws iam detach-role-policy \ --role-name "$ROLE_NAME" \ --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" if [ $? -ne 0 ]; then echo "WARNING: Failed to detach role policy. You may need to detach it manually." else echo "Role policy detached successfully." fi echo "Deleting IAM role..." aws iam delete-role --role-name "$ROLE_NAME" if [ $? -ne 0 ]; then echo "WARNING: Failed to delete IAM role. You may need to delete it manually." else echo "IAM role deleted successfully." fi fi # Clean up the JSON file rm -f dashboard-body.json echo "Cleanup complete." else echo "Resources were not cleaned up. You can manually delete them later with:" echo "aws cloudwatch delete-dashboards --dashboard-names LambdaMetricsDashboard" if [ -n "${FUNCTION_NAME:-}" ]; then echo "aws lambda delete-function --function-name $FUNCTION_NAME" echo "aws iam detach-role-policy --role-name $ROLE_NAME --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" echo "aws iam delete-role --role-name $ROLE_NAME" fi fi echo "Script completed successfully!"-
For API details, see the following topics in AWS CLI Command Reference.
-
The following code example shows how to:
Create IAM roles
Create a CloudWatch alarm
Create an experiment template
Run the experiment
Verify the results
Clean up resources
- AWS CLI with Bash script
-
Note
There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials
repository. #!/bin/bash # AWS FIS CPU Stress Test Tutorial Script # This script automates the steps in the AWS FIS CPU stress test tutorial # approach using epoch time calculations that work across all Linux distributions # Set up logging LOG_FILE="fis-tutorial-$(date +%Y%m%d-%H%M%S).log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "Starting AWS FIS CPU Stress Test Tutorial Script" echo "Logging to $LOG_FILE" echo "==============================================" # Function to check for errors in command output check_error() { local output=$1 local cmd=$2 if echo "$output" | grep -i "error" > /dev/null; then # Ignore specific expected errors if [[ "$cmd" == *"aws fis get-experiment"* ]] && [[ "$output" == *"ConfigurationFailure"* ]]; then echo "Note: Experiment failed due to configuration issue. This is expected in some cases." return 0 fi echo "ERROR: Command failed: $cmd" echo "Output: $output" cleanup_on_error exit 1 fi } # Function to clean up resources on error cleanup_on_error() { echo "Error encountered. Cleaning up resources..." if [ -n "$EXPERIMENT_ID" ]; then echo "Stopping experiment $EXPERIMENT_ID if running..." aws fis stop-experiment --id "$EXPERIMENT_ID" 2>/dev/null || true fi if [ -n "$TEMPLATE_ID" ]; then echo "Deleting experiment template $TEMPLATE_ID..." aws fis delete-experiment-template --id "$TEMPLATE_ID" || true fi if [ -n "$INSTANCE_ID" ]; then echo "Terminating EC2 instance $INSTANCE_ID..." aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" || true fi if [ -n "$ALARM_NAME" ]; then echo "Deleting CloudWatch alarm $ALARM_NAME..." aws cloudwatch delete-alarms --alarm-names "$ALARM_NAME" || true fi if [ -n "$INSTANCE_PROFILE_NAME" ]; then echo "Removing role from instance profile..." aws iam remove-role-from-instance-profile --instance-profile-name "$INSTANCE_PROFILE_NAME" --role-name "$EC2_ROLE_NAME" || true echo "Deleting instance profile..." aws iam delete-instance-profile --instance-profile-name "$INSTANCE_PROFILE_NAME" || true fi if [ -n "$FIS_ROLE_NAME" ]; then echo "Deleting FIS role policy..." aws iam delete-role-policy --role-name "$FIS_ROLE_NAME" --policy-name "$FIS_POLICY_NAME" || true echo "Deleting FIS role..." aws iam delete-role --role-name "$FIS_ROLE_NAME" || true fi if [ -n "$EC2_ROLE_NAME" ]; then echo "Detaching policy from EC2 role..." aws iam detach-role-policy --role-name "$EC2_ROLE_NAME" --policy-arn "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" || true echo "Deleting EC2 role..." aws iam delete-role --role-name "$EC2_ROLE_NAME" || true fi echo "Cleanup completed." } # Generate unique identifiers for resources TIMESTAMP=$(date +%Y%m%d%H%M%S) FIS_ROLE_NAME="FISRole-${TIMESTAMP}" FIS_POLICY_NAME="FISPolicy-${TIMESTAMP}" EC2_ROLE_NAME="EC2SSMRole-${TIMESTAMP}" INSTANCE_PROFILE_NAME="EC2SSMProfile-${TIMESTAMP}" ALARM_NAME="FIS-CPU-Alarm-${TIMESTAMP}" # Track created resources CREATED_RESOURCES=() echo "Step 1: Creating IAM role for AWS FIS" # Create trust policy file for AWS FIS cat > fis-trust-policy.json << 'EOF' { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "fis.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } EOF # Create IAM role for FIS echo "Creating IAM role $FIS_ROLE_NAME for AWS FIS..." FIS_ROLE_OUTPUT=$(aws iam create-role \ --role-name "$FIS_ROLE_NAME" \ --assume-role-policy-document file://fis-trust-policy.json) check_error "$FIS_ROLE_OUTPUT" "aws iam create-role" CREATED_RESOURCES+=("IAM Role: $FIS_ROLE_NAME") # Create policy document for SSM actions cat > fis-ssm-policy.json << 'EOF' { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ssm:SendCommand", "ssm:ListCommands", "ssm:ListCommandInvocations" ], "Resource": "*" } ] } EOF # Attach policy to the role echo "Attaching policy $FIS_POLICY_NAME to role $FIS_ROLE_NAME..." FIS_POLICY_OUTPUT=$(aws iam put-role-policy \ --role-name "$FIS_ROLE_NAME" \ --policy-name "$FIS_POLICY_NAME" \ --policy-document file://fis-ssm-policy.json) check_error "$FIS_POLICY_OUTPUT" "aws iam put-role-policy" CREATED_RESOURCES+=("IAM Policy: $FIS_POLICY_NAME attached to $FIS_ROLE_NAME") echo "Step 2: Creating IAM role for EC2 instance with SSM permissions" # Create trust policy file for EC2 cat > ec2-trust-policy.json << 'EOF' { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } EOF # Create IAM role for EC2 echo "Creating IAM role $EC2_ROLE_NAME for EC2 instance..." EC2_ROLE_OUTPUT=$(aws iam create-role \ --role-name "$EC2_ROLE_NAME" \ --assume-role-policy-document file://ec2-trust-policy.json) check_error "$EC2_ROLE_OUTPUT" "aws iam create-role" CREATED_RESOURCES+=("IAM Role: $EC2_ROLE_NAME") # Attach SSM policy to the EC2 role echo "Attaching AmazonSSMManagedInstanceCore policy to role $EC2_ROLE_NAME..." EC2_POLICY_OUTPUT=$(aws iam attach-role-policy \ --role-name "$EC2_ROLE_NAME" \ --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore) check_error "$EC2_POLICY_OUTPUT" "aws iam attach-role-policy" CREATED_RESOURCES+=("IAM Policy: AmazonSSMManagedInstanceCore attached to $EC2_ROLE_NAME") # Create instance profile echo "Creating instance profile $INSTANCE_PROFILE_NAME..." PROFILE_OUTPUT=$(aws iam create-instance-profile \ --instance-profile-name "$INSTANCE_PROFILE_NAME") check_error "$PROFILE_OUTPUT" "aws iam create-instance-profile" CREATED_RESOURCES+=("IAM Instance Profile: $INSTANCE_PROFILE_NAME") # Add role to instance profile echo "Adding role $EC2_ROLE_NAME to instance profile $INSTANCE_PROFILE_NAME..." ADD_ROLE_OUTPUT=$(aws iam add-role-to-instance-profile \ --instance-profile-name "$INSTANCE_PROFILE_NAME" \ --role-name "$EC2_ROLE_NAME") check_error "$ADD_ROLE_OUTPUT" "aws iam add-role-to-instance-profile" # Wait for role to propagate echo "Waiting for IAM role to propagate..." sleep 10 echo "Step 3: Launching EC2 instance" # Get the latest Amazon Linux 2 AMI ID echo "Finding latest Amazon Linux 2 AMI..." AMI_ID=$(aws ec2 describe-images \ --owners amazon \ --filters "Name=name,Values=amzn2-ami-hvm-*-x86_64-gp2" "Name=state,Values=available" \ --query "sort_by(Images, &CreationDate)[-1].ImageId" \ --output text) check_error "$AMI_ID" "aws ec2 describe-images" echo "Using AMI: $AMI_ID" # Launch EC2 instance echo "Launching EC2 instance with AMI $AMI_ID..." INSTANCE_OUTPUT=$(aws ec2 run-instances \ --image-id "$AMI_ID" \ --instance-type t2.micro \ --iam-instance-profile Name="$INSTANCE_PROFILE_NAME" \ --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=FIS-Test-Instance}]') check_error "$INSTANCE_OUTPUT" "aws ec2 run-instances" # Get instance ID INSTANCE_ID=$(echo "$INSTANCE_OUTPUT" | grep -i "InstanceId" | head -1 | awk -F'"' '{print $4}') if [ -z "$INSTANCE_ID" ]; then echo "Failed to get instance ID" cleanup_on_error exit 1 fi echo "Launched instance: $INSTANCE_ID" CREATED_RESOURCES+=("EC2 Instance: $INSTANCE_ID") # Enable detailed monitoring echo "Enabling detailed monitoring for instance $INSTANCE_ID..." MONITOR_OUTPUT=$(aws ec2 monitor-instances --instance-ids "$INSTANCE_ID") check_error "$MONITOR_OUTPUT" "aws ec2 monitor-instances" # Wait for instance to be running and status checks to pass echo "Waiting for instance to be ready..." aws ec2 wait instance-running --instance-ids "$INSTANCE_ID" aws ec2 wait instance-status-ok --instance-ids "$INSTANCE_ID" echo "Instance is ready" echo "Step 4: Creating CloudWatch alarm for CPU utilization" # Create CloudWatch alarm echo "Creating CloudWatch alarm $ALARM_NAME..." ALARM_OUTPUT=$(aws cloudwatch put-metric-alarm \ --alarm-name "$ALARM_NAME" \ --alarm-description "Alarm when CPU exceeds 50%" \ --metric-name CPUUtilization \ --namespace AWS/EC2 \ --statistic Maximum \ --period 60 \ --threshold 50 \ --comparison-operator GreaterThanOrEqualToThreshold \ --dimensions "Name=InstanceId,Value=$INSTANCE_ID" \ --evaluation-periods 1) check_error "$ALARM_OUTPUT" "aws cloudwatch put-metric-alarm" CREATED_RESOURCES+=("CloudWatch Alarm: $ALARM_NAME") # Get the alarm ARN echo "Getting CloudWatch alarm ARN..." ALARM_ARN_OUTPUT=$(aws cloudwatch describe-alarms \ --alarm-names "$ALARM_NAME") check_error "$ALARM_ARN_OUTPUT" "aws cloudwatch describe-alarms" ALARM_ARN=$(echo "$ALARM_ARN_OUTPUT" | grep -i "AlarmArn" | head -1 | awk -F'"' '{print $4}') if [ -z "$ALARM_ARN" ]; then echo "Failed to get alarm ARN" cleanup_on_error exit 1 fi echo "Alarm ARN: $ALARM_ARN" # Wait for the alarm to initialize and reach OK state echo "Waiting for CloudWatch alarm to initialize (60 seconds)..." sleep 60 # Check alarm state echo "Checking alarm state..." ALARM_STATE_OUTPUT=$(aws cloudwatch describe-alarms \ --alarm-names "$ALARM_NAME") ALARM_STATE=$(echo "$ALARM_STATE_OUTPUT" | grep -i "StateValue" | head -1 | awk -F'"' '{print $4}') echo "Current alarm state: $ALARM_STATE" # If alarm is not in OK state, wait longer or generate some baseline metrics if [ "$ALARM_STATE" != "OK" ]; then echo "Alarm not in OK state. Waiting for alarm to stabilize (additional 60 seconds)..." sleep 60 # Check alarm state again ALARM_STATE_OUTPUT=$(aws cloudwatch describe-alarms \ --alarm-names "$ALARM_NAME") ALARM_STATE=$(echo "$ALARM_STATE_OUTPUT" | grep -i "StateValue" | head -1 | awk -F'"' '{print $4}') echo "Updated alarm state: $ALARM_STATE" if [ "$ALARM_STATE" != "OK" ]; then echo "Warning: Alarm still not in OK state. Experiment may fail to start." fi fi echo "Step 5: Creating AWS FIS experiment template" # Get the IAM role ARN echo "Getting IAM role ARN for $FIS_ROLE_NAME..." ROLE_ARN_OUTPUT=$(aws iam get-role \ --role-name "$FIS_ROLE_NAME") check_error "$ROLE_ARN_OUTPUT" "aws iam get-role" ROLE_ARN=$(echo "$ROLE_ARN_OUTPUT" | grep -i "Arn" | head -1 | awk -F'"' '{print $4}') if [ -z "$ROLE_ARN" ]; then echo "Failed to get role ARN" cleanup_on_error exit 1 fi echo "Role ARN: $ROLE_ARN" # Get account ID and region ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) REGION=$(aws configure get region) if [ -z "$REGION" ]; then REGION="us-east-1" # Default to us-east-1 if region not set fi INSTANCE_ARN="arn:aws:ec2:${REGION}:${ACCOUNT_ID}:instance/${INSTANCE_ID}" echo "Instance ARN: $INSTANCE_ARN" # Create experiment template - Fixed JSON escaping issue cat > experiment-template.json << EOF { "description": "Test CPU stress predefined SSM document", "targets": { "testInstance": { "resourceType": "aws:ec2:instance", "resourceArns": ["$INSTANCE_ARN"], "selectionMode": "ALL" } }, "actions": { "runCpuStress": { "actionId": "aws:ssm:send-command", "parameters": { "documentArn": "arn:aws:ssm:$REGION::document/AWSFIS-Run-CPU-Stress", "documentParameters": "{\"DurationSeconds\":\"120\"}", "duration": "PT5M" }, "targets": { "Instances": "testInstance" } } }, "stopConditions": [ { "source": "aws:cloudwatch:alarm", "value": "$ALARM_ARN" } ], "roleArn": "$ROLE_ARN", "tags": { "Name": "FIS-CPU-Stress-Experiment" } } EOF # Create experiment template echo "Creating AWS FIS experiment template..." TEMPLATE_OUTPUT=$(aws fis create-experiment-template --cli-input-json file://experiment-template.json) check_error "$TEMPLATE_OUTPUT" "aws fis create-experiment-template" TEMPLATE_ID=$(echo "$TEMPLATE_OUTPUT" | grep -i "id" | head -1 | awk -F'"' '{print $4}') if [ -z "$TEMPLATE_ID" ]; then echo "Failed to get template ID" cleanup_on_error exit 1 fi echo "Experiment template created with ID: $TEMPLATE_ID" CREATED_RESOURCES+=("FIS Experiment Template: $TEMPLATE_ID") echo "Step 6: Starting the experiment" # Start the experiment echo "Starting AWS FIS experiment using template $TEMPLATE_ID..." EXPERIMENT_OUTPUT=$(aws fis start-experiment \ --experiment-template-id "$TEMPLATE_ID" \ --tags '{"Name": "FIS-CPU-Stress-Run"}') check_error "$EXPERIMENT_OUTPUT" "aws fis start-experiment" EXPERIMENT_ID=$(echo "$EXPERIMENT_OUTPUT" | grep -i "id" | head -1 | awk -F'"' '{print $4}') if [ -z "$EXPERIMENT_ID" ]; then echo "Failed to get experiment ID" cleanup_on_error exit 1 fi echo "Experiment started with ID: $EXPERIMENT_ID" CREATED_RESOURCES+=("FIS Experiment: $EXPERIMENT_ID") echo "Step 7: Tracking experiment progress" # Track experiment progress echo "Tracking experiment progress..." MAX_CHECKS=30 CHECK_COUNT=0 EXPERIMENT_STATE="" while [ $CHECK_COUNT -lt $MAX_CHECKS ]; do EXPERIMENT_INFO=$(aws fis get-experiment --id "$EXPERIMENT_ID") # Don't check for errors here, as we expect some experiments to fail EXPERIMENT_STATE=$(echo "$EXPERIMENT_INFO" | grep -i "status" | head -1 | awk -F'"' '{print $4}') echo "Experiment state: $EXPERIMENT_STATE" if [ "$EXPERIMENT_STATE" == "completed" ] || [ "$EXPERIMENT_STATE" == "stopped" ] || [ "$EXPERIMENT_STATE" == "failed" ]; then # Show the reason for the state REASON=$(echo "$EXPERIMENT_INFO" | grep -i "reason" | head -1 | awk -F'"' '{print $4}') if [ -n "$REASON" ]; then echo "Reason: $REASON" fi break fi echo "Waiting 10 seconds before checking again..." sleep 10 CHECK_COUNT=$((CHECK_COUNT + 1)) done if [ $CHECK_COUNT -eq $MAX_CHECKS ]; then echo "Experiment is taking longer than expected. You can check its status later using:" echo "aws fis get-experiment --id $EXPERIMENT_ID" fi echo "Step 8: Verifying experiment results" # Check CloudWatch alarm state echo "Checking CloudWatch alarm state..." ALARM_STATE_OUTPUT=$(aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME") check_error "$ALARM_STATE_OUTPUT" "aws cloudwatch describe-alarms" echo "$ALARM_STATE_OUTPUT" # Get CPU utilization metrics echo "Getting CPU utilization metrics..." END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ") # FIXED: Cross-platform compatible way to calculate time 10 minutes ago # This approach uses epoch seconds and basic arithmetic which works on all Linux distributions CURRENT_EPOCH=$(date +%s) TEN_MINUTES_AGO_EPOCH=$((CURRENT_EPOCH - 600)) START_TIME=$(date -u -d "@$TEN_MINUTES_AGO_EPOCH" +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u -r "$TEN_MINUTES_AGO_EPOCH" +"%Y-%m-%dT%H:%M:%SZ") # Create metric query file cat > metric-query.json << EOF [ { "Id": "cpu", "MetricStat": { "Metric": { "Namespace": "AWS/EC2", "MetricName": "CPUUtilization", "Dimensions": [ { "Name": "InstanceId", "Value": "$INSTANCE_ID" } ] }, "Period": 60, "Stat": "Maximum" } } ] EOF METRICS_OUTPUT=$(aws cloudwatch get-metric-data \ --start-time "$START_TIME" \ --end-time "$END_TIME" \ --metric-data-queries file://metric-query.json) check_error "$METRICS_OUTPUT" "aws cloudwatch get-metric-data" echo "CPU Utilization Metrics:" echo "$METRICS_OUTPUT" # Display summary of created resources echo "" echo "===========================================" echo "RESOURCES CREATED" echo "===========================================" for resource in "${CREATED_RESOURCES[@]}"; do echo "- $resource" done echo "===========================================" # Prompt for cleanup echo "" echo "===========================================" echo "CLEANUP CONFIRMATION" echo "===========================================" echo "Do you want to clean up all created resources? (y/n): " read -r CLEANUP_CHOICE if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then echo "Starting cleanup process..." # Stop experiment if still running if [ "$EXPERIMENT_STATE" != "completed" ] && [ "$EXPERIMENT_STATE" != "stopped" ] && [ "$EXPERIMENT_STATE" != "failed" ]; then echo "Stopping experiment $EXPERIMENT_ID..." STOP_OUTPUT=$(aws fis stop-experiment --id "$EXPERIMENT_ID") check_error "$STOP_OUTPUT" "aws fis stop-experiment" echo "Waiting for experiment to stop..." sleep 10 fi # Delete experiment template echo "Deleting experiment template $TEMPLATE_ID..." DELETE_TEMPLATE_OUTPUT=$(aws fis delete-experiment-template --id "$TEMPLATE_ID") check_error "$DELETE_TEMPLATE_OUTPUT" "aws fis delete-experiment-template" # Delete CloudWatch alarm echo "Deleting CloudWatch alarm $ALARM_NAME..." DELETE_ALARM_OUTPUT=$(aws cloudwatch delete-alarms --alarm-names "$ALARM_NAME") check_error "$DELETE_ALARM_OUTPUT" "aws cloudwatch delete-alarms" # Terminate EC2 instance echo "Terminating EC2 instance $INSTANCE_ID..." TERMINATE_OUTPUT=$(aws ec2 terminate-instances --instance-ids "$INSTANCE_ID") check_error "$TERMINATE_OUTPUT" "aws ec2 terminate-instances" echo "Waiting for instance to terminate..." aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID" # Clean up IAM resources echo "Removing role from instance profile..." REMOVE_ROLE_OUTPUT=$(aws iam remove-role-from-instance-profile \ --instance-profile-name "$INSTANCE_PROFILE_NAME" \ --role-name "$EC2_ROLE_NAME") check_error "$REMOVE_ROLE_OUTPUT" "aws iam remove-role-from-instance-profile" echo "Deleting instance profile..." DELETE_PROFILE_OUTPUT=$(aws iam delete-instance-profile \ --instance-profile-name "$INSTANCE_PROFILE_NAME") check_error "$DELETE_PROFILE_OUTPUT" "aws iam delete-instance-profile" echo "Deleting FIS role policy..." DELETE_POLICY_OUTPUT=$(aws iam delete-role-policy \ --role-name "$FIS_ROLE_NAME" \ --policy-name "$FIS_POLICY_NAME") check_error "$DELETE_POLICY_OUTPUT" "aws iam delete-role-policy" echo "Detaching policy from EC2 role..." DETACH_POLICY_OUTPUT=$(aws iam detach-role-policy \ --role-name "$EC2_ROLE_NAME" \ --policy-arn "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore") check_error "$DETACH_POLICY_OUTPUT" "aws iam detach-role-policy" echo "Deleting FIS role..." DELETE_FIS_ROLE_OUTPUT=$(aws iam delete-role \ --role-name "$FIS_ROLE_NAME") check_error "$DELETE_FIS_ROLE_OUTPUT" "aws iam delete-role" echo "Deleting EC2 role..." DELETE_EC2_ROLE_OUTPUT=$(aws iam delete-role \ --role-name "$EC2_ROLE_NAME") check_error "$DELETE_EC2_ROLE_OUTPUT" "aws iam delete-role" # Clean up temporary files echo "Cleaning up temporary files..." rm -f fis-trust-policy.json ec2-trust-policy.json fis-ssm-policy.json experiment-template.json metric-query.json echo "Cleanup completed successfully." else echo "Cleanup skipped. Resources will remain in your AWS account." echo "You can manually clean up the resources listed above." fi echo "" echo "Script execution completed." echo "Log file: $LOG_FILE"-
For API details, see the following topics in AWS CLI Command Reference.
-
The following code example shows how to:
Create Lambda functions for monitoring
Create a CloudWatch dashboard
Add a property variable to the dashboard
Clean up resources
- AWS CLI with Bash script
-
Note
There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials
repository. #!/bin/bash # CloudWatch Dashboard with Lambda Function Variable Script # This script creates a CloudWatch dashboard with a property variable for Lambda function names # Set up logging LOG_FILE="cloudwatch-dashboard-script-v4.log" echo "Starting script execution at $(date)" > "$LOG_FILE" # Function to log commands and their output log_cmd() { echo "$(date): Running command: $1" >> "$LOG_FILE" eval "$1" 2>&1 | tee -a "$LOG_FILE" return ${PIPESTATUS[0]} } # Function to check for errors in command output check_error() { local cmd_output="$1" local cmd_status="$2" local error_msg="$3" if [ $cmd_status -ne 0 ] || echo "$cmd_output" | grep -i "error" > /dev/null; then echo "ERROR: $error_msg" | tee -a "$LOG_FILE" echo "Command output: $cmd_output" | tee -a "$LOG_FILE" cleanup_resources exit 1 fi } # Function to clean up resources cleanup_resources() { echo "" | tee -a "$LOG_FILE" echo "==========================================" | tee -a "$LOG_FILE" echo "CLEANUP PROCESS" | tee -a "$LOG_FILE" echo "==========================================" | tee -a "$LOG_FILE" if [ -n "$DASHBOARD_NAME" ]; then echo "Deleting CloudWatch dashboard: $DASHBOARD_NAME" | tee -a "$LOG_FILE" log_cmd "aws cloudwatch delete-dashboards --dashboard-names \"$DASHBOARD_NAME\"" fi if [ -n "$LAMBDA_FUNCTION1" ]; then echo "Deleting Lambda function: $LAMBDA_FUNCTION1" | tee -a "$LOG_FILE" log_cmd "aws lambda delete-function --function-name \"$LAMBDA_FUNCTION1\"" fi if [ -n "$LAMBDA_FUNCTION2" ]; then echo "Deleting Lambda function: $LAMBDA_FUNCTION2" | tee -a "$LOG_FILE" log_cmd "aws lambda delete-function --function-name \"$LAMBDA_FUNCTION2\"" fi if [ -n "$ROLE_NAME" ]; then echo "Detaching policy from role: $ROLE_NAME" | tee -a "$LOG_FILE" log_cmd "aws iam detach-role-policy --role-name \"$ROLE_NAME\" --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" echo "Deleting IAM role: $ROLE_NAME" | tee -a "$LOG_FILE" log_cmd "aws iam delete-role --role-name \"$ROLE_NAME\"" fi echo "Cleanup completed." | tee -a "$LOG_FILE" } # Function to prompt for cleanup confirmation confirm_cleanup() { echo "" | tee -a "$LOG_FILE" echo "==========================================" | tee -a "$LOG_FILE" echo "CLEANUP CONFIRMATION" | tee -a "$LOG_FILE" echo "==========================================" | tee -a "$LOG_FILE" echo "The following resources were created:" | tee -a "$LOG_FILE" echo "- CloudWatch Dashboard: $DASHBOARD_NAME" | tee -a "$LOG_FILE" if [ -n "$LAMBDA_FUNCTION1" ]; then echo "- Lambda Function: $LAMBDA_FUNCTION1" | tee -a "$LOG_FILE" fi if [ -n "$LAMBDA_FUNCTION2" ]; then echo "- Lambda Function: $LAMBDA_FUNCTION2" | tee -a "$LOG_FILE" fi if [ -n "$ROLE_NAME" ]; then echo "- IAM Role: $ROLE_NAME" | tee -a "$LOG_FILE" fi echo "" | tee -a "$LOG_FILE" echo "Do you want to clean up all created resources? (y/n): " | tee -a "$LOG_FILE" read -r CLEANUP_CHOICE if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then cleanup_resources else echo "Resources were not cleaned up. You can manually delete them later." | tee -a "$LOG_FILE" fi } # Get AWS region AWS_REGION=$(aws configure get region) if [ -z "$AWS_REGION" ]; then AWS_REGION="us-east-1" echo "No region found in AWS config, defaulting to $AWS_REGION" | tee -a "$LOG_FILE" else echo "Using AWS region: $AWS_REGION" | tee -a "$LOG_FILE" fi # Generate unique identifiers RANDOM_ID=$(openssl rand -hex 6) DASHBOARD_NAME="LambdaMetricsDashboard-${RANDOM_ID}" LAMBDA_FUNCTION1="TestFunction1-${RANDOM_ID}" LAMBDA_FUNCTION2="TestFunction2-${RANDOM_ID}" ROLE_NAME="LambdaExecutionRole-${RANDOM_ID}" echo "Using random identifier: $RANDOM_ID" | tee -a "$LOG_FILE" echo "Dashboard name: $DASHBOARD_NAME" | tee -a "$LOG_FILE" echo "Lambda function names: $LAMBDA_FUNCTION1, $LAMBDA_FUNCTION2" | tee -a "$LOG_FILE" echo "IAM role name: $ROLE_NAME" | tee -a "$LOG_FILE" # Create IAM role for Lambda functions echo "Creating IAM role for Lambda..." | tee -a "$LOG_FILE" TRUST_POLICY='{ "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "lambda.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }' echo "$TRUST_POLICY" > trust-policy.json ROLE_OUTPUT=$(log_cmd "aws iam create-role --role-name \"$ROLE_NAME\" --assume-role-policy-document file://trust-policy.json --output json") check_error "$ROLE_OUTPUT" $? "Failed to create IAM role" ROLE_ARN=$(echo "$ROLE_OUTPUT" | grep -o '"Arn": "[^"]*' | cut -d'"' -f4) echo "Role ARN: $ROLE_ARN" | tee -a "$LOG_FILE" # Attach Lambda basic execution policy to the role echo "Attaching Lambda execution policy to role..." | tee -a "$LOG_FILE" POLICY_OUTPUT=$(log_cmd "aws iam attach-role-policy --role-name \"$ROLE_NAME\" --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole") check_error "$POLICY_OUTPUT" $? "Failed to attach policy to role" # Wait for role to propagate echo "Waiting for IAM role to propagate..." | tee -a "$LOG_FILE" sleep 10 # Create simple Python Lambda function code echo "Creating Lambda function code..." | tee -a "$LOG_FILE" cat > lambda_function.py << 'EOF' def handler(event, context): print("Lambda function executed successfully") return { 'statusCode': 200, 'body': 'Success' } EOF # Zip the Lambda function code log_cmd "zip -j lambda_function.zip lambda_function.py" # Create first Lambda function echo "Creating first Lambda function: $LAMBDA_FUNCTION1..." | tee -a "$LOG_FILE" LAMBDA1_OUTPUT=$(log_cmd "aws lambda create-function --function-name \"$LAMBDA_FUNCTION1\" --runtime python3.9 --role \"$ROLE_ARN\" --handler lambda_function.handler --zip-file fileb://lambda_function.zip") check_error "$LAMBDA1_OUTPUT" $? "Failed to create first Lambda function" # Create second Lambda function echo "Creating second Lambda function: $LAMBDA_FUNCTION2..." | tee -a "$LOG_FILE" LAMBDA2_OUTPUT=$(log_cmd "aws lambda create-function --function-name \"$LAMBDA_FUNCTION2\" --runtime python3.9 --role \"$ROLE_ARN\" --handler lambda_function.handler --zip-file fileb://lambda_function.zip") check_error "$LAMBDA2_OUTPUT" $? "Failed to create second Lambda function" # Invoke Lambda functions to generate some metrics echo "Invoking Lambda functions to generate metrics..." | tee -a "$LOG_FILE" log_cmd "aws lambda invoke --function-name \"$LAMBDA_FUNCTION1\" --payload '{}' /dev/null" log_cmd "aws lambda invoke --function-name \"$LAMBDA_FUNCTION2\" --payload '{}' /dev/null" # Create CloudWatch dashboard with property variable echo "Creating CloudWatch dashboard with property variable..." | tee -a "$LOG_FILE" # Create a simpler dashboard with a property variable # This approach uses a more basic dashboard structure that's known to work with the CloudWatch API DASHBOARD_BODY=$(cat <<EOF { "widgets": [ { "type": "metric", "x": 0, "y": 0, "width": 12, "height": 6, "properties": { "metrics": [ [ "AWS/Lambda", "Invocations", "FunctionName", "$LAMBDA_FUNCTION1" ] ], "view": "timeSeries", "stacked": false, "region": "$AWS_REGION", "title": "Lambda Invocations", "period": 300, "stat": "Sum" } } ] } EOF ) # First create a basic dashboard without variables echo "Creating initial dashboard without variables..." | tee -a "$LOG_FILE" DASHBOARD_OUTPUT=$(log_cmd "aws cloudwatch put-dashboard --dashboard-name \"$DASHBOARD_NAME\" --dashboard-body '$DASHBOARD_BODY'") check_error "$DASHBOARD_OUTPUT" $? "Failed to create initial CloudWatch dashboard" # Now let's try to add a property variable using the console instructions echo "To complete the tutorial, please follow these steps in the CloudWatch console:" | tee -a "$LOG_FILE" echo "1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/" | tee -a "$LOG_FILE" echo "2. Navigate to Dashboards and select your dashboard: $DASHBOARD_NAME" | tee -a "$LOG_FILE" echo "3. Choose Actions > Variables > Create a variable" | tee -a "$LOG_FILE" echo "4. Choose Property variable" | tee -a "$LOG_FILE" echo "5. For Property that the variable changes, choose FunctionName" | tee -a "$LOG_FILE" echo "6. For Input type, choose Select menu (dropdown)" | tee -a "$LOG_FILE" echo "7. Choose Use the results of a metric search" | tee -a "$LOG_FILE" echo "8. Choose Pre-built queries > Lambda > Errors" | tee -a "$LOG_FILE" echo "9. Choose By Function Name and then choose Search" | tee -a "$LOG_FILE" echo "10. (Optional) Configure any secondary settings as desired" | tee -a "$LOG_FILE" echo "11. Choose Add variable" | tee -a "$LOG_FILE" echo "" | tee -a "$LOG_FILE" echo "The dashboard has been created and can be accessed at:" | tee -a "$LOG_FILE" echo "https://console.aws.amazon.com/cloudwatch/home#dashboards:name=$DASHBOARD_NAME" | tee -a "$LOG_FILE" # Verify dashboard creation echo "Verifying dashboard creation..." | tee -a "$LOG_FILE" VERIFY_OUTPUT=$(log_cmd "aws cloudwatch get-dashboard --dashboard-name \"$DASHBOARD_NAME\"") check_error "$VERIFY_OUTPUT" $? "Failed to verify dashboard creation" echo "" | tee -a "$LOG_FILE" echo "==========================================" | tee -a "$LOG_FILE" echo "DASHBOARD CREATED SUCCESSFULLY" | tee -a "$LOG_FILE" echo "==========================================" | tee -a "$LOG_FILE" echo "Dashboard Name: $DASHBOARD_NAME" | tee -a "$LOG_FILE" echo "Lambda Functions: $LAMBDA_FUNCTION1, $LAMBDA_FUNCTION2" | tee -a "$LOG_FILE" echo "" | tee -a "$LOG_FILE" echo "You can view your dashboard in the CloudWatch console:" | tee -a "$LOG_FILE" echo "https://console.aws.amazon.com/cloudwatch/home#dashboards:name=$DASHBOARD_NAME" | tee -a "$LOG_FILE" echo "" | tee -a "$LOG_FILE" # Prompt for cleanup confirm_cleanup echo "Script completed successfully." | tee -a "$LOG_FILE" exit 0-
For API details, see the following topics in AWS CLI Command Reference.
-