本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。
使用 AWS FIS aws:ecs:task 動作
您可以使用 aws:ecs:task 動作,將故障注入您的 Amazon ECS任務。支援 Amazon EC2和 Fargate 容量類型。
這些動作使用 AWS Systems Manager (SSM) 文件來注入錯誤。若要使用 aws:ecs:task
動作,您需要將具有 SSM 代理程式的容器新增至您的 Amazon Elastic Container Service (Amazon ECS) 任務定義。容器會執行AWS FIS定義的指令碼,將 Amazon ECS任務註冊為SSM服務中的受管執行個體。此外,指令碼會擷取任務中繼資料,將標籤新增至受管執行個體。設定將允許 AWS FIS 解析目標任務。此段落參考下圖中的設定。
當您執行以 為目標的 AWS FIS實驗時aws:ecs:task
, AWS FIS 會使用資源標籤 ,將您在實驗範本中 AWS FIS指定的目標 Amazon ECS任務映射到一組SSM受管執行個體ECS_TASK_ARN
。標籤值是應執行SSM文件ARN的相關聯 Amazon ECS任務的 。此段落參考下圖中的 Fault Injection。
下圖示範具有一個現有容器的任務的設定和故障注入。
動作
限制
-
下列動作無法平行執行:
aws:ecs:task-network-blackhole-port
aws:ecs:task-network-latency
aws:ecs:task-network-packet-loss
-
如果您啟用 Amazon ECS Exec,您必須先停用它,才能使用這些動作。
-
即使實驗已完成狀態,SSM文件執行仍可能已取消狀態。執行 Amazon ECS動作時,會在實驗中的動作持續時間和 Amazon EC2 Systems Manager (SSM) 文件持續時間內使用客戶提供的持續時間。啟動動作後,SSM文件需要一些時間才能開始執行。因此,在達到指定的動作持續時間時,SSM文件可能仍有幾秒鐘的時間來完成其執行。達到實驗動作持續時間時,動作會停止,且SSM文件執行會取消。故障注入成功。
要求
-
將 AWS FIS下列許可新增至實驗角色:
ssm:SendCommand
ssm:ListCommands
ssm:CancelCommand
-
將下列許可新增至 Amazon ECS任務IAM角色:
ssm:CreateActivation
ssm:AddTagsToResource
iam:PassRole
請注意,您可以將受管執行個體角色ARN的 指定為 的資源
iam:PassRole
。 -
建立 Amazon ECS任務執行IAM角色並新增 A mazonECSTaskExecutionRolePolicy受管政策。
-
在任務定義中,將環境變數
MANAGED_INSTANCE_ROLE_NAME
設定為受管執行個體角色的名稱。這是將連接到在 中註冊為受管執行個體之任務的角色SSM。 -
將下列許可新增至受管執行個體角色:
ssm:DeleteActivation
ssm:DeregisterManagedInstance
-
將 A mazonSSMManagedInstanceCore受管政策新增至受管執行個體角色。
-
將SSM代理程式容器新增至 Amazon ECS任務定義。命令指令碼會將 Amazon ECS任務註冊為受管執行個體。
{ "name": "amazon-ssm-agent", "image": "public.ecr.aws/amazon-ssm-agent/amazon-ssm-agent:latest", "cpu": 0, "links": [], "portMappings": [], "essential": false, "entryPoint": [], "command": [ "/bin/bash", "-c", "set -e; dnf upgrade -y; dnf install jq procps awscli -y; term_handler() { echo \"Deleting SSM activation $ACTIVATION_ID\"; if ! aws ssm delete-activation --activation-id $ACTIVATION_ID --region $ECS_TASK_REGION; then echo \"SSM activation $ACTIVATION_ID failed to be deleted\" 1>&2; fi; MANAGED_INSTANCE_ID=$(jq -e -r .ManagedInstanceID /var/lib/amazon/ssm/registration); echo \"Deregistering SSM Managed Instance $MANAGED_INSTANCE_ID\"; if ! aws ssm deregister-managed-instance --instance-id $MANAGED_INSTANCE_ID --region $ECS_TASK_REGION; then echo \"SSM Managed Instance $MANAGED_INSTANCE_ID failed to be deregistered\" 1>&2; fi; kill -SIGTERM $SSM_AGENT_PID; }; trap term_handler SIGTERM SIGINT; if [[ -z $MANAGED_INSTANCE_ROLE_NAME ]]; then echo \"Environment variable MANAGED_INSTANCE_ROLE_NAME not set, exiting\" 1>&2; exit 1; fi; if ! ps ax | grep amazon-ssm-agent | grep -v grep > /dev/null; then if [[ -n $ECS_CONTAINER_METADATA_URI_V4 ]] ; then echo \"Found ECS Container Metadata, running activation with metadata\"; TASK_METADATA=$(curl \"${ECS_CONTAINER_METADATA_URI_V4}/task\"); ECS_TASK_AVAILABILITY_ZONE=$(echo $TASK_METADATA | jq -e -r '.AvailabilityZone'); ECS_TASK_ARN=$(echo $TASK_METADATA | jq -e -r '.TaskARN'); ECS_TASK_REGION=$(echo $ECS_TASK_AVAILABILITY_ZONE | sed 's/.$//'); ECS_TASK_AVAILABILITY_ZONE_REGEX='^(af|ap|ca|cn|eu|me|sa|us|us-gov)-(central|north|(north(east|west))|south|south(east|west)|east|west)-[0-9]{1}[a-z]{1}$'; if ! [[ $ECS_TASK_AVAILABILITY_ZONE =~ $ECS_TASK_AVAILABILITY_ZONE_REGEX ]]; then echo \"Error extracting Availability Zone from ECS Container Metadata, exiting\" 1>&2; exit 1; fi; ECS_TASK_ARN_REGEX='^arn:(aws|aws-cn|aws-us-gov):ecs:[a-z0-9-]+:[0-9]{12}:task/[a-zA-Z0-9_-]+/[a-zA-Z0-9]+$'; if ! [[ $ECS_TASK_ARN =~ $ECS_TASK_ARN_REGEX ]]; then echo \"Error extracting Task ARN from ECS Container Metadata, exiting\" 1>&2; exit 1; fi; CREATE_ACTIVATION_OUTPUT=$(aws ssm create-activation --iam-role $MANAGED_INSTANCE_ROLE_NAME --tags Key=ECS_TASK_AVAILABILITY_ZONE,Value=$ECS_TASK_AVAILABILITY_ZONE Key=ECS_TASK_ARN,Value=$ECS_TASK_ARN Key=FAULT_INJECTION_SIDECAR,Value=true --region $ECS_TASK_REGION); ACTIVATION_CODE=$(echo $CREATE_ACTIVATION_OUTPUT | jq -e -r .ActivationCode); ACTIVATION_ID=$(echo $CREATE_ACTIVATION_OUTPUT | jq -e -r .ActivationId); if ! amazon-ssm-agent -register -code $ACTIVATION_CODE -id $ACTIVATION_ID -region $ECS_TASK_REGION; then echo \"Failed to register with AWS Systems Manager (SSM), exiting\" 1>&2; exit 1; fi; amazon-ssm-agent & SSM_AGENT_PID=$!; wait $SSM_AGENT_PID; else echo \"ECS Container Metadata not found, exiting\" 1>&2; exit 1; fi; else echo \"SSM agent is already running, exiting\" 1>&2; exit 1; fi" ], "environment": [ { "name": "MANAGED_INSTANCE_ROLE_NAME", "value": "
SSMManagedInstanceRole
" } ], "environmentFiles": [], "mountPoints": [], "volumesFrom": [], "secrets": [], "dnsServers": [], "dnsSearchDomains": [], "extraHosts": [], "dockerSecurityOptions": [], "dockerLabels": {}, "ulimits": [], "logConfiguration": {}, "systemControls": [] }如需指令碼更易讀的版本,請參閱 指令碼的參考版本。
-
透過在 Amazon ECS ECS任務定義中設定
enableFaultInjection
欄位APIs來啟用 Amazon Fault Injection :"enableFaultInjection": true,
-
在 Fargate 任務上使用
aws:ecs:task-network-blackhole-port
aws:ecs:task-network-latency
、 或aws:ecs:task-network-packet-loss
動作時,該動作必須將useEcsFaultInjectionEndpoints
參數設定為true
。 -
使用
aws:ecs:task-kill-process
、aws:ecs:task-network-latency
、aws:ecs:task-network-blackhole-port
和aws:ecs:task-network-packet-loss
動作時,Amazon ECS任務定義必須pidMode
設定為task
。 -
使用
aws:ecs:task-network-blackhole-port
、aws:ecs:task-network-latency
和aws:ecs:task-network-packet-loss
動作時,Amazon ECS任務定義必須networkMode
設定為 以外的值bridge
。
指令碼的參考版本
以下是需求區段中更易讀的指令碼版本,供您參考。
#!/usr/bin/env bash # This is the activation script used to register ECS tasks as Managed Instances in SSM # The script retrieves information form the ECS task metadata endpoint to add three tags to the Managed Instance # - ECS_TASK_AVAILABILITY_ZONE: To allow customers to target Managed Instances / Tasks in a specific Availability Zone # - ECS_TASK_ARN: To allow customers to target Managed Instances / Tasks by using the Task ARN # - FAULT_INJECTION_SIDECAR: To make it clear that the tasks were registered as managed instance for fault injection purposes. Value is always 'true'. # The script will leave the SSM Agent running in the background # When the container running this script receives a SIGTERM or SIGINT signal, it will do the following cleanup: # - Delete SSM activation # - Deregister SSM managed instance set -e # stop execution instantly as a query exits while having a non-zero dnf upgrade -y dnf install jq procps awscli -y term_handler() { echo "Deleting SSM activation $ACTIVATION_ID" if ! aws ssm delete-activation --activation-id $ACTIVATION_ID --region $ECS_TASK_REGION; then echo "SSM activation $ACTIVATION_ID failed to be deleted" 1>&2 fi MANAGED_INSTANCE_ID=$(jq -e -r .ManagedInstanceID /var/lib/amazon/ssm/registration) echo "Deregistering SSM Managed Instance $MANAGED_INSTANCE_ID" if ! aws ssm deregister-managed-instance --instance-id $MANAGED_INSTANCE_ID --region $ECS_TASK_REGION; then echo "SSM Managed Instance $MANAGED_INSTANCE_ID failed to be deregistered" 1>&2 fi kill -SIGTERM $SSM_AGENT_PID } trap term_handler SIGTERM SIGINT # check if the required IAM role is provided if [[ -z $MANAGED_INSTANCE_ROLE_NAME ]] ; then echo "Environment variable MANAGED_INSTANCE_ROLE_NAME not set, exiting" 1>&2 exit 1 fi # check if the agent is already running (it will be if ECS Exec is enabled) if ! ps ax | grep amazon-ssm-agent | grep -v grep > /dev/null; then # check if ECS Container Metadata is available if [[ -n $ECS_CONTAINER_METADATA_URI_V4 ]] ; then # Retrieve info from ECS task metadata endpoint echo "Found ECS Container Metadata, running activation with metadata" TASK_METADATA=$(curl "${ECS_CONTAINER_METADATA_URI_V4}/task") ECS_TASK_AVAILABILITY_ZONE=$(echo $TASK_METADATA | jq -e -r '.AvailabilityZone') ECS_TASK_ARN=$(echo $TASK_METADATA | jq -e -r '.TaskARN') ECS_TASK_REGION=$(echo $ECS_TASK_AVAILABILITY_ZONE | sed 's/.$//') # validate ECS_TASK_AVAILABILITY_ZONE ECS_TASK_AVAILABILITY_ZONE_REGEX='^(af|ap|ca|cn|eu|me|sa|us|us-gov)-(central|north|(north(east|west))|south|south(east|west)|east|west)-[0-9]{1}[a-z]{1}$' if ! [[ $ECS_TASK_AVAILABILITY_ZONE =~ $ECS_TASK_AVAILABILITY_ZONE_REGEX ]] ; then echo "Error extracting Availability Zone from ECS Container Metadata, exiting" 1>&2 exit 1 fi # validate ECS_TASK_ARN ECS_TASK_ARN_REGEX='^arn:(aws|aws-cn|aws-us-gov):ecs:[a-z0-9-]+:[0-9]{12}:task/[a-zA-Z0-9_-]+/[a-zA-Z0-9]+$' if ! [[ $ECS_TASK_ARN =~ $ECS_TASK_ARN_REGEX ]] ; then echo "Error extracting Task ARN from ECS Container Metadata, exiting" 1>&2 exit 1 fi # Create activation tagging with Availability Zone and Task ARN CREATE_ACTIVATION_OUTPUT=$(aws ssm create-activation \ --iam-role $MANAGED_INSTANCE_ROLE_NAME \ --tags Key=ECS_TASK_AVAILABILITY_ZONE,Value=$ECS_TASK_AVAILABILITY_ZONE Key=ECS_TASK_ARN,Value=$ECS_TASK_ARN Key=FAULT_INJECTION_SIDECAR,Value=true \ --region $ECS_TASK_REGION) ACTIVATION_CODE=$(echo $CREATE_ACTIVATION_OUTPUT | jq -e -r .ActivationCode) ACTIVATION_ID=$(echo $CREATE_ACTIVATION_OUTPUT | jq -e -r .ActivationId) # Register with AWS Systems Manager (SSM) if ! amazon-ssm-agent -register -code $ACTIVATION_CODE -id $ACTIVATION_ID -region $ECS_TASK_REGION; then echo "Failed to register with AWS Systems Manager (SSM), exiting" 1>&2 exit 1 fi # the agent needs to run in the background, otherwise the trapped signal # won't execute the attached function until this process finishes amazon-ssm-agent & SSM_AGENT_PID=$! # need to keep the script alive, otherwise the container will terminate wait $SSM_AGENT_PID else echo "ECS Container Metadata not found, exiting" 1>&2 exit 1 fi else echo "SSM agent is already running, exiting" 1>&2 exit 1 fi
範例實驗範本
以下是 aws:ecs:task-cpu-stress動作的範例實驗範本。
{ "description": "Run CPU stress on the target ECS tasks", "targets": { "myTasks": { "resourceType": "aws:ecs:task", "resourceArns": [ "arn:aws:ecs:
us-east-1
:111122223333
:task/my-cluster
/09821742c0e24250b187dfed8EXAMPLE
" ], "selectionMode": "ALL
" } }, "actions": { "EcsTask-cpu-stress": { "actionId": "aws:ecs:task-cpu-stress", "parameters": { "duration": "PT1M
" }, "targets": { "Tasks": "myTasks" } } }, "stopConditions": [ { "source": "none", } ], "roleArn": "arn:aws:iam::111122223333
:role/fis-experiment-role
", "tags": {} }