

# Amazon ECS task definition use cases
<a name="use-cases"></a>

Learn more about how to write task definitions for various AWS services and features.

Depending on your workload, certain task definition parameters must be set. Also, for the EC2 launch type, you have to choose specific instance types that are engineered for the workload.

**Topics**
+ [Amazon ECS task definitions for GPU workloads](ecs-gpu.md)
+ [Amazon ECS task definitions for video transcoding workloads](ecs-vt1.md)
+ [Amazon ECS task definitions for AWS Neuron machine learning workloads](ecs-inference.md)
+ [Amazon ECS task definitions for deep learning instances](ecs-dl1.md)
+ [Amazon ECS task definitions for 64-bit ARM workloads](ecs-arm64.md)
+ [Send Amazon ECS logs to CloudWatch](using_awslogs.md)
+ [Send Amazon ECS logs to an AWS service or AWS Partner](using_firelens.md)
+ [Using non-AWS container images in Amazon ECS](private-auth.md)
+ [Restart individual containers in Amazon ECS tasks with container restart policies](container-restart-policy.md)
+ [Pass sensitive data to an Amazon ECS container](specifying-sensitive-data.md)

# Amazon ECS task definitions for GPU workloads
<a name="ecs-gpu"></a>

Amazon ECS supports workloads that use GPUs when you create clusters with container instances that support GPUs. Amazon EC2 GPU-based container instances that use the p3, p4d, p5, g3, g4, g5, g6, and g6e instance types provide access to NVIDIA GPUs. For more information, see [Linux Accelerated Computing Instances](https://docs.aws.amazon.com/ec2/latest/instancetypes/ac.html) in the *Amazon EC2 Instance Types guide*.

Amazon ECS provides a GPU-optimized AMI that comes with pre-configured NVIDIA kernel drivers and a Docker GPU runtime. For more information, see [Amazon ECS-optimized Linux AMIs](ecs-optimized_AMI.md).

You can designate a number of GPUs in your task definition for task placement consideration at the container level. Amazon ECS schedules tasks to available container instances that support GPUs and pins physical GPUs to the appropriate containers for optimal performance.

The following Amazon EC2 GPU-based instance types are supported. For more information, see [Amazon EC2 P3 Instances](https://aws.amazon.com/ec2/instance-types/p3/), [Amazon EC2 P4d Instances](https://aws.amazon.com/ec2/instance-types/p4/), [Amazon EC2 P5 Instances](https://aws.amazon.com/ec2/instance-types/p5/), [Amazon EC2 G3 Instances](https://aws.amazon.com/ec2/instance-types/g3/), [Amazon EC2 G4 Instances](https://aws.amazon.com/ec2/instance-types/g4/), [Amazon EC2 G5 Instances](https://aws.amazon.com/ec2/instance-types/g5/), [Amazon EC2 G6 Instances](https://aws.amazon.com/ec2/instance-types/g6/), and [Amazon EC2 G6e Instances](https://aws.amazon.com/ec2/instance-types/g6e/).


|  Instance type  |  GPUs  |  GPU memory (GiB)  |  vCPUs  |  Memory (GiB)  | 
| --- | --- | --- | --- | --- | 
|  p3.2xlarge  |  1  |  16  |  8  |  61  | 
|  p3.8xlarge  |  4  |  64  |  32  |  244  | 
|  p3.16xlarge  |  8  |  128  |  64  |  488  | 
|  p3dn.24xlarge  |  8  |  256  |  96  |  768  | 
|  p4d.24xlarge  | 8 | 320 | 96 | 1152 | 
| p5.48xlarge | 8 | 640 | 192 | 2048 | 
|  g3s.xlarge  |  1  |  8  |  4  |  30.5  | 
|  g3.4xlarge  |  1  |  8  |  16  |  122  | 
|  g3.8xlarge  |  2  |  16  |  32  |  244  | 
|  g3.16xlarge  |  4  |  32  |  64  |  488  | 
|  g4dn.xlarge  |  1  |  16  |  4  |  16  | 
|  g4dn.2xlarge  |  1  |  16  |  8  |  32  | 
|  g4dn.4xlarge  |  1  |  16  |  16  |  64  | 
|  g4dn.8xlarge  |  1  |  16  |  32  |  128  | 
|  g4dn.12xlarge  |  4  |  64  |  48  |  192  | 
|  g4dn.16xlarge  |  1  |  16  |  64  |  256  | 
|  g5.xlarge  |  1  |  24  |  4  |  16  | 
|  g5.2xlarge  |  1  |  24  |  8  |  32  | 
|  g5.4xlarge  |  1  |  24  |  16  |  64  | 
|  g5.8xlarge  |  1  |  24  |  32  |  128  | 
|  g5.16xlarge  |  1  |  24  |  64  |  256  | 
|  g5.12xlarge  |  4  |  96  |  48  |  192  | 
|  g5.24xlarge  |  4  |  96  |  96  |  384  | 
|  g5.48xlarge  |  8  |  192  |  192  |  768  | 
| g6.xlarge | 1 | 24 | 4 | 16 | 
| g6.2xlarge | 1 | 24 | 8 | 32 | 
| g6.4xlarge | 1 | 24 | 16 | 64 | 
| g6.8xlarge | 1 | 24 | 32 | 128 | 
| g6.16xlarge | 1 | 24 | 64 | 256 | 
| g6.12xlarge | 4 | 96 | 48 | 192 | 
| g6.24xlarge | 4 | 96 | 96 | 384 | 
| g6.48xlarge | 8 | 192 | 192 | 768 | 
| g6.metal | 8 | 192 | 192 | 768 | 
| gr6.4xlarge | 1 | 24 | 16 | 128 | 
| g6e.xlarge | 1 | 48 | 4 | 32 | 
| g6e.2xlarge | 1 | 48 | 8 | 64 | 
| g6e.4xlarge | 1 | 48 | 16 | 128 | 
| g6e.8xlarge | 1 | 48 | 32 | 256 | 
| g6e.16xlarge | 1 | 48 | 64 | 512 | 
| g6e.12xlarge | 4 | 192 | 48 | 384 | 
| g6e.24xlarge | 4 | 192 | 96 | 768 | 
| g6e.48xlarge | 8 | 384 | 192 | 1536 | 
| gr6.8xlarge | 1 | 24 | 32 | 256 | 

You can retrieve the Amazon Machine Image (AMI) ID for Amazon ECS-optimized AMIs by querying the AWS Systems Manager Parameter Store API. Using this parameter, you don't need to manually look up Amazon ECS-optimized AMI IDs. For more information about the Systems Manager Parameter Store API, see [GetParameter](https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_GetParameter.html). The user must have the `ssm:GetParameter` IAM permission to retrieve the Amazon ECS-optimized AMI metadata.

```
aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended --region us-east-1
```
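The `Value` field in the response from the call above is itself a JSON document whose `image_id` key holds the AMI ID. The following is a minimal sketch of extracting it locally; the response shown is an abridged, hypothetical example that follows the `GetParameters` response shape:

```shell
# Hypothetical, abridged response from the get-parameters call above.
response='{"Parameters":[{"Name":"/aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended","Value":"{\"image_id\":\"ami-0123456789abcdef0\",\"os\":\"Amazon Linux 2\"}"}]}'

# The Value field is a JSON-encoded string, so it must be decoded twice.
ami_id=$(printf '%s' "$response" | python3 -c 'import json,sys; outer=json.load(sys.stdin); print(json.loads(outer["Parameters"][0]["Value"])["image_id"])')
echo "$ami_id"
```

You can then pass the extracted ID to commands such as `aws ec2 run-instances --image-id`.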

# Use GPUs with Amazon ECS Managed Instances
<a name="managed-instances-gpu"></a>

Amazon ECS Managed Instances supports GPU-accelerated computing for workloads such as machine learning, high-performance computing, and video processing through GPU-based Amazon EC2 instance types. For more information about instance types supported by Amazon ECS Managed Instances, see [Amazon ECS Managed Instances instance types](managed-instances-instance-types.md).

The following is a subset of GPU-based instance types supported on Amazon ECS Managed Instances:
+ `g4dn`: Powered by NVIDIA T4 GPUs, suitable for machine learning inference, computer vision, and graphics-intensive applications.
+ `g5`: Powered by NVIDIA A10G GPUs, offering higher performance for graphics-intensive applications and machine learning workloads.
+ `p3`: Powered by NVIDIA V100 GPUs, designed for high-performance computing and deep learning training.
+ `p4d`: Powered by NVIDIA A100 GPUs, offering the highest performance for machine learning training and high-performance computing.

When you use GPU-enabled instance types with Amazon ECS Managed Instances, the NVIDIA drivers and CUDA toolkit are pre-installed on the instance, making it easier to run GPU-accelerated workloads.

## GPU-enabled instance selection
<a name="managed-instances-gpu-instance-selection"></a>

To select GPU-enabled instance types for your Amazon ECS Managed Instances workloads, use the `instanceRequirements` object in the launch template of the capacity provider. The following snippet shows the attributes that can be used for selecting GPU-enabled instances.

```
{
  "instanceRequirements": {
    "acceleratorTypes": ["gpu"],
    "acceleratorCount": {"min": 1},
    "acceleratorManufacturers": ["nvidia"]
  }
}
```

Alternatively, the following snippet shows how to explicitly list GPU-enabled instance types in the launch template.

```
{
  "instanceRequirements": {
    "allowedInstanceTypes": ["g4dn.xlarge", "p4de.24xlarge"]
  }
}
```
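When you create the launch template itself with the AWS CLI, note that the EC2 `CreateLaunchTemplate` API expects PascalCase keys, and `InstanceRequirements` requires `VCpuCount` and `MemoryMiB` alongside the accelerator attributes. The following is a minimal sketch that assembles and sanity-checks the launch template data locally; all values are illustrative:

```shell
# Illustrative launch template data selecting NVIDIA GPU instances.
# VCpuCount and MemoryMiB minimums are placeholder values.
cat > lt-data.json <<'EOF'
{
  "InstanceRequirements": {
    "VCpuCount": { "Min": 4 },
    "MemoryMiB": { "Min": 16384 },
    "AcceleratorTypes": ["gpu"],
    "AcceleratorCount": { "Min": 1 },
    "AcceleratorManufacturers": ["nvidia"]
  }
}
EOF

# Confirm the file is valid JSON before using it.
python3 -m json.tool lt-data.json
```

You could then pass the file to `aws ec2 create-launch-template --launch-template-data file://lt-data.json`, together with a launch template name and your other instance settings.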

## GPU-enabled container images
<a name="managed-instances-gpu-container-images"></a>

To use GPUs in your containers, you need to use container images that contain the necessary GPU libraries and tools. NVIDIA provides several pre-built container images that you can use as a base for your GPU workloads, including the following:
+ `nvidia/cuda`: Base images with the CUDA toolkit for GPU computing.
+ `tensorflow/tensorflow:latest-gpu`: TensorFlow with GPU support.
+ `pytorch/pytorch:latest-cuda`: PyTorch with GPU support.

For an example task definition for Amazon ECS on Amazon ECS Managed Instances that involves the use of GPUs, see [Specifying GPUs in an Amazon ECS task definition](ecs-gpu-specifying.md).

## Considerations
<a name="gpu-considerations"></a>

**Note**  
Support for the g2 instance family has been deprecated.  
The p2 instance family is only supported on versions earlier than `20230912` of the Amazon ECS GPU-optimized AMI. If you need to continue to use p2 instances, see [What to do if you need a P2 instance](#p2-instance).  
In-place updates of the NVIDIA/CUDA drivers on either of these instance families can cause GPU workload failures.

We recommend that you consider the following before you begin working with GPUs on Amazon ECS.
+ Your clusters can contain a mix of GPU and non-GPU container instances.
+ You can run GPU workloads on external instances. When registering an external instance with your cluster, make sure that the `--enable-gpu` flag is included in the installation script. For more information, see [Registering an external instance to an Amazon ECS cluster](ecs-anywhere-registration.md).
+ You must set `ECS_ENABLE_GPU_SUPPORT` to `true` in your agent configuration file. For more information, see [Amazon ECS container agent configuration](ecs-agent-config.md).
+ When running a task or creating a service, you can use instance type attributes when you configure task placement constraints to determine the container instances the task is to be launched on. By doing this, you can more effectively use your resources. For more information, see [How Amazon ECS places tasks on container instances](task-placement.md).

  The following example launches a task on a `g4dn.xlarge` container instance in your default cluster.

  ```
  aws ecs run-task --cluster default --task-definition ecs-gpu-task-def \
       --placement-constraints type=memberOf,expression="attribute:ecs.instance-type == g4dn.xlarge" --region us-east-2
  ```
+ For each container that has a GPU resource requirement that's specified in the container definition, Amazon ECS sets the container runtime to be the NVIDIA container runtime.
+ The NVIDIA container runtime requires some environment variables to be set in the container to function properly. For a list of these environment variables, see [Specialized Configurations with Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html?highlight=environment%20variable). Amazon ECS sets the `NVIDIA_VISIBLE_DEVICES` environment variable value to be a list of the GPU device IDs that Amazon ECS assigns to the container. For the other required environment variables, Amazon ECS doesn't set them. So, make sure that your container image sets them or they're set in the container definition.
+ The p5 instance type family is supported on version `20230929` and later of the Amazon ECS GPU-optimized AMI. 
+ The g4 instance type family is supported on version `20230913` and later of the Amazon ECS GPU-optimized AMI. For more information, see [Amazon ECS-optimized Linux AMIs](ecs-optimized_AMI.md). It's not supported in the Create Cluster workflow in the Amazon ECS console. To use these instance types, you must use the Amazon EC2 console, AWS CLI, or API and manually register the instances to your cluster.
+ The p4d.24xlarge instance type only works with CUDA 11 or later.
+ The Amazon ECS GPU-optimized AMI has IPv6 enabled, which causes issues when using `yum`. This can be resolved by configuring `yum` to use IPv4 with the following command.

  ```
  echo "ip_resolve=4" >> /etc/yum.conf
  ```
+  When you build a container image that doesn't use the NVIDIA/CUDA base images, you must set the `NVIDIA_DRIVER_CAPABILITIES` container runtime variable to one of the following values:
  + `utility,compute`
  + `all`

  For information about how to set the variable, see [Controlling the NVIDIA Container Runtime](https://sarus.readthedocs.io/en/stable/user/custom-cuda-images.html#controlling-the-nvidia-container-runtime) on the NVIDIA website.
+ GPUs are not supported on Windows containers.
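To satisfy the `NVIDIA_DRIVER_CAPABILITIES` requirement above, one option is to set the variable in the container definition's `environment` list. The following fragment is a sketch; the container name, image, and the `utility,compute` value are illustrative:

```shell
# Illustrative container-definition fragment that sets NVIDIA_DRIVER_CAPABILITIES.
# The name and image are placeholders; the environment entries follow the
# standard Amazon ECS container definition schema.
cat > container-def.json <<'EOF'
{
  "name": "gpu-app",
  "image": "my-registry/my-gpu-image:latest",
  "essential": true,
  "environment": [
    { "name": "NVIDIA_DRIVER_CAPABILITIES", "value": "utility,compute" }
  ]
}
EOF

# Sanity-check that the fragment is valid JSON before pasting it into a task definition.
python3 -m json.tool container-def.json
```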

# Launch a GPU container instance for Amazon ECS
<a name="gpu-launch"></a>

To use a GPU instance with Amazon ECS on Amazon EC2, you create a launch template and a user data file, and then launch the instance.

You can then run a task that uses a task definition configured for GPU.

## Use a launch template
<a name="gpu-launch-template"></a>

You can create a launch template as follows.
+ Create a launch template that uses the Amazon ECS-optimized GPU AMI ID for the AMI. For information about how to create a launch template, see [Create a new launch template using parameters you define](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-launch-template.html#create-launch-template-define-parameters) in the *Amazon EC2 User Guide*.

  Use the AMI ID from the previous step for the **Amazon Machine image**. For information about how to specify the AMI ID with a Systems Manager parameter, see [Specify a Systems Manager parameter in a launch template](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-launch-template.html#use-an-ssm-parameter-instead-of-an-ami-id) in the *Amazon EC2 User Guide*.

  Add the following to the **User data** in the launch template. Replace *cluster-name* with the name of your cluster.

  ```
  #!/bin/bash
  echo ECS_CLUSTER=cluster-name >> /etc/ecs/ecs.config;
  echo ECS_ENABLE_GPU_SUPPORT=true >> /etc/ecs/ecs.config
  ```

## Use the AWS CLI
<a name="gpu-launch-cli"></a>

You can use the AWS CLI to launch the container instance.

1. Create a file that's called `userdata.toml`. This file is used for the instance user data. Replace *cluster-name* with the name of your cluster.

   ```
   #!/bin/bash
   echo ECS_CLUSTER=cluster-name >> /etc/ecs/ecs.config;
   echo ECS_ENABLE_GPU_SUPPORT=true >> /etc/ecs/ecs.config
   ```

1. Run the following command to get the GPU AMI ID. You use this in the following step.

   ```
   aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended --region us-east-1
   ```

1. Run the following command to launch the GPU instance. Remember to replace the following parameters:
   + Replace *subnet* with the ID of the private or public subnet that your instance will launch in.
   + Replace *gpu_ami* with the AMI ID from the previous step.
   + Replace *t3.large* with the GPU instance type that you want to use.
   + Replace *region* with the Region code.

   ```
   aws ec2 run-instances --key-name ecs-gpu-example \
      --subnet-id subnet \
      --image-id gpu_ami \
      --instance-type t3.large \
      --region region \
      --tag-specifications 'ResourceType=instance,Tags=[{Key=GPU,Value=example}]' \
      --user-data file://userdata.toml \
      --iam-instance-profile Name=ecsInstanceRole
   ```

1. Run the following command to verify that the container instance is registered to the cluster. When you run this command, remember to replace the following parameters:
   + Replace *cluster-name* with your cluster name.
   + Replace *region* with your Region code.

   ```
   aws ecs list-container-instances --cluster cluster-name --region region
   ```

# Specifying GPUs in an Amazon ECS task definition
<a name="ecs-gpu-specifying"></a>

To use the GPUs on a container instance and the Docker GPU runtime, make sure that you designate the number of GPUs your container requires in the task definition. As containers that support GPUs are placed, the Amazon ECS container agent pins the desired number of physical GPUs to the appropriate container. The number of GPUs reserved for all containers in a task cannot exceed the number of available GPUs on the container instance the task is launched on. For more information, see [Creating an Amazon ECS task definition using the console](create-task-definition.md).

**Important**  
If your GPU requirements aren't specified in the task definition, the task uses the default Docker runtime.

The following shows the JSON format for the GPU requirements in a task definition:

```
{
  "containerDefinitions": [
     {
        ...
        "resourceRequirements" : [
            {
               "type" : "GPU", 
               "value" : "2"
            }
        ],
     },
...
}
```

The following example demonstrates the syntax for a Docker container that specifies a GPU requirement. This container uses two GPUs, runs the `nvidia-smi` utility, and then exits.

```
{
  "containerDefinitions": [
    {
      "memory": 80,
      "essential": true,
      "name": "gpu",
      "image": "nvidia/cuda:11.0.3-base",
      "resourceRequirements": [
         {
           "type":"GPU",
           "value": "2"
         }
      ],
      "command": [
        "sh",
        "-c",
        "nvidia-smi"
      ],
      "cpu": 100
    }
  ],
  "family": "example-ecs-gpu"
}
```

The following example task definition shows a TensorFlow container that prints the number of available GPUs. The task runs on Amazon ECS Managed Instances, requires one GPU, and uses a `g4dn.xlarge` instance.

```
{
  "family": "tensorflow-gpu",
  "networkMode": "awsvpc",
  "executionRoleArn": "arn:aws:iam::account-id:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "tensorflow",
      "image": "tensorflow/tensorflow:latest-gpu",
      "essential": true,
      "command": [
        "python",
        "-c",
        "import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"
      ],
      "resourceRequirements": [
        {
          "type": "GPU",
          "value": "1"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/tensorflow-gpu",
          "awslogs-region": "region",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "requiresCompatibilities": [
    "MANAGED_INSTANCES"
  ],
  "cpu": "4096",
  "memory": "8192"
}
```
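Before registering task definitions like the ones above, you can sanity-check the JSON locally. The following is a minimal sketch; the file name and values are placeholders, and the check only confirms that each GPU requirement is a positive integer encoded as a string, which is the shape the ECS API expects:

```shell
# Write a stripped-down task definition with a GPU requirement (illustrative values).
cat > taskdef.json <<'EOF'
{
  "family": "example-ecs-gpu",
  "containerDefinitions": [
    {
      "name": "gpu",
      "image": "nvidia/cuda:11.0.3-base",
      "memory": 80,
      "essential": true,
      "resourceRequirements": [ { "type": "GPU", "value": "2" } ]
    }
  ]
}
EOF

# Confirm the file parses and that every GPU value is a positive integer string.
python3 - <<'EOF'
import json
td = json.load(open("taskdef.json"))
for c in td["containerDefinitions"]:
    for r in c.get("resourceRequirements", []):
        assert r["type"] == "GPU" and int(r["value"]) > 0
print("ok")
EOF
```

A check like this catches malformed JSON before `aws ecs register-task-definition` rejects it.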

## Share GPUs
<a name="share-gpu"></a>

When you want to share GPUs, you need to configure the following.

1. Remove GPU resource requirements from your task definitions so that Amazon ECS does not reserve any GPUs that should be shared.

1. Add the following to the instance user data when you want to share GPUs. The following example uses the AWS CDK; the commands make `nvidia` the default Docker container runtime on the container instance so that all Amazon ECS containers can use the GPUs. For more information, see [Run commands when you launch an EC2 instance with user data input](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html) in the *Amazon EC2 User Guide*.

   ```
   const userData = ec2.UserData.forLinux();
    userData.addCommands(
    'sudo rm /etc/sysconfig/docker',
    'echo DAEMON_MAXFILES=1048576 | sudo tee -a /etc/sysconfig/docker',
    'echo OPTIONS="--default-ulimit nofile=32768:65536 --default-runtime nvidia" | sudo tee -a /etc/sysconfig/docker',
    'echo DAEMON_PIDFILE_TIMEOUT=10 | sudo tee -a /etc/sysconfig/docker',
    'sudo systemctl restart docker',
   );
   ```

1. Set the `NVIDIA_VISIBLE_DEVICES` environment variable on your container. You can do this by specifying the environment variable in your task definition. For information on the valid values, see [GPU Enumeration](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#gpu-enumeration) on the NVIDIA documentation site.
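For step 3, one way to set the variable is in the container definition's `environment` list. The fragment below is illustrative; the name, image, and the `all` value are assumptions, and the NVIDIA GPU Enumeration page linked above lists the other valid values:

```shell
# Illustrative container-definition fragment exposing all host GPUs to the container.
cat > shared-gpu-container.json <<'EOF'
{
  "name": "shared-gpu-app",
  "image": "my-registry/my-gpu-image:latest",
  "essential": true,
  "environment": [
    { "name": "NVIDIA_VISIBLE_DEVICES", "value": "all" }
  ]
}
EOF

# Confirm the fragment is valid JSON.
python3 -m json.tool shared-gpu-container.json
```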

## What to do if you need a P2 instance
<a name="p2-instance"></a>

If you need to use a P2 instance, you can use one of the following options to continue using the instances.

You must modify the instance user data for both options. For more information, see [Run commands when you launch an EC2 instance with user data input](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html) in the *Amazon EC2 User Guide*.

**Use the last supported GPU-optimized AMI**

You can use the `20230906` version of the GPU-optimized AMI, and add the following to the instance user data.

Replace *cluster-name* with the name of your cluster.

```
#!/bin/bash
echo "exclude=*nvidia* *cuda*" >> /etc/yum.conf
echo "ECS_CLUSTER=cluster-name" >> /etc/ecs/ecs.config
```

**Use the latest GPU-optimized AMI, and update the user data**

You can add the following to the instance user data. This uninstalls the NVIDIA 535/CUDA 12.2 drivers, and then installs the NVIDIA 470/CUDA 11.4 drivers and pins the version.

```
#!/bin/bash
yum remove -y cuda-toolkit* nvidia-driver-latest-dkms*
tmpfile=$(mktemp)
cat >$tmpfile <<EOF
[amzn2-nvidia]
name=Amazon Linux 2 Nvidia repository
mirrorlist=\$awsproto://\$amazonlinux.\$awsregion.\$awsdomain/\$releasever/amzn2-nvidia/latest/\$basearch/mirror.list
priority=20
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
enabled=1
exclude=libglvnd-*
EOF

mv $tmpfile /etc/yum.repos.d/amzn2-nvidia-tmp.repo
yum install -y system-release-nvidia cuda-toolkit-11-4 nvidia-driver-latest-dkms-470.182.03
yum install -y libnvidia-container-1.4.0 libnvidia-container-tools-1.4.0 nvidia-container-runtime-hook-1.4.0 docker-runtime-nvidia-1

echo "exclude=*nvidia* *cuda*" >> /etc/yum.conf
nvidia-smi
```

**Create your own P2 compatible GPU-optimized AMI**

You can create your own custom Amazon ECS GPU-optimized AMI that is compatible with P2 instances, and then launch P2 instances using the AMI.

1. Run the following command to clone the `amazon-ecs-ami` repo.

   ```
   git clone https://github.com/aws/amazon-ecs-ami
   ```

1. Set the required Amazon ECS agent and source Amazon Linux AMI versions in `release.auto.pkrvars.hcl` or `overrides.auto.pkrvars.hcl`.

1. Run the following command to build a private P2 compatible EC2 AMI.

   Replace *region* with the Region where the instance runs.

   ```
   REGION=region make al2keplergpu
   ```

1. Use the AMI with the following instance user data to connect to the Amazon ECS cluster.

   Replace *cluster-name* with the name of your cluster.

   ```
   #!/bin/bash
   echo "ECS_CLUSTER=cluster-name" >> /etc/ecs/ecs.config
   ```

# Amazon ECS task definitions for video transcoding workloads
<a name="ecs-vt1"></a>

To run video transcoding workloads on Amazon ECS, register [Amazon EC2 VT1](https://aws.amazon.com/ec2/instance-types/vt1/) instances. After you register these instances, you can run live and pre-rendered video transcoding workloads as tasks on Amazon ECS. Amazon EC2 VT1 instances use Xilinx U30 media transcoding cards to accelerate live and pre-rendered video transcoding workloads.

**Note**  
For instructions on how to run video transcoding workloads in containers outside of Amazon ECS, see the [Xilinx documentation](https://xilinx.github.io/video-sdk/v1.5/container_setup.html#working-with-docker-vt1).

## Considerations
<a name="ecs-vt1-considerations"></a>

Before you begin deploying VT1 on Amazon ECS, consider the following:
+ Your clusters can contain a mix of VT1 and non-VT1 instances.
+ You need a Linux application that uses Xilinx U30 media transcoding cards with accelerated AVC (H.264) and HEVC (H.265) codecs.
**Important**  
Applications that use other codecs might not have improved performance on VT1 instances.
+ Only one transcoding task can run on a U30 card. Each card has two devices that are associated with it. You can run as many transcoding tasks as there are cards on each of your VT1 instances.
+ When creating a service or running a standalone task, you can use instance type attributes when configuring task placement constraints. This ensures that the task is launched on the container instance that you specify. Doing so helps ensure that you use your resources effectively and that your tasks for video transcoding workloads are on your VT1 instances. For more information, see [How Amazon ECS places tasks on container instances](task-placement.md).

  In the following example, a task is run on a `vt1.3xlarge` instance on your `default` cluster.

  ```
  aws ecs run-task \
       --cluster default \
       --task-definition vt1-3xlarge-xffmpeg-processor \
       --placement-constraints type=memberOf,expression="attribute:ecs.instance-type == vt1.3xlarge"
  ```
+ You configure a container to use the specific U30 card available on the host container instance. You can do this by using the `linuxParameters` parameter and specifying the device details. For more information, see [Task definition requirements](#ecs-vt1-requirements).
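The `linuxParameters.devices` list grows with the instance size (see the table in [Task definition requirements](#ecs-vt1-requirements)). The following sketch generates the device entries for a given number of addressable XCU30 devices; the count of 4 corresponds to a vt1.6xlarge, and device paths start at `/dev/dri/renderD128`:

```shell
# Generate linuxParameters.devices entries for N addressable XCU30 devices.
devices=4   # 4 addressable devices corresponds to a vt1.6xlarge

devices_json=$(python3 - "$devices" <<'EOF'
import json, sys

n = int(sys.argv[1])
# Device paths start at renderD128 and increment by one per device.
entries = [
    {
        "containerPath": f"/dev/dri/renderD{128 + i}",
        "hostPath": f"/dev/dri/renderD{128 + i}",
        "permissions": ["read", "write"],
    }
    for i in range(n)
]
print(json.dumps({"devices": entries}, indent=4))
EOF
)
echo "$devices_json"
```

You can paste the generated `devices` array into the `linuxParameters` section of a task definition like the ones shown in the next section.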

## Using a VT1 AMI
<a name="ecs-vt1-ami"></a>

You have two options for the AMI that your Amazon EC2 container instances run. The first option is to use the official Xilinx AMI on the AWS Marketplace. The second option is to build your own AMI from the sample repository.
+ [Xilinx offers AMIs on the AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-phvk6d4mq3hh6).
+ Amazon ECS provides a sample repository that you can use to build an AMI for video transcoding workloads. This AMI comes with Xilinx U30 drivers. You can find the repository that contains Packer scripts on [GitHub](https://github.com/aws-samples/aws-vt-baseami-pipeline). For more information about Packer, see the [Packer documentation](https://developer.hashicorp.com/packer/docs).

## Task definition requirements
<a name="ecs-vt1-requirements"></a>

To run video transcoding containers on Amazon ECS, your task definition must contain a video transcoding application that uses the accelerated H.264/AVC and H.265/HEVC codecs. You can build a container image by following the steps on the [Xilinx GitHub](https://xilinx.github.io/video-sdk/v1.5/container_setup.html#creating-a-docker-image-for-vt1-usage).

The task definition must be specific to the instance type. The instance types are vt1.3xlarge, vt1.6xlarge, and vt1.24xlarge. You must configure a container to use the specific Xilinx U30 devices that are available on the host container instance. You can do so using the `linuxParameters` parameter. The following table details the cards and device SoCs that are specific to each instance type.


| Instance Type | vCPUs | RAM (GiB) | U30 accelerator cards | Addressable XCU30 SoC devices | Device Paths | 
| --- | --- | --- | --- | --- | --- | 
| vt1.3xlarge | 12 | 24 | 1 | 2 | /dev/dri/renderD128,/dev/dri/renderD129 | 
| vt1.6xlarge | 24 | 48 | 2 | 4 | /dev/dri/renderD128,/dev/dri/renderD129,/dev/dri/renderD130,/dev/dri/renderD131 | 
| vt1.24xlarge | 96 | 182 | 8 | 16 | /dev/dri/renderD128,/dev/dri/renderD129,/dev/dri/renderD130,/dev/dri/renderD131,/dev/dri/renderD132,/dev/dri/renderD133,/dev/dri/renderD134,/dev/dri/renderD135,/dev/dri/renderD136,/dev/dri/renderD137,/dev/dri/renderD138,/dev/dri/renderD139,/dev/dri/renderD140,/dev/dri/renderD141,/dev/dri/renderD142,/dev/dri/renderD143 | 

**Important**  
If the task definition lists devices that the EC2 instance doesn't have, the task fails to run. When the task fails, the following error message appears in the `stoppedReason`: `CannotStartContainerError: Error response from daemon: error gathering device information while adding custom device "/dev/dri/renderD130": no such file or directory`.

# Specifying video transcoding in an Amazon ECS task definition
<a name="task-def-video-transcode"></a>

The following example shows the syntax for a task definition for a Linux container on Amazon EC2. This task definition is for container images that are built following the procedure in the [Xilinx documentation](https://xilinx.github.io/video-sdk/v1.5/container_setup.html#creating-a-docker-image-for-vt1-usage). If you use this example, replace `image` with your own image, and copy your video files onto the instance in the `/home/ec2-user` directory.

------
#### [ vt1.3xlarge ]

1. Create a text file that's named `vt1-3xlarge-ffmpeg-linux.json` with the following content.

   ```
   {
       "family": "vt1-3xlarge-xffmpeg-processor",
       "requiresCompatibilities": ["EC2"],
       "placementConstraints": [
           {
               "type": "memberOf",
               "expression": "attribute:ecs.os-type == linux"
           },
           {
               "type": "memberOf",
               "expression": "attribute:ecs.instance-type == vt1.3xlarge"
           }
       ],
       "containerDefinitions": [
           {
               "entryPoint": [
                   "/bin/bash",
                   "-c"
               ],
               "command": ["/video/ecs_ffmpeg_wrapper.sh"],
               "linuxParameters": {
                   "devices": [
                       {
                           "containerPath": "/dev/dri/renderD128",
                           "hostPath": "/dev/dri/renderD128",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD129",
                           "hostPath": "/dev/dri/renderD129",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       }
                   ]
               },
               "mountPoints": [
                   {
                       "containerPath": "/video",
                       "sourceVolume": "video_file"
                   }
               ],
               "cpu": 0,
               "memory": 12000,
               "image": "0123456789012.dkr.ecr.us-west-2.amazonaws.com/aws/xilinx-xffmpeg",
               "essential": true,
               "name": "xilinix-xffmpeg"
           }
       ],
       "volumes": [
           {
               "name": "video_file",
               "host": {"sourcePath": "/home/ec2-user"}
           }
       ]
   }
   ```

1. Register the task definition.

   ```
   aws ecs register-task-definition --family vt1-3xlarge-xffmpeg-processor --cli-input-json file://vt1-3xlarge-ffmpeg-linux.json --region us-east-1
   ```

------
#### [ vt1.6xlarge ]

1. Create a text file that's named `vt1-6xlarge-ffmpeg-linux.json` with the following content.

   ```
   {
       "family": "vt1-6xlarge-xffmpeg-processor",
       "requiresCompatibilities": ["EC2"],
       "placementConstraints": [
           {
               "type": "memberOf",
               "expression": "attribute:ecs.os-type == linux"
           },
           {
               "type": "memberOf",
               "expression": "attribute:ecs.instance-type == vt1.6xlarge"
           }
       ],
       "containerDefinitions": [
           {
               "entryPoint": [
                   "/bin/bash",
                   "-c"
               ],
               "command": ["/video/ecs_ffmpeg_wrapper.sh"],
               "linuxParameters": {
                   "devices": [
                       {
                           "containerPath": "/dev/dri/renderD128",
                           "hostPath": "/dev/dri/renderD128",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD129",
                           "hostPath": "/dev/dri/renderD129",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD130",
                           "hostPath": "/dev/dri/renderD130",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD131",
                           "hostPath": "/dev/dri/renderD131",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       }
                   ]
               },
               "mountPoints": [
                   {
                       "containerPath": "/video",
                       "sourceVolume": "video_file"
                   }
               ],
               "cpu": 0,
               "memory": 12000,
               "image": "0123456789012.dkr.ecr.us-west-2.amazonaws.com/aws/xilinx-xffmpeg",
               "essential": true,
               "name": "xilinix-xffmpeg"
           }
       ],
       "volumes": [
           {
               "name": "video_file",
               "host": {"sourcePath": "/home/ec2-user"}
           }
       ]
   }
   ```

1. Register the task definition.

   ```
   aws ecs register-task-definition --family vt1-6xlarge-xffmpeg-processor --cli-input-json file://vt1-6xlarge-xffmpeg-linux.json --region us-east-1
   ```

------
#### [ vt1.24xlarge ]

1. Create a text file that's named `vt1-24xlarge-xffmpeg-linux.json` with the following content.

   ```
   {
       "family": "vt1-24xlarge-xffmpeg-processor",
       "requiresCompatibilities": ["EC2"],
       "placementConstraints": [
           {
               "type": "memberOf",
               "expression": "attribute:ecs.os-type == linux"
           },
           {
               "type": "memberOf",
               "expression": "attribute:ecs.instance-type == vt1.24xlarge"
           }
       ],
       "containerDefinitions": [
           {
               "entryPoint": [
                   "/bin/bash",
                   "-c"
               ],
               "command": ["/video/ecs_ffmpeg_wrapper.sh"],
               "linuxParameters": {
                   "devices": [
                       {
                           "containerPath": "/dev/dri/renderD128",
                           "hostPath": "/dev/dri/renderD128",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD129",
                           "hostPath": "/dev/dri/renderD129",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD130",
                           "hostPath": "/dev/dri/renderD130",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD131",
                           "hostPath": "/dev/dri/renderD131",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD132",
                           "hostPath": "/dev/dri/renderD132",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD133",
                           "hostPath": "/dev/dri/renderD133",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD134",
                           "hostPath": "/dev/dri/renderD134",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD135",
                           "hostPath": "/dev/dri/renderD135",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD136",
                           "hostPath": "/dev/dri/renderD136",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD137",
                           "hostPath": "/dev/dri/renderD137",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD138",
                           "hostPath": "/dev/dri/renderD138",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD139",
                           "hostPath": "/dev/dri/renderD139",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD140",
                           "hostPath": "/dev/dri/renderD140",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD141",
                           "hostPath": "/dev/dri/renderD141",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD142",
                           "hostPath": "/dev/dri/renderD142",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       },
                       {
                           "containerPath": "/dev/dri/renderD143",
                           "hostPath": "/dev/dri/renderD143",
                           "permissions": [
                               "read",
                               "write"
                           ]
                       }
                   ]
               },
               "mountPoints": [
                   {
                       "containerPath": "/video",
                       "sourceVolume": "video_file"
                   }
               ],
               "cpu": 0,
               "memory": 12000,
               "image": "0123456789012.dkr.ecr.us-west-2.amazonaws.com/aws/xilinx-xffmpeg",
               "essential": true,
               "name": "xilinix-xffmpeg"
           }
       ],
       "volumes": [
           {
               "name": "video_file",
               "host": {"sourcePath": "/home/ec2-user"}
           }
       ]
   }
   ```

1. Register the task definition.

   ```
   aws ecs register-task-definition --family vt1-24xlarge-xffmpeg-processor --cli-input-json file://vt1-24xlarge-xffmpeg-linux.json --region us-east-1
   ```

------
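
The device lists in the preceding task definitions differ only in the number of `/dev/dri/renderD*` entries (four on `vt1.6xlarge`, sixteen on `vt1.24xlarge`). If you generate task definitions programmatically, you can build these entries in code instead of writing each one by hand. The following is a minimal sketch; the `u30_devices` helper is illustrative, not part of any AWS tooling.

```
def u30_devices(count, start=128):
    """Build linuxParameters.devices entries for /dev/dri/renderD{start}..renderD{start+count-1}."""
    return [
        {
            "containerPath": f"/dev/dri/renderD{start + i}",
            "hostPath": f"/dev/dri/renderD{start + i}",
            "permissions": ["read", "write"],
        }
        for i in range(count)
    ]

# vt1.6xlarge exposes four render nodes; vt1.24xlarge exposes sixteen.
vt1_6xlarge_devices = u30_devices(4)
vt1_24xlarge_devices = u30_devices(16)
```

You can then place the resulting list into the `linuxParameters.devices` field of the container definition before registering it.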

# Amazon ECS task definitions for AWS Neuron machine learning workloads
<a name="ecs-inference"></a>

You can register [Amazon EC2 Trn1](https://aws.amazon.com/ec2/instance-types/trn1/), [Amazon EC2 Trn2](https://aws.amazon.com/ec2/instance-types/trn2/), [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/), and [Amazon EC2 Inf2](https://aws.amazon.com/ec2/instance-types/inf2/) instances to your clusters for machine learning workloads.

Amazon EC2 Trn1 and Trn2 instances are powered by [AWS Trainium](https://aws.amazon.com/ai/machine-learning/trainium/) chips. These instances provide high-performance, low-cost machine learning training in the cloud. You can train a machine learning model using a machine learning framework with AWS Neuron on a Trn1 or Trn2 instance. Then, you can run the model on an Inf1 or Inf2 instance to use the acceleration of the AWS Inferentia chips.

Amazon EC2 Inf1 and Inf2 instances are powered by [AWS Inferentia](https://aws.amazon.com/ai/machine-learning/inferentia/) chips. They provide high-performance, low-cost inference in the cloud.

Machine learning models are deployed to containers using [AWS Neuron](https://aws.amazon.com/ai/machine-learning/neuron/), a specialized software development kit (SDK). The SDK consists of a compiler, runtime, and profiling tools that optimize the machine learning performance of AWS machine learning chips. AWS Neuron supports popular machine learning frameworks such as TensorFlow, PyTorch, and Apache MXNet.

## Considerations
<a name="ecs-inference-considerations"></a>

Before you begin deploying Neuron on Amazon ECS, consider the following:
+ Your clusters can contain a mix of Trn1, Trn2, Inf1, Inf2, and other instances.
+ You need a Linux application in a container that uses a machine learning framework that supports AWS Neuron.
**Important**  
Applications that use other frameworks might not have improved performance on Trn1, Trn2, Inf1, and Inf2 instances.
+ Only one inference or inference-training task can run on each [AWS Trainium](https://aws.amazon.com/ai/machine-learning/trainium/) or [AWS Inferentia](https://aws.amazon.com/ai/machine-learning/inferentia/) chip. For Inf1, each chip has 4 NeuronCores. For Trn1, Trn2, and Inf2, each chip has 2 NeuronCores. You can run as many tasks as there are chips on each of your Trn1, Trn2, Inf1, and Inf2 instances.
+ When creating a service or running a standalone task, you can use instance type attributes when you configure task placement constraints. This ensures that the task is launched on the container instance that you specify. Doing so can help you optimize overall resource utilization and ensure that tasks for inference workloads are on your Trn1, Trn2, Inf1, and Inf2 instances. For more information, see [How Amazon ECS places tasks on container instances](task-placement.md).

  In the following example, a task is run on an `inf1.xlarge` instance on your `default` cluster.

  ```
  aws ecs run-task \
       --cluster default \
       --task-definition ecs-inference-task-def \
       --placement-constraints type=memberOf,expression="attribute:ecs.instance-type == inf1.xlarge"
  ```
+ Neuron resource requirements can't be defined in a task definition. Instead, you configure a container to use specific AWS Trainium or AWS Inferentia chips available on the host container instance. Do this by using the `linuxParameters` parameter and specifying the device details. For more information, see [Task definition requirements](#ecs-inference-requirements).

## Use the Amazon ECS-optimized Amazon Linux 2023 (Neuron) AMI
<a name="ecs-inference-ami2023"></a>

Amazon ECS provides an Amazon ECS optimized AMI that's based on Amazon Linux 2023 for AWS Trainium and AWS Inferentia workloads. It comes with the AWS Neuron drivers and runtime for Docker. This AMI makes running machine learning inference workloads easier on Amazon ECS.

We recommend using the Amazon ECS-optimized Amazon Linux 2023 (Neuron) AMI when launching your Amazon EC2 Trn1, Inf1, and Inf2 instances. 

You can retrieve the current Amazon ECS-optimized Amazon Linux 2023 (Neuron) AMI using the AWS CLI with the following command.

```
aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2023/neuron/recommended
```
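
The value of the `recommended` parameter is itself a JSON document that includes fields such as `image_id`. As a minimal sketch (using a fabricated, abbreviated response; the exact field set may vary), you can extract the AMI ID from the CLI output like this:

```
import json

# Fabricated get-parameters response, abbreviated for illustration.
response = {
    "Parameters": [
        {
            "Name": "/aws/service/ecs/optimized-ami/amazon-linux-2023/neuron/recommended",
            # The parameter Value is a JSON string describing the AMI.
            "Value": json.dumps({"image_id": "ami-0123456789abcdef0"}),
        }
    ]
}

# Decode the nested JSON document to get the AMI ID.
ami = json.loads(response["Parameters"][0]["Value"])
print(ami["image_id"])
```

You can then pass the extracted AMI ID to `aws ec2 run-instances` or to your launch template.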

## Task definition requirements
<a name="ecs-inference-requirements"></a>

To deploy Neuron on Amazon ECS, your task definition must contain the container definition for a pre-built container that serves the inference model for TensorFlow and is provided by AWS Deep Learning Containers. This container contains the AWS Neuron runtime and the TensorFlow Serving application. At startup, it fetches your model from Amazon S3, launches Neuron TensorFlow Serving with the saved model, and waits for prediction requests. In the following example, the container image has TensorFlow 1.15 and Ubuntu 18.04. A complete list of pre-built Deep Learning Containers optimized for Neuron is maintained on GitHub. For more information, see [Using AWS Neuron TensorFlow Serving](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-tf-neuron-serving.html).

```
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-neuron:1.15.4-neuron-py37-ubuntu18.04
```

Alternatively, you can build your own Neuron sidecar container image. For more information, see [Tutorial: Neuron TensorFlow Serving](https://github.com/aws-neuron/aws-neuron-sdk/blob/master/frameworks/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-utilizing-neuron-capabilities.rst) in the *AWS Deep Learning AMIs Developer Guide*.

The task definition must be specific to a single instance type. You must configure a container to use specific AWS Trainium or AWS Inferentia devices that are available on the host container instance. You can do so using the `linuxParameters` parameter. For a sample task definition, see [Specifying AWS Neuron machine learning in an Amazon ECS task definition](ecs-inference-task-def.md). The following table details the chips that are specific to each instance type.


| Instance Type | vCPUs | RAM (GiB) | AWS ML accelerator chips | Device Paths | 
| --- | --- | --- | --- | --- | 
| trn1.2xlarge | 8 | 32 | 1 | /dev/neuron0 | 
| trn1.32xlarge | 128 | 512 | 16 |  /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3, /dev/neuron4, /dev/neuron5, /dev/neuron6, /dev/neuron7, /dev/neuron8, /dev/neuron9, /dev/neuron10, /dev/neuron11, /dev/neuron12, /dev/neuron13, /dev/neuron14, /dev/neuron15  | 
| trn2.48xlarge | 192 | 1536 | 16 |  /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3, /dev/neuron4, /dev/neuron5, /dev/neuron6, /dev/neuron7, /dev/neuron8, /dev/neuron9, /dev/neuron10, /dev/neuron11, /dev/neuron12, /dev/neuron13, /dev/neuron14, /dev/neuron15  | 
| inf1.xlarge | 4 | 8 | 1 | /dev/neuron0 | 
| inf1.2xlarge | 8 | 16 | 1 | /dev/neuron0 | 
| inf1.6xlarge | 24 | 48 | 4 | /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3 | 
| inf1.24xlarge | 96 | 192 | 16 |  /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3, /dev/neuron4, /dev/neuron5, /dev/neuron6, /dev/neuron7, /dev/neuron8, /dev/neuron9, /dev/neuron10, /dev/neuron11, /dev/neuron12, /dev/neuron13, /dev/neuron14, /dev/neuron15  | 
| inf2.xlarge | 8 | 16 | 1 | /dev/neuron0 | 
| inf2.8xlarge | 32 | 64 | 1 | /dev/neuron0 | 
| inf2.24xlarge | 96 | 384 | 6 | /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3, /dev/neuron4, /dev/neuron5 | 
| inf2.48xlarge | 192 | 768 | 12 | /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3, /dev/neuron4, /dev/neuron5, /dev/neuron6, /dev/neuron7, /dev/neuron8, /dev/neuron9, /dev/neuron10, /dev/neuron11 | 
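
Because the device list grows with the chip count, you may prefer to generate the `devices` entries from the table instead of typing each path. The following sketch mirrors the chip counts in the table above; the `neuron_devices` helper itself is hypothetical.

```
# Neuron chips per instance type, taken from the preceding table.
NEURON_CHIPS = {
    "trn1.2xlarge": 1, "trn1.32xlarge": 16, "trn2.48xlarge": 16,
    "inf1.xlarge": 1, "inf1.2xlarge": 1, "inf1.6xlarge": 4, "inf1.24xlarge": 16,
    "inf2.xlarge": 1, "inf2.8xlarge": 1, "inf2.24xlarge": 6, "inf2.48xlarge": 12,
}

def neuron_devices(instance_type):
    """Build linuxParameters.devices entries exposing every Neuron chip on the host."""
    return [
        {
            "containerPath": f"/dev/neuron{i}",
            "hostPath": f"/dev/neuron{i}",
            "permissions": ["read", "write"],
        }
        for i in range(NEURON_CHIPS[instance_type])
    ]
```

For example, `neuron_devices("inf1.6xlarge")` yields four entries, `/dev/neuron0` through `/dev/neuron3`, matching the table row for that instance type.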

# Specifying AWS Neuron machine learning in an Amazon ECS task definition
<a name="ecs-inference-task-def"></a>

The following is an example Linux task definition for `inf1.xlarge`, displaying the syntax to use.

```
{
    "family": "ecs-neuron",
    "requiresCompatibilities": ["EC2"],
    "placementConstraints": [
        {
            "type": "memberOf",
            "expression": "attribute:ecs.os-type == linux"
        },
        {
            "type": "memberOf",
            "expression": "attribute:ecs.instance-type == inf1.xlarge"
        }
    ],
    "executionRoleArn": "${YOUR_EXECUTION_ROLE}",
    "containerDefinitions": [
        {
            "entryPoint": [
                "/usr/local/bin/entrypoint.sh",
                "--port=8500",
                "--rest_api_port=9000",
                "--model_name=resnet50_neuron",
                "--model_base_path=s3://amzn-s3-demo-bucket/resnet50_neuron/"
            ],
            "portMappings": [
                {
                    "hostPort": 8500,
                    "protocol": "tcp",
                    "containerPort": 8500
                },
                {
                    "hostPort": 8501,
                    "protocol": "tcp",
                    "containerPort": 8501
                },
                {
                    "hostPort": 0,
                    "protocol": "tcp",
                    "containerPort": 80
                }
            ],
            "linuxParameters": {
                "devices": [
                    {
                        "containerPath": "/dev/neuron0",
                        "hostPath": "/dev/neuron0",
                        "permissions": [
                            "read",
                            "write"
                        ]
                    }
                ],
                "capabilities": {
                    "add": [
                        "IPC_LOCK"
                    ]
                }
            },
            "cpu": 0,
            "memoryReservation": 1000,
            "image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-neuron:1.15.4-neuron-py37-ubuntu18.04",
            "essential": true,
            "name": "resnet50"
        }
    ]
}
```

# Amazon ECS task definitions for deep learning instances
<a name="ecs-dl1"></a>

To use deep learning workloads on Amazon ECS, register [Amazon EC2 DL1](https://aws.amazon.com/ec2/instance-types/dl1/) instances to your clusters. Amazon EC2 DL1 instances are powered by Gaudi accelerators from Habana Labs (an Intel company). Use the Habana SynapseAI SDK to connect to the Habana Gaudi accelerators. The SDK supports the popular machine learning frameworks, TensorFlow and PyTorch.

## Considerations
<a name="ecs-dl1-considerations"></a>

Before you begin deploying DL1 on Amazon ECS, consider the following:
+ Your clusters can contain a mix of DL1 and non-DL1 instances.
+ When creating a service or running a standalone task, you can use instance type attributes when you configure task placement constraints to ensure that your task is launched on the container instance that you specify. Doing so ensures that your resources are used effectively and that your tasks for deep learning workloads are on your DL1 instances. For more information, see [How Amazon ECS places tasks on container instances](task-placement.md).

  The following example runs a task on a `dl1.24xlarge` instance on your `default` cluster.

  ```
  aws ecs run-task \
       --cluster default \
       --task-definition ecs-dl1-task-def \
       --placement-constraints type=memberOf,expression="attribute:ecs.instance-type == dl1.24xlarge"
  ```

## Using a DL1 AMI
<a name="ecs-dl1-ami"></a>

You have three options for running an AMI on Amazon EC2 DL1 instances for Amazon ECS:
+ AWS Marketplace AMIs that are provided by [Habana](https://aws.amazon.com/marketplace/pp/prodview-h24gzbgqu75zq).
+ Habana Deep Learning AMIs that are provided by Amazon Web Services. Because the Amazon ECS container agent isn't included in these AMIs, you need to install it separately.
+ A custom AMI that you build with Packer by using the [GitHub repo](https://github.com/aws-samples/aws-habana-baseami-pipeline). For more information, see [the Packer documentation](https://developer.hashicorp.com/packer/docs).

# Specifying deep learning in an Amazon ECS task definition
<a name="ecs-dl1-requirements"></a>

To run Habana Gaudi accelerated deep learning containers on Amazon ECS, your task definition must contain the container definition for a pre-built container that serves the deep learning model for TensorFlow or PyTorch using Habana SynapseAI and is provided by AWS Deep Learning Containers.

The following container image has TensorFlow 2.7.0 and Ubuntu 20.04. A complete list of pre-built Deep Learning Containers that are optimized for the Habana Gaudi accelerators is maintained on GitHub. For more information, see [Habana Training Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#habana-training-containers).

```
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training-habana:2.7.0-hpu-py38-synapseai1.2.0-ubuntu20.04
```

The following is an example task definition for Linux containers on Amazon EC2, displaying the syntax to use. This example uses an image containing the Habana Labs System Management Interface Tool (HL-SMI) found here: `vault.habana.ai/gaudi-docker/1.1.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.6.0:1.1.0-614`

```
{
    "family": "dl-test",
    "requiresCompatibilities": ["EC2"],
    "placementConstraints": [
        {
            "type": "memberOf",
            "expression": "attribute:ecs.os-type == linux"
        },
        {
            "type": "memberOf",
            "expression": "attribute:ecs.instance-type == dl1.24xlarge"
        }
    ],
    "networkMode": "host",
    "cpu": "10240",
    "memory": "1024",
    "containerDefinitions": [
        {
            "entryPoint": [
                "sh",
                "-c"
            ],
            "command": ["hl-smi"],
            "cpu": 8192,
            "environment": [
                {
                    "name": "HABANA_VISIBLE_DEVICES",
                    "value": "all"
                }
            ],
            "image": "vault.habana.ai/gaudi-docker/1.1.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.6.0:1.1.0-614",
            "essential": true,
            "name": "tensorflow-installer-tf-hpu"
        }
    ]
}
```

# Amazon ECS task definitions for 64-bit ARM workloads
<a name="ecs-arm64"></a>

Amazon ECS supports 64-bit ARM applications. You can run your applications on the platform that's powered by [AWS Graviton Processors](https://aws.amazon.com/ec2/graviton/). It's suitable for a wide variety of workloads, including application servers, microservices, high-performance computing, CPU-based machine learning inference, video encoding, electronic design automation, gaming, open-source databases, and in-memory caches.

## Considerations
<a name="ecs-arm64-considerations"></a>

Before you begin deploying task definitions that use the 64-bit ARM architecture, consider the following:
+ The applications can use the Fargate or EC2 launch types.
+ The applications can only use the Linux operating system.
+ For the Fargate launch type, the applications must use Fargate platform version `1.4.0` or later.
+ The applications can use Fluent Bit or CloudWatch for monitoring.
+ For the Fargate launch type, the following AWS Regions do not support 64-bit ARM workloads:
  + US East (N. Virginia), the `use1-az3` Availability Zone
+ For the EC2 launch type, see the following to verify that the Region that you're in supports the instance type that you want to use:
  + [Amazon EC2 M6g Instances](https://aws.amazon.com/ec2/instance-types/m6)
  + [Amazon EC2 T4g Instances](https://aws.amazon.com/ec2/instance-types/t4/)
  + [Amazon EC2 C6g Instances](https://aws.amazon.com/ec2/instance-types/c6g/)
  + [Amazon EC2 R6gd Instances](https://aws.amazon.com/ec2/instance-types/r6/)
  + [Amazon EC2 X2gd Instances](https://aws.amazon.com/ec2/instance-types/x2/)

  You can also use the Amazon EC2 `describe-instance-type-offerings` command with a filter to view the instance offering for your Region. 

  ```
  aws ec2 describe-instance-type-offerings --filters Name=instance-type,Values=instance-type --region region
  ```

  The following example checks for availability of M6 instance types in the US East (N. Virginia) (`us-east-1`) Region.

  ```
  aws ec2 describe-instance-type-offerings --filters "Name=instance-type,Values=m6*" --region us-east-1
  ```

  For more information, see [describe-instance-type-offerings](https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instance-type-offerings.html) in the *Amazon EC2 Command Line Reference*.
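
The command returns an `InstanceTypeOfferings` array. As a sketch against a fabricated, abbreviated response, you can filter that array in code to keep only Graviton (ARM) instance types:

```
# Fabricated describe-instance-type-offerings response, abbreviated for illustration.
response = {
    "InstanceTypeOfferings": [
        {"InstanceType": "m6g.large", "LocationType": "region", "Location": "us-east-1"},
        {"InstanceType": "m6i.large", "LocationType": "region", "Location": "us-east-1"},
        {"InstanceType": "m6gd.medium", "LocationType": "region", "Location": "us-east-1"},
    ]
}

# Keep only the Graviton (m6g family) offerings; m6i is x86-based.
arm_offerings = sorted(
    offering["InstanceType"]
    for offering in response["InstanceTypeOfferings"]
    if offering["InstanceType"].startswith("m6g")
)
print(arm_offerings)
```

The same filtering can be done server-side with the `--filters` flag shown above; the code form is useful when you post-process a saved response.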

# Specifying the ARM architecture in an Amazon ECS task definition
<a name="ecs-arm-specifying"></a>

To use the ARM architecture, specify `ARM64` for the `cpuArchitecture` task definition parameter. 

In the following example, the ARM architecture is specified in a task definition. It's in JSON format.

```
{
    "runtimePlatform": {
        "operatingSystemFamily": "LINUX",
        "cpuArchitecture": "ARM64"
    },
...
}
```

In the following example, a task definition for the ARM architecture displays "hello world."

```
{
 "family": "arm64-testapp",
 "networkMode": "awsvpc",
 "containerDefinitions": [
    {
        "name": "arm-container",
        "image": "public.ecr.aws/docker/library/busybox:latest",
        "cpu": 100,
        "memory": 100,
        "essential": true,
        "command": [ "echo hello world" ],
        "entryPoint": [ "sh", "-c" ]
    }
 ],
 "requiresCompatibilities": [ "EC2" ],
 "cpu": "256",
 "memory": "512",
 "runtimePlatform": {
        "operatingSystemFamily": "LINUX",
        "cpuArchitecture": "ARM64"
  },
 "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole"
}
```

# Send Amazon ECS logs to CloudWatch
<a name="using_awslogs"></a>

You can configure the containers in your tasks to send log information to CloudWatch Logs. If you're using Fargate for your tasks, you can view the logs from your containers. If you're using EC2, you can view different logs from your containers in one convenient location, and this also prevents your container logs from taking up disk space on your container instances. 

**Note**  
The type of information that is logged by the containers in your task depends mostly on their `ENTRYPOINT` command. By default, the logs that are captured show the command output that you typically might see in an interactive terminal if you ran the container locally, which are the `STDOUT` and `STDERR` I/O streams. The `awslogs` log driver simply passes these logs from Docker to CloudWatch Logs. For more information about how Docker logs are processed, including alternative ways to capture different file data or streams, see [View logs for a container or service](https://docs.docker.com/engine/logging/) in the Docker documentation.

To send system logs from your Amazon ECS container instances to CloudWatch Logs, see [Monitoring Log Files](https://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/WhatIsCloudWatchLogs.html) and [CloudWatch Logs quotas](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/cloudwatch_limits_cwl.html) in the *Amazon CloudWatch Logs User Guide*.

## Fargate
<a name="enable_awslogs"></a>

If you're using Fargate for your tasks, you need to add the required `logConfiguration` parameters to your task definition to turn on the `awslogs` log driver. For more information, see [Example Amazon ECS task definition: Route logs to CloudWatch](specify-log-config.md).

For Windows containers on Fargate, perform one of the following options when any of your task definition parameters have special characters such as `& \ < > ^ |`:
+ Add an escape (`\`) with double quotes around the entire parameter string

  Example

  ```
  "awslogs-multiline-pattern": "\"^[|DEBUG|INFO|WARNING|ERROR\"",
  ```
+ Add an escape (`^`) character around each special character

  Example

  ```
  "awslogs-multiline-pattern": "^^[^|DEBUG^|INFO^|WARNING^|ERROR",
  ```
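
As a sketch of the second option, you can apply the caret-escaping rule programmatically; the `caret_escape` helper below is illustrative, not an AWS utility.

```
def caret_escape(value):
    """Prefix each Windows-special character (& \\ < > ^ |) with a caret (^)."""
    special = set('&\\<>^|')
    return "".join("^" + ch if ch in special else ch for ch in value)

pattern = caret_escape("^[|DEBUG|INFO|WARNING|ERROR")
print(pattern)  # ^^[^|DEBUG^|INFO^|WARNING^|ERROR
```

The output matches the escaped `awslogs-multiline-pattern` value shown in the second example above.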

## EC2
<a name="ec2-considerations"></a>

If you're using EC2 for your tasks and want to turn on the `awslogs` log driver, your Amazon ECS container instances require at least version 1.9.0 of the container agent. For information about how to check your agent version and updating to the latest version, see [Updating the Amazon ECS container agent](ecs-agent-update.md).

**Note**  
You must use either an Amazon ECS-optimized AMI or a custom AMI with at least version `1.9.0-1` of the `ecs-init` package. When using a custom AMI, you must specify that the `awslogs` logging driver is available on the Amazon EC2 instance when you start the agent by using the following environment variable in your **docker run** statement or environment variable file.  

```
ECS_AVAILABLE_LOGGING_DRIVERS=["json-file","awslogs"]
```

Your Amazon ECS container instances also require the `logs:CreateLogStream` and `logs:PutLogEvents` permissions on the IAM role that you launch your container instances with. If you created your Amazon ECS container instance role before `awslogs` log driver support was added to Amazon ECS, you might need to add these permissions. The `ecsTaskExecutionRole`, when assigned to the task, likely contains the correct permissions. For information about the task execution role, see [Amazon ECS task execution IAM role](task_execution_IAM_role.md). If your container instances use the managed IAM policy for container instances, they likely have the correct permissions. For information about the managed IAM policy for container instances, see [Amazon ECS container instance IAM role](instance_IAM_role.md).

# Example Amazon ECS task definition: Route logs to CloudWatch
<a name="specify-log-config"></a>

Before your containers can send logs to CloudWatch, you must specify the `awslogs` log driver for containers in your task definition. For more information about the log parameters, see [Storage and logging](task_definition_parameters.md#container_definition_storage).

The task definition JSON that follows has a `logConfiguration` object specified for each container. One is for the WordPress container that sends logs to a log group called `awslogs-wordpress`. The other is for a MySQL container that sends logs to a log group that's called `awslogs-mysql`. Both containers use the `awslogs-example` log stream prefix.

```
{
    "containerDefinitions": [
        {
            "name": "wordpress",
            "links": [
                "mysql"
            ],
            "image": "public.ecr.aws/docker/library/wordpress:latest",
            "essential": true,
            "portMappings": [
                {
                    "containerPort": 80,
                    "hostPort": 80
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "awslogs-wordpress",
                    "awslogs-region": "us-west-2",
                    "awslogs-stream-prefix": "awslogs-example"
                }
            },
            "memory": 500,
            "cpu": 10
        },
        {
            "environment": [
                {
                    "name": "MYSQL_ROOT_PASSWORD",
                    "value": "password"
                }
            ],
            "name": "mysql",
            "image": "public.ecr.aws/docker/library/mysql:latest",
            "cpu": 10,
            "memory": 500,
            "essential": true,
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "awslogs-mysql",
                    "awslogs-region": "us-west-2",
                    "awslogs-stream-prefix": "awslogs-example",
                    "mode": "non-blocking", 
                    "max-buffer-size": "25m" 
                }
            }
        }
    ],
    "family": "awslogs-example"
}
```

## Next steps
<a name="specify-log-config-next-steps"></a>
+ You can optionally set a retention policy for the log group by using the CloudWatch AWS CLI or API. For more information, see [put-retention-policy](https://docs.aws.amazon.com/cli/latest/reference/logs/put-retention-policy.html) in the *AWS Command Line Interface Reference*.
+ After you have registered a task definition with the `awslogs` log driver in a container definition log configuration, you can run a task or create a service with that task definition to start sending logs to CloudWatch Logs. For more information, see [Running an application as an Amazon ECS task](standalone-task-create.md) and [Creating an Amazon ECS rolling update deployment](create-service-console-v2.md).

# Send Amazon ECS logs to an AWS service or AWS Partner
<a name="using_firelens"></a>

You can use FireLens for Amazon ECS to route logs, through task definition parameters, to an AWS service or AWS Partner Network (APN) destination for log storage and analytics. The AWS Partner Network is a global community of partners that leverages programs, expertise, and resources to build, market, and sell customer offerings. For more information, see [AWS Partner](https://aws.amazon.com/partners/work-with-partners/). FireLens works with [Fluentd](https://www.fluentd.org/) and [Fluent Bit](https://fluentbit.io/). We provide the AWS for Fluent Bit image, or you can use your own Fluentd or Fluent Bit image.

By default, Amazon ECS configures the container dependency so that the FireLens container starts before any container that uses it. The FireLens container also stops after all containers that use it stop.

To use this feature, you must create an IAM role for your tasks that provides the permissions necessary to use any AWS services that the tasks require. For example, if a container is routing logs to Firehose, the task requires permission to call the `firehose:PutRecordBatch` API. For more information, see [Adding and Removing IAM Identity Permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html) in the *IAM User Guide*.

Your task might also require the Amazon ECS task execution role under the following conditions. For more information, see [Amazon ECS task execution IAM role](task_execution_IAM_role.md).
+ If your task is hosted on Fargate and you are pulling container images from Amazon ECR or referencing sensitive data from AWS Secrets Manager in your log configuration, then you must include the task execution IAM role.
+ When you use a custom configuration file that's hosted in Amazon S3, your task execution IAM role must include the `s3:GetObject` permission.

Consider the following when using FireLens for Amazon ECS:
+ We recommend that you add `my_service_` to the log container name so that you can easily distinguish container names in the console.
+ Amazon ECS adds a start container order dependency between the application containers and the FireLens container by default. When you specify a container order between the application containers and the FireLens container, then the default start container order is overridden.
+ FireLens for Amazon ECS is supported for tasks that are hosted on both AWS Fargate on Linux and Amazon EC2 on Linux. Windows containers don't support FireLens.

  For information about how to configure centralized logging for Windows containers, see [Centralized logging for Windows containers on Amazon ECS using Fluent Bit](https://aws.amazon.com/blogs/containers/centralized-logging-for-windows-containers-on-amazon-ecs-using-fluent-bit/).
+ You can use CloudFormation templates to configure FireLens for Amazon ECS. For more information, see [AWS::ECS::TaskDefinition FirelensConfiguration](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-taskdefinition-firelensconfiguration.html) in the *AWS CloudFormation User Guide*.
+ FireLens listens on port `24224`, so to ensure that the FireLens log router isn't reachable outside of the task, you must not allow inbound traffic on port `24224` in the security group your task uses. For tasks that use the `awsvpc` network mode, this is the security group associated with the task. For tasks using the `host` network mode, this is the security group that's associated with the Amazon EC2 instance hosting the task. For tasks that use the `bridge` network mode, don't create any port mappings that use port `24224`.
+ For tasks that use the `bridge` network mode, the container with the FireLens configuration must start before any application containers that rely on it start. To control the start order of your containers, use dependency conditions in your task definition. For more information, see [Container dependency](task_definition_parameters.md#container_definition_dependson).
**Note**  
If you use dependency condition parameters in container definitions with a FireLens configuration, ensure that each container has a `START` or `HEALTHY` condition requirement.
+ By default, FireLens adds the cluster and task definition name and the Amazon Resource Name (ARN) of the cluster as metadata keys to your stdout/stderr container logs. The following is an example of the metadata format.

  ```
  "ecs_cluster": "cluster-name",
  "ecs_task_arn": "arn:aws:ecs:region:111122223333:task/cluster-name/f2ad7dba413f45ddb4EXAMPLE",
  "ecs_task_definition": "task-def-name:revision",
  ```

  If you do not want the metadata in your logs, set `enable-ecs-log-metadata` to `false` in the `firelensConfiguration` section of the task definition.

  ```
  "firelensConfiguration": {
     "type": "fluentbit",
     "options": {
        "enable-ecs-log-metadata": "false",
        "config-file-type": "file",
        "config-file-value": "/extra.conf"
     }
  }
  ```
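
The container dependency consideration above can be sketched with `dependsOn`; a minimal fragment, assuming an application container named `app` and a FireLens log router named `log_router` (both names are illustrative):

```
"containerDefinitions": [
    {
        "name": "app",
        "dependsOn": [
            {
                "containerName": "log_router",
                "condition": "START"
            }
        ]
    }
]
```

The `HEALTHY` condition can be used instead of `START` if the FireLens container defines a `healthCheck`.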

You can configure the FireLens container to run as a non-root user. Consider the following:
+  To configure the FireLens container to run as a non-root user, you must specify the user in one of the following formats:
  + `uid`
  + `uid:gid`
  + `uid:group`

  For more information about specifying a user in a container definition, see [ContainerDefinition](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerDefinition.html) in the *Amazon Elastic Container Service API Reference*.

  The FireLens container receives application logs over a UNIX socket. The Amazon ECS agent uses the `uid` to assign ownership of the socket directory to the FireLens container.
+ Configuring the FireLens container to run as a non-root user is supported on Amazon ECS Agent version `1.96.0` and later, and Amazon ECS-optimized AMI version `v20250716` and later.
+ When you specify a user for the FireLens container, the `uid` must be unique and not used for other processes belonging to other containers in the task or the container instance.

For information about how to use multiple configuration files with Amazon ECS, including files that you host or files in Amazon S3, see [Init process for Fluent Bit on ECS, multi-config support](https://github.com/aws/aws-for-fluent-bit/tree/mainline/use_cases/init-process-for-fluent-bit).

For information about example configurations, see [Example Amazon ECS task definition: Route logs to FireLens](firelens-taskdef.md).

For more information about configuring logs for high throughput, see [Configuring Amazon ECS logs for high throughput](firelens-docker-buffer-limit.md).

# Configuring Amazon ECS logs for high throughput
<a name="firelens-docker-buffer-limit"></a>

For high log throughput scenarios, we recommend using the `awsfirelens` log driver with FireLens and Fluent Bit. Fluent Bit is a lightweight log processor that's efficient with resources and can handle millions of log records. However, achieving optimal performance at scale requires tuning its configuration.

This section covers advanced Fluent Bit optimization techniques for handling high log throughput while maintaining system stability and ensuring no data loss.

For information about how to use custom configuration files with FireLens, see [Use a custom configuration file](firelens-taskdef.md#firelens-taskdef-customconfig). For additional examples, see [Amazon ECS FireLens examples](https://github.com/aws-samples/amazon-ecs-firelens-examples) on GitHub.

**Note**  
Some configuration options in this section, such as `workers` and `threaded`, require AWS for Fluent Bit version 3 or later. For information about available versions, see [AWS for Fluent Bit releases](https://github.com/aws/aws-for-fluent-bit/releases).

## Understanding chunks
<a name="firelens-understanding-chunks"></a>

Fluent Bit processes data in units called *chunks*. When an INPUT plugin receives data, the engine creates a chunk that gets stored in memory or on the filesystem before being sent to OUTPUT destinations.

Buffering behavior depends on the `storage.type` setting in your INPUT sections. By default, Fluent Bit uses memory buffering. For high-throughput or production scenarios, filesystem buffering provides better resilience.

For more information, see [Chunks](https://docs.fluentbit.io/manual/administration/buffering-and-storage#chunks) in the Fluent Bit documentation and [What is a Chunk?](https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/oomkill-prevention#what-is-a-chunk) in the AWS for Fluent Bit examples repository.

## Memory buffering (default)
<a name="firelens-memory-buffering"></a>

By default, Fluent Bit uses memory buffering (`storage.type memory`). You can limit memory usage per INPUT plugin using the `Mem_Buf_Limit` parameter.

The following example shows a memory-buffered input configuration:

```
[INPUT]
    Name          tcp
    Tag           ApplicationLogs
    Port          5170
    storage.type  memory
    Mem_Buf_Limit 5MB
```

**Important**  
When `Mem_Buf_Limit` is exceeded for a plugin, Fluent Bit pauses the input and new records are lost. This can cause backpressure and slow down your application. The following warning appears in the Fluent Bit logs:  

```
[input] tcp.1 paused (mem buf overlimit)
```

Memory buffering is suitable for simple use cases with low to moderate log throughput. For high-throughput or production scenarios where data loss is a concern, use filesystem buffering instead.

For more information, see [Buffering and Memory](https://docs.fluentbit.io/manual/administration/buffering-and-storage#buffering-and-memory) in the Fluent Bit documentation and [Memory Buffering Only](https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/oomkill-prevention#case-1-memory-buffering-only-default-or-storagetype-memory) in the AWS for Fluent Bit examples repository.

## Filesystem buffering
<a name="firelens-filesystem-buffering"></a>

For high-throughput scenarios, we recommend using filesystem buffering. For more information about how Fluent Bit manages buffering and storage, see [Buffering and Storage](https://docs.fluentbit.io/manual/administration/buffering-and-storage) in the Fluent Bit documentation.

Filesystem buffering provides the following advantages:
+ **Larger buffer capacity** – Disk space is typically more abundant than memory.
+ **Persistence** – Buffered data survives Fluent Bit restarts.
+ **Graceful degradation** – During output failures, data accumulates on disk rather than causing memory exhaustion.

To enable filesystem buffering, provide a custom Fluent Bit configuration file. The following example shows the recommended configuration:

```
[SERVICE]
    # Flush logs every 1 second
    Flush 1
    # Wait 120 seconds during shutdown to flush remaining logs
    Grace 120
    # Directory for filesystem buffering
    storage.path             /var/log/flb-storage/
    # Limit chunks stored 'up' in memory (reduce for memory-constrained environments)
    storage.max_chunks_up    32
    # Flush backlog chunks to destinations during shutdown (prevents log loss)
    storage.backlog.flush_on_shutdown On

[INPUT]
    Name forward
    unix_path /var/run/fluent.sock
    # Run input in separate thread to prevent blocking
    threaded true
    # Enable filesystem buffering for persistence
    storage.type filesystem

[OUTPUT]
    Name cloudwatch_logs
    Match *
    region us-west-2
    log_group_name /aws/ecs/my-app
    log_stream_name $(ecs_task_id)
    # Use multiple workers for parallel processing
    workers 2
    # Retry failed flushes up to 15 times
    retry_limit 15
    # Maximum disk space for buffered data for this output
    storage.total_limit_size 10G
```

Key configuration parameters:

`storage.path`  
The directory where Fluent Bit stores buffered chunks on disk.

`storage.backlog.flush_on_shutdown`  
When enabled, Fluent Bit attempts to flush all backlog filesystem chunks to their destinations during shutdown. This helps ensure data delivery before Fluent Bit stops, but may increase shutdown time.

`storage.max_chunks_up`  
The number of chunks that remain in memory. The default is 128 chunks, which can consume 500 MB or more of memory because each chunk can use up to 4–5 MB. In memory-constrained environments, lower this value. For example, if you have 50 MB available for buffering, set this to 8–10 chunks.

`storage.type filesystem`  
Enables filesystem storage for the input plugin. Despite the name, Fluent Bit uses `mmap` to map chunks to both memory and disk, providing persistence without sacrificing performance.

`storage.total_limit_size`  
The maximum disk space for buffered data for a specific OUTPUT plugin. When this limit is reached, the oldest records for that output are dropped. For more information about sizing, see [Understanding `storage.total_limit_size`](#firelens-storage-sizing).

`threaded true`  
Runs the input in its own thread, separate from Fluent Bit's main event loop. This prevents slow inputs from blocking the entire pipeline.

For more information, see [Filesystem Buffering](https://docs.fluentbit.io/manual/administration/buffering-and-storage#filesystem-buffering) in the Fluent Bit documentation and [Filesystem and Memory Buffering](https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/oomkill-prevention#case-2-filesystem-and-memory-buffering-storagetype-filesystem) in the AWS for Fluent Bit examples repository.
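
The `storage.max_chunks_up` sizing above amounts to simple arithmetic; a quick sketch, assuming each chunk uses up to about 5 MB:

```python
CHUNK_MB = 5  # each chunk can use up to roughly 4-5 MB

def chunks_for_budget(memory_budget_mb: int) -> int:
    """Pick a storage.max_chunks_up value that fits a memory budget."""
    return memory_budget_mb // CHUNK_MB

# The default of 128 chunks can consume on the order of 640 MB.
print(128 * CHUNK_MB)         # memory upper bound at the default
print(chunks_for_budget(50))  # a 50 MB budget allows about 10 chunks
```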

## Understanding `storage.total_limit_size`
<a name="firelens-storage-sizing"></a>

The `storage.total_limit_size` parameter on each OUTPUT plugin controls the maximum disk space for buffered data for that output. When this limit is reached, the oldest records for that output are dropped to make room for new data. When disk space is completely exhausted, Fluent Bit fails to queue records and they are lost.

Use the following formula to calculate the appropriate `storage.total_limit_size` based on your log rate and desired recovery window:

```
If log rate is in KB/s, convert to MB/s first:
log_rate (MB/s) = log_rate (KB/s) / 1000

storage.total_limit_size (GB) = log_rate (MB/s) × duration (hours) × 3600 (seconds/hour) / 1000 (MB to GB)
```
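
The formula can be written as a small helper and checked against the example calculations that follow:

```python
def total_limit_size_gb(log_rate_mb_s: float, duration_hours: float) -> float:
    """storage.total_limit_size (GB) for a log rate and recovery window."""
    return log_rate_mb_s * duration_hours * 3600 / 1000

# A 0.25 MB/s log rate sustained over a 6-hour recovery window
# needs about 5.4 GB of buffer space.
print(total_limit_size_gb(0.25, 6))  # -> 5.4
```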

The following table shows example calculations for common log rates and recovery windows:


| Log Rate | 1 hour | 6 hours | 12 hours | 24 hours | 
| --- | --- | --- | --- | --- | 
| 0.25 MB/s | 0.9 GB | 5.4 GB | 10.8 GB | 21.6 GB | 
| 0.5 MB/s | 1.8 GB | 10.8 GB | 21.6 GB | 43.2 GB | 
| 1 MB/s | 3.6 GB | 21.6 GB | 43.2 GB | 86.4 GB | 
| 5 MB/s | 18 GB | 108 GB | 216 GB | 432 GB | 
| 10 MB/s | 36 GB | 216 GB | 432 GB | 864 GB | 

To observe peak throughput and choose appropriate buffer sizes, use the [measure-throughput FireLens sample](https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/measure-throughput).

Use the formula, example calculations, and benchmarking to choose a suitable `storage.total_limit_size` that provides runway for best-effort recovery during an outage.

## Amazon ECS task storage requirements
<a name="firelens-storage-task-requirements"></a>

Sum all `storage.total_limit_size` values across OUTPUT sections and add a buffer for overhead. This total determines the storage space needed in your Amazon ECS task definition. For example, 3 outputs × 10 GB each = 30 GB, plus an overhead buffer of 5–10 GB = 35–40 GB total required. If the total exceeds available storage, Fluent Bit may fail to queue records and they will be lost.
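
That arithmetic can be captured in a short helper (the default overhead figure below is the 5–10 GB range suggested above):

```python
def required_task_storage_gb(output_limits_gb, overhead_gb=10):
    """Total task storage: sum of per-output storage.total_limit_size plus overhead."""
    return sum(output_limits_gb) + overhead_gb

# 3 outputs at 10 GB each, plus 10 GB of overhead
print(required_task_storage_gb([10, 10, 10]))  # -> 40
```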

The following storage options are available:

Bind mounts (ephemeral storage)  
+ For AWS Fargate, the default is 20 GB of ephemeral storage (max 200 GB). Configure using `ephemeralStorage` in the task definition. For more information, see [EphemeralStorage](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-taskdefinition-ephemeralstorage.html) in the *AWS CloudFormation User Guide*.
+ For EC2, the default is 30 GB when using the Amazon ECS-optimized AMI (shared between the OS and Docker). Increase it by changing the root volume size.

Amazon EBS volumes  
+ Provides highly available, durable, high-performance block storage.
+ Requires volume configuration and `mountPoint` in the task definition pointing to `storage.path` (default: `/var/log/flb-storage/`).
+ For more information, see [Defer volume configuration to launch time in an Amazon ECS task definition](specify-ebs-config.md).

Amazon EFS volumes  
+ Provides simple, scalable file storage.
+ Requires volume configuration and `mountPoint` in the task definition pointing to `storage.path` (default: `/var/log/flb-storage/`).
+ For more information, see [Specify an Amazon EFS file system in an Amazon ECS task definition](specify-efs-config.md).

For more information about data volumes, see [Storage options for Amazon ECS tasks](using_data_volumes.md).

## Optimize output configuration
<a name="firelens-output-optimization"></a>

Network issues, service outages, and destination throttling can prevent logs from being delivered. Proper output configuration ensures resilience without data loss.

When an output flush fails, Fluent Bit can retry the operation. The following parameters control retry behavior:

`retry_limit`  
The maximum number of retries after the initial attempt before dropping records. The default is 1. For example, `retry_limit 3` means 4 total attempts (1 initial + 3 retries). For production environments, we recommend 15 or higher, which covers several minutes of outage with exponential backoff.  
Set to `no_limits` or `False` for infinite retries:  
+ With memory buffering, infinite retries cause the input plugin to pause when memory limits are reached.
+ With filesystem buffering, the oldest records are dropped when `storage.total_limit_size` is reached.
After exhausting all retry attempts (1 initial + `retry_limit` retries), records are dropped. AWS plugins with `auto_retry_requests true` (default) provide an additional retry layer before Fluent Bit's retry mechanism. For more information, see [Configure retries](https://docs.fluentbit.io/manual/administration/scheduling-and-retries#configure-retries) in the Fluent Bit documentation.  
For example, `retry_limit 3` with default settings (`scheduler.base 5`, `scheduler.cap 2000`, `net.connect_timeout 10s`) provides approximately 70 seconds of scheduler wait time (10s + 20s + 40s), 40 seconds of network connect timeouts (4 attempts × 10s), plus AWS plugin retries, totaling approximately 2–10 minutes depending on network conditions and OS TCP timeouts.

`scheduler.base`  
The base seconds between retries (default: 5). We recommend 10 seconds.

`scheduler.cap`  
The maximum seconds between retries (default: 2000). We recommend 60 seconds.

Wait time between retries uses exponential backoff with jitter:

```
wait_time = random(base, min(base × 2^retry_number, cap))
```

For example, with `scheduler.base 10` and `scheduler.cap 60`:
+ First retry: random wait between 10–20 seconds
+ Second retry: random wait between 10–40 seconds
+ Third retry and later: random wait between 10–60 seconds (capped)
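
The ranges above can be reproduced with a short sketch of the backoff formula:

```python
import random

def retry_wait(base: float, cap: float, retry_number: int) -> float:
    """wait_time = random(base, min(base * 2^retry_number, cap))"""
    return random.uniform(base, min(base * 2 ** retry_number, cap))

# With scheduler.base 10 and scheduler.cap 60, the upper bound grows
# 20 -> 40 -> 60 (capped) across the first three retries.
for n in (1, 2, 3):
    wait = retry_wait(10, 60, n)
    assert 10 <= wait <= min(10 * 2 ** n, 60)
```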

For more information, see [Configure wait time for retry](https://docs.fluentbit.io/manual/administration/scheduling-and-retries#configure-wait-time-for-retry) and [Networking](https://docs.fluentbit.io/manual/administration/networking) in the Fluent Bit documentation.

`workers`  
The number of threads for parallel output processing. Multiple workers allow concurrent flushes, improving throughput when processing many chunks.

`auto_retry_requests`  
An AWS plugin-specific setting that provides an additional retry layer before Fluent Bit's built-in retry mechanism. The default is `true`. When enabled, the AWS output plugin retries failed requests internally before the request is considered a failed flush and subject to the `retry_limit` configuration.

The `Grace` parameter in the `[SERVICE]` section sets the time Fluent Bit waits during shutdown to flush buffered data. The `Grace` period must be coordinated with the container's `stopTimeout`. Ensure that `stopTimeout` exceeds the `Grace` period to allow Fluent Bit to complete flushing before receiving `SIGKILL`. For example, if `Grace` is 120 seconds, set `stopTimeout` to 150 seconds.
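
As a sketch, that coordination might look like the following container definition fragment for the log router (the container name and image tag are illustrative):

```
{
    "name": "log_router",
    "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
    "essential": true,
    "firelensConfiguration": {
        "type": "fluentbit",
        "options": {
            "config-file-type": "file",
            "config-file-value": "/extra.conf"
        }
    },
    "stopTimeout": 150
}
```

With `Grace 120` in the Fluent Bit configuration, a `stopTimeout` of 150 seconds leaves time for the final flush before the container receives `SIGKILL`.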

The following example shows a complete Fluent Bit configuration with all recommended settings for high-throughput scenarios:

```
[SERVICE]
    # Flush logs every 1 second
    Flush 1
    # Wait 120 seconds during shutdown to flush remaining logs
    Grace 120
    # Directory for filesystem buffering
    storage.path             /var/log/flb-storage/
    # Limit chunks stored 'up' in memory (reduce for memory-constrained environments)
    storage.max_chunks_up    32
    # Flush backlog chunks to destinations during shutdown (prevents log loss)
    storage.backlog.flush_on_shutdown On
    # Minimum seconds between retries
    scheduler.base           10
    # Maximum seconds between retries (exponential backoff cap)
    scheduler.cap            60

[INPUT]
    Name forward
    unix_path /var/run/fluent.sock
    # Run input in separate thread to prevent blocking
    threaded true
    # Enable filesystem buffering for persistence
    storage.type filesystem

[OUTPUT]
    Name cloudwatch_logs
    Match *
    region us-west-2
    log_group_name /aws/ecs/my-app
    log_stream_name $(ecs_task_id)
    # Use multiple workers for parallel processing
    workers 2
    # Retry failed flushes up to 15 times
    retry_limit 15
    # Maximum disk space for buffered data for this output
    storage.total_limit_size 10G
```

## Understanding data loss scenarios
<a name="firelens-record-loss-scenarios"></a>

Records can be lost during extended outages or issues with output destinations. The configuration recommendations in this guide are best-effort approaches to minimize data loss, but cannot guarantee zero loss during prolonged failures. Understanding these scenarios helps you configure Fluent Bit to maximize resilience.

Records can be lost in two ways: oldest records are dropped when storage fills up, or newest records are rejected when the system cannot accept more data.

### Oldest records dropped
<a name="firelens-record-loss-oldest-dropped"></a>

The oldest buffered records are dropped when retry attempts are exhausted or when `storage.total_limit_size` fills up and needs to make room for new data.

Retry limit exceeded  
Occurs after AWS plugin retries (if `auto_retry_requests true`) plus 1 initial Fluent Bit attempt plus `retry_limit` retries. To mitigate, set `retry_limit no_limits` per OUTPUT plugin for infinite retries:  

```
[OUTPUT]
    Name                        cloudwatch_logs
    Match                       ApplicationLogs
    retry_limit                 no_limits
    auto_retry_requests         true
```
Infinite retries prevent dropping records due to retry exhaustion, but may cause `storage.total_limit_size` to fill up.

Storage limit reached (filesystem buffering)  
Occurs when the output destination is unavailable longer than your configured `storage.total_limit_size` can buffer. For example, a 10 GB buffer at 1 MB/s log rate provides approximately 2.7 hours of buffering. To mitigate, increase `storage.total_limit_size` per OUTPUT plugin and provision adequate Amazon ECS task storage:  

```
[OUTPUT]
    Name                        cloudwatch_logs
    Match                       ApplicationLogs
    storage.total_limit_size    10G
```

### Newest records rejected
<a name="firelens-record-loss-newest-rejected"></a>

The newest records are dropped when disk space is exhausted or when input is paused due to `Mem_Buf_Limit`.

Disk space exhausted (filesystem buffering)  
Occurs when disk space is completely exhausted. Fluent Bit fails to queue new records and they are lost. To mitigate, sum all `storage.total_limit_size` values and provision adequate Amazon ECS task storage. For more information, see [Amazon ECS task storage requirements](#firelens-storage-task-requirements).

Memory limit reached (memory buffering)  
Occurs when the output destination is unavailable and the memory buffer fills. Paused input plugins stop accepting new records. To mitigate, use `storage.type filesystem` for better resilience, or increase `Mem_Buf_Limit`.

### Best practices to minimize data loss
<a name="firelens-record-loss-best-practices"></a>

Consider the following best practices to minimize data loss:
+ **Use filesystem buffering** – Set `storage.type filesystem` for better resilience during outages.
+ **Size storage appropriately** – Calculate `storage.total_limit_size` based on log rate and desired recovery window.
+ **Provision adequate disk** – Ensure the Amazon ECS task has sufficient ephemeral storage, Amazon EBS, or Amazon EFS.
+ **Configure retry behavior** – Balance between `retry_limit` (drops records after exhausting retries) and `no_limits` (retries indefinitely but may fill storage).

## Use multi-destination logging for reliability
<a name="firelens-multi-destination"></a>

Sending logs to multiple destinations eliminates single points of failure. For example, if CloudWatch Logs experiences an outage, logs still reach Amazon S3.

The Amazon S3 output plugin also supports compression options such as gzip and Parquet format, which can reduce storage costs. For more information, see [S3 compression](https://docs.fluentbit.io/manual/pipeline/outputs/s3#compression) in the Fluent Bit documentation.

Multi-destination logging can provide the following benefits:
+ **Redundancy** – If one destination fails, logs still reach the other.
+ **Recovery** – Reconstruct gaps in one system from the other.
+ **Durability** – Archive logs in Amazon S3 for long-term retention.
+ **Cost optimization** – Keep recent logs in a fast query service like CloudWatch Logs with shorter retention, while archiving all logs to lower-cost Amazon S3 storage for long-term retention.

The following Fluent Bit configuration sends logs to both CloudWatch Logs and Amazon S3:

```
[OUTPUT]
    Name cloudwatch_logs
    Match *
    region us-west-2
    log_group_name /aws/ecs/my-app
    log_stream_name $(ecs_task_id)
    workers 2
    retry_limit 15

[OUTPUT]
    Name s3
    Match *
    bucket my-logs-bucket
    region us-west-2
    total_file_size 100M
    s3_key_format /fluent-bit-logs/$(ecs_task_id)/%Y%m%d/%H/%M/$UUID
    upload_timeout 10m
    # Maximum disk space for buffered data for this output
    storage.total_limit_size 5G
```

Both outputs use the same `Match *` pattern, so all records are sent to both destinations independently. During an outage of one destination, logs continue flowing to the other while failed flushes accumulate in the filesystem buffer for later retry.

## Use file-based logging with the tail input plugin
<a name="firelens-tail-input"></a>

For high-throughput scenarios where log loss is a critical concern, you can use an alternative approach: have your application write logs to files on disk, and configure Fluent Bit to read them using the `tail` input plugin. This approach bypasses the Docker logging driver layer entirely.

File-based logging with the tail plugin provides the following benefits:
+ **Offset tracking** – The tail plugin can store file offsets in a database file (using the `DB` option), providing durability across Fluent Bit restarts. This helps prevent log loss during container restarts.
+ **Input-level buffering** – You can configure memory buffer limits directly on the input plugin using `Mem_Buf_Limit`, providing more granular control over memory usage.
+ **Avoids Docker overhead** – Logs go directly from file to Fluent Bit without passing through Docker's log buffers.

To use this approach, your application must write logs to files instead of `stdout`. Both the application container and the Fluent Bit container mount a shared volume where the log files are stored.
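
A minimal sketch of that shared volume in a task definition (volume and container names are illustrative):

```
{
    "volumes": [
        {
            "name": "app-logs",
            "host": {}
        }
    ],
    "containerDefinitions": [
        {
            "name": "app",
            "mountPoints": [
                {
                    "sourceVolume": "app-logs",
                    "containerPath": "/var/log"
                }
            ]
        },
        {
            "name": "log_router",
            "mountPoints": [
                {
                    "sourceVolume": "app-logs",
                    "containerPath": "/var/log"
                }
            ]
        }
    ]
}
```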

The following example shows a tail input configuration with best practices:

```
[INPUT]
    Name tail
    # File path or glob pattern to tail
    Path /var/log/app.log
    # Database file for storing file offsets (enables resuming after restart)
    DB /var/log/flb_tail.db
    # Assume exclusive access to the database by Fluent Bit (improves performance)
    DB.locking true
    # Skip long lines instead of skipping the entire file
    Skip_Long_Lines On
    # How often (in seconds) to check for new files matching the glob pattern
    Refresh_Interval 10
    # Extra seconds to monitor a file after rotation to account for pending flush
    Rotate_Wait 30
    # Maximum size of the buffer for a single line
    Buffer_Max_Size 10MB
    # Initial allocation size for reading file data
    Buffer_Chunk_Size 1MB
    # Maximum memory buffer size (tail pauses when full)
    Mem_Buf_Limit 75MB
```

When using the tail input plugin, consider the following:
+ Implement log rotation for your application logs to prevent disk exhaustion. Monitor the underlying volume metrics to gauge performance.
+ Consider settings like `Ignore_Older`, `Read_from_Head`, and multiline parsers based on your log format.

For more information, see [Tail](https://docs.fluentbit.io/manual/pipeline/inputs/tail) in the Fluent Bit documentation. For best practices, see [Tail config with best practices](https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#tail-config-with-best-practices) in the AWS for Fluent Bit troubleshooting guide.

## Log directly to FireLens
<a name="firelens-environment-variables"></a>

When the `awsfirelens` log driver is specified in a task definition, the Amazon ECS container agent injects the following environment variables into the container:

`FLUENT_HOST`  
The IP address that's assigned to the FireLens container.  
If you're using EC2 with the `bridge` network mode, the `FLUENT_HOST` environment variable in your application container can become inaccurate after a restart of the FireLens log router container (the container with the `firelensConfiguration` object in its container definition). This is because `FLUENT_HOST` is a dynamic IP address and can change after a restart. Logging directly from the application container to the `FLUENT_HOST` IP address can start failing after the address changes. For more information about restarting individual containers, see [Restart individual containers in Amazon ECS tasks with container restart policies](container-restart-policy.md).

`FLUENT_PORT`  
The port that the Fluent Forward protocol is listening on.

You can use these environment variables to log directly to the Fluent Bit log router from your application code using the Fluent Forward protocol, instead of writing to `stdout`. This approach bypasses the Docker logging driver layer, which provides the following benefits:
+ **Lower latency** – Logs go directly to Fluent Bit without passing through Docker's logging infrastructure.
+ **Structured logging** – Send structured log data natively without JSON encoding overhead.
+ **Better control** – Your application can implement its own buffering and error handling logic.

The following Fluent logger libraries support the Fluent Forward protocol and can be used to send logs directly to Fluent Bit:
+ **Go** – [fluent-logger-golang](https://github.com/fluent/fluent-logger-golang)
+ **Python** – [fluent-logger-python](https://github.com/fluent/fluent-logger-python)
+ **Java** – [fluent-logger-java](https://github.com/fluent/fluent-logger-java)
+ **Node.js** – [fluent-logger-node](https://github.com/fluent/fluent-logger-node)
+ **Ruby** – [fluent-logger-ruby](https://github.com/fluent/fluent-logger-ruby)
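As a minimal sketch (not a substitute for the libraries above), the following shows how an application might read these environment variables and shape a Fluent Forward "Message" event. The fallback values and the tag are hypothetical; a real sender, or one of the fluent-logger libraries, would msgpack-encode the event and write it over TCP.

```python
import os
import time

# FireLens injects FLUENT_HOST and FLUENT_PORT into the application
# container; the fallbacks here are hypothetical defaults for running
# outside of a task.
fluent_host = os.environ.get("FLUENT_HOST", "127.0.0.1")
fluent_port = int(os.environ.get("FLUENT_PORT", "24224"))

def forward_event(tag, record):
    """Build a Fluent Forward "Message" event: [tag, unix_time, record].

    A real sender would msgpack-encode this array and write it to a TCP
    socket at fluent_host:fluent_port.
    """
    return [tag, int(time.time()), record]

event = forward_event("app-firelens", {"level": "info", "message": "request served"})
```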

## Configure the Docker buffer limit
<a name="firelens-buffer-limit"></a>

When you create a task definition, you can specify the number of log lines that are buffered in memory by specifying the value in `log-driver-buffer-limit`. This controls the buffer between Docker and Fluent Bit. For more information, see [Fluentd logging driver](https://docs.docker.com/engine/logging/drivers/fluentd/) in the Docker documentation.

Use this option when there's high throughput, because Docker might otherwise run out of buffer memory and discard buffered messages to make room for new ones.

Consider the following when using this option:
+ This option is supported on the EC2 launch type, and on Fargate with platform version `1.4.0` or later.
+ The option is only valid when `logDriver` is set to `awsfirelens`.
+ The default buffer limit is `1048576` log lines.
+ The buffer limit must be greater than or equal to `0` and less than `536870912` log lines.
+ The maximum amount of memory used for this buffer is the product of the size of each log line and the size of the buffer. For example, if the application's log lines are on average `2` KiB, a buffer limit of 4096 would use at most `8` MiB. The total amount of memory allocated at the task level should be greater than the amount of memory that's allocated for all the containers in addition to the log driver memory buffer.
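The memory bound in the last bullet can be checked with quick arithmetic; this sketch reproduces the 2 KiB average line size and 4096-line buffer limit from the example:

```python
# Upper bound on Docker's log buffer memory:
# max bytes = average log line size x log-driver-buffer-limit (in lines).
avg_line_bytes = 2 * 1024        # application logs average 2 KiB per line
buffer_limit_lines = 4096        # value of log-driver-buffer-limit

max_buffer_bytes = avg_line_bytes * buffer_limit_lines
max_buffer_mib = max_buffer_bytes / (1024 * 1024)
print(max_buffer_mib)  # 8.0
```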

The following task definition shows how to configure `log-driver-buffer-limit`:

```
{
    "containerDefinitions": [
        {
            "name": "my_service_log_router",
            "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:3",
            "cpu": 0,
            "memoryReservation": 51,
            "essential": true,
            "firelensConfiguration": {
                "type": "fluentbit"
            }
        },
        {
            "essential": true,
            "image": "public.ecr.aws/docker/library/httpd:latest",
            "name": "app",
            "logConfiguration": {
                "logDriver": "awsfirelens",
                "options": {
                    "Name": "firehose",
                    "region": "us-west-2",
                    "delivery_stream": "my-stream",
                    "log-driver-buffer-limit": "52428800"
                }
            },
            "dependsOn": [
                {
                    "containerName": "my_service_log_router",
                    "condition": "START"
                }
            ],
            "memoryReservation": 100
        }
    ]
}
```

# AWS for Fluent Bit image repositories for Amazon ECS
<a name="firelens-using-fluentbit"></a>

AWS provides a Fluent Bit image with plugins for both CloudWatch Logs and Firehose. We recommend using Fluent Bit as your log router because it has a lower resource utilization rate than Fluentd. For more information, see [CloudWatch Logs for Fluent Bit](https://github.com/aws/amazon-cloudwatch-logs-for-fluent-bit) and [Amazon Kinesis Firehose for Fluent Bit](https://github.com/aws/amazon-kinesis-firehose-for-fluent-bit).

The **AWS for Fluent Bit** image is available both on the Amazon ECR Public Gallery and in an Amazon ECR repository, for high availability.

## Amazon ECR Public Gallery
<a name="firelens-image-ecrpublic"></a>

The AWS for Fluent Bit image is available on the Amazon ECR Public Gallery. This is the recommended location to download the AWS for Fluent Bit image because it's a public repository and available to be used from all AWS Regions. For more information, see [aws-for-fluent-bit](https://gallery.ecr.aws/aws-observability/aws-for-fluent-bit) on the Amazon ECR Public Gallery.

### Linux
<a name="firelens-image-ecrpublic-linux"></a>

The AWS for Fluent Bit image in the Amazon ECR Public Gallery supports the Amazon Linux operating system with the `ARM64` or `x86-64` architecture.

You can pull the AWS for Fluent Bit image from the Amazon ECR Public Gallery by specifying the repository URL with the desired image tag. The available image tags can be found on the **Image tags** tab on the Amazon ECR Public Gallery.

The following shows the syntax to use for the Docker CLI.

```
docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:tag
```

For example, you can pull the latest image in the "3.x" family of AWS for Fluent Bit releases using this Docker CLI command.

```
docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:3
```

**Note**  
Unauthenticated pulls are allowed, but have a lower rate limit than authenticated pulls. To authenticate using your AWS account before pulling, use the following command.  

```
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
```

#### AWS for Fluent Bit 3.0.0
<a name="firelens-image-ecrpublic-linux-3.0.0"></a>

In addition to the existing AWS for Fluent Bit versions `2.x`, AWS for Fluent Bit supports a new major version `3.x`. The new major version includes upgrading images from Amazon Linux 2 to Amazon Linux 2023 and Fluent Bit version `1.9.10` to `4.1.1`. For more information, see the [AWS for Fluent Bit repository](https://github.com/aws/aws-for-fluent-bit/blob/mainline/VERSIONS.md) on GitHub.

The following example pulls the multi-architecture tag for an AWS for Fluent Bit `3.x` image.

```
docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:3
```

### Windows
<a name="firelens-image-ecrpublic-windows"></a>

The AWS for Fluent Bit image in the Amazon ECR Public Gallery supports the `AMD64` architecture with the following operating systems:
+ Windows Server 2022 Full
+ Windows Server 2022 Core
+ Windows Server 2019 Full
+ Windows Server 2019 Core

Windows containers that are on AWS Fargate don't support FireLens.

You can pull the AWS for Fluent Bit image from the Amazon ECR Public Gallery by specifying the repository URL with the desired image tag. The available image tags can be found on the **Image tags** tab on the Amazon ECR Public Gallery.

The following shows the syntax to use for the Docker CLI.

```
docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:tag
```

For example, you can pull the newest stable AWS for Fluent Bit image using this Docker CLI command.

```
docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:windowsservercore-stable
```

**Note**  
Unauthenticated pulls are allowed, but have a lower rate limit than authenticated pulls. To authenticate using your AWS account before pulling, use the following command.  

```
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
```

## Amazon ECR
<a name="firelens-image-ecr"></a>

The AWS for Fluent Bit image is available on Amazon ECR for high availability. The following commands can be used to retrieve image URIs and verify image availability in a given AWS Region.

### Linux
<a name="firelens-image-ecr-linux"></a>

The latest stable AWS for Fluent Bit image URI can be retrieved using the following command.

```
aws ssm get-parameters \
      --names /aws/service/aws-for-fluent-bit/stable \
      --region us-east-1
```

All versions of the AWS for Fluent Bit image can be listed using the following command to query the Systems Manager Parameter Store parameter.

```
aws ssm get-parameters-by-path \
      --path /aws/service/aws-for-fluent-bit \
      --region us-east-1
```

The newest stable AWS for Fluent Bit image can be referenced in a CloudFormation template by referencing the Systems Manager Parameter Store parameter name. The following is an example:

```
Parameters:
  FireLensImage:
    Description: Fluent Bit image for the FireLens Container
    Type: AWS::SSM::Parameter::Value<String>
    Default: /aws/service/aws-for-fluent-bit/stable
```

**Note**  
If the command fails or there is no output, the image isn't available in the AWS Region in which the command is called.

### Windows
<a name="firelens-image-ecr-windows"></a>

The latest stable AWS for Fluent Bit image URI can be retrieved using the following command.

```
aws ssm get-parameters \
      --names /aws/service/aws-for-fluent-bit/windowsservercore-stable \
      --region us-east-1
```

All versions of the AWS for Fluent Bit image can be listed using the following command to query the Systems Manager Parameter Store parameter.

```
aws ssm get-parameters-by-path \
      --path /aws/service/aws-for-fluent-bit/windowsservercore \
      --region us-east-1
```

The latest stable AWS for Fluent Bit image can be referenced in a CloudFormation template by referencing the Systems Manager Parameter Store parameter name. The following is an example:

```
Parameters:
  FireLensImage:
    Description: Fluent Bit image for the FireLens Container
    Type: AWS::SSM::Parameter::Value<String>
    Default: /aws/service/aws-for-fluent-bit/windowsservercore-stable
```

# Example Amazon ECS task definition: Route logs to FireLens
<a name="firelens-taskdef"></a>

To use custom log routing with FireLens, you must specify the following in your task definition:
+ A log router container that contains a FireLens configuration. We recommend that the container be marked as `essential`.
+ One or more application containers that contain a log configuration specifying the `awsfirelens` log driver.
+ A task IAM role Amazon Resource Name (ARN) that contains the permissions needed for the task to route the logs.

When creating a new task definition using the AWS Management Console, there is a FireLens integration section that makes it easy to add a log router container. For more information, see [Creating an Amazon ECS task definition using the console](create-task-definition.md).

Amazon ECS converts the log configuration and generates the Fluentd or Fluent Bit output configuration. The output configuration is mounted in the log routing container at `/fluent-bit/etc/fluent-bit.conf` for Fluent Bit and `/fluentd/etc/fluent.conf` for Fluentd.

**Important**  
FireLens listens on port `24224`. Therefore, to ensure that the FireLens log router isn't reachable outside of the task, you must not allow ingress traffic on port `24224` in the security group your task uses. For tasks that use the `awsvpc` network mode, this is the security group that's associated with the task. For tasks that use the `host` network mode, this is the security group that's associated with the Amazon EC2 instance hosting the task. For tasks that use the `bridge` network mode, don't create any port mappings that use port `24224`.

By default, Amazon ECS adds the following metadata fields to your log entries to help identify the source of the logs:
+ `ecs_cluster` – The name of the cluster that the task is part of.
+ `ecs_task_arn` – The full Amazon Resource Name (ARN) of the task that the container is part of.
+ `ecs_task_definition` – The task definition name and revision that the task is using.
+ `ec2_instance_id` – The Amazon EC2 instance ID that the container is hosted on. This field is only valid for tasks using the EC2 launch type.

You can set the `enable-ecs-log-metadata` option to `false` if you don't want this metadata.
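For example, the metadata is disabled in the log router's `firelensConfiguration` object; the surrounding container definition fields are omitted in this sketch:

```
"firelensConfiguration": {
    "type": "fluentbit",
    "options": {
        "enable-ecs-log-metadata": "false"
    }
}
```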

The following task definition example defines a log router container that uses Fluent Bit to route its logs to CloudWatch Logs. It also defines an application container that uses a log configuration to route logs to Amazon Data Firehose and that configures the Docker log buffer by using `log-driver-buffer-limit`.

**Note**  
For more example task definitions, see [Amazon ECS FireLens examples](https://github.com/aws-samples/amazon-ecs-firelens-examples) on GitHub.

```
{
  "family": "firelens-example-firehose",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecs_task_iam_role",
  "containerDefinitions": [
    {
            "name": "log_router",
            "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:3",
            "cpu": 0,
            "memoryReservation": 51,
            "portMappings": [],
            "essential": true,
            "environment": [],
            "mountPoints": [],
            "volumesFrom": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/ecs-aws-firelens-sidecar-container",
                    "mode": "non-blocking",
                    "awslogs-create-group": "true",
                    "max-buffer-size": "25m",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "firelens"
                },
                "secretOptions": []
            },
            "systemControls": [],
            "firelensConfiguration": {
                "type": "fluentbit"
            }
        },
    {
      "essential": true,
      "image": "public.ecr.aws/docker/library/httpd:latest",
      "name": "app",
      "logConfiguration": {
        "logDriver": "awsfirelens",
        "options": {
          "Name": "firehose",
          "region": "us-west-2",
          "delivery_stream": "my-stream",
          "log-driver-buffer-limit": "1048576"
        }
      },
      "memoryReservation": 100
    }
  ]
}
```

The key-value pairs specified as options in the `logConfiguration` object are used to generate the Fluentd or Fluent Bit output configuration. The following is a code example from a Fluent Bit output definition.

```
[OUTPUT]
    Name   firehose
    Match  app-firelens*
    region us-west-2
    delivery_stream my-stream
```

**Note**  
FireLens manages the `match` configuration. You do not specify the `match` configuration in your task definition. 

## Use a custom configuration file
<a name="firelens-taskdef-customconfig"></a>

You can specify a custom configuration file. The configuration file format is the native format for the log router that you're using. For more information, see [Fluentd Config File Syntax](https://docs.fluentd.org/configuration/config-file) and [YAML Configuration](https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/yaml).

In your custom configuration file, for tasks using the `bridge` or `awsvpc` network mode, don't set a Fluentd or Fluent Bit forward input over TCP because FireLens adds it to the input configuration.

Your FireLens configuration must contain the following options to specify a custom configuration file:

`config-file-type`  
The source location of the custom configuration file. The available options are `s3` or `file`.  
Tasks that are hosted on AWS Fargate only support the `file` configuration file type. However, you can use configuration files hosted in Amazon S3 on AWS Fargate by using the AWS for Fluent Bit init container. For more information, see [Init process for Fluent Bit on ECS, multi-config support](https://github.com/aws/aws-for-fluent-bit/blob/mainline/use_cases/init-process-for-fluent-bit/README.md) on GitHub.

`config-file-value`  
The source for the custom configuration file. If the `s3` config file type is used, the config file value is the full ARN of the Amazon S3 bucket and file. If the `file` config file type is used, the config file value is the full path of the configuration file that exists either in the container image or on a volume that's mounted in the container.  
When using a custom configuration file, you must specify a different path than the one FireLens uses. Amazon ECS reserves the `/fluent-bit/etc/fluent-bit.conf` filepath for Fluent Bit and `/fluentd/etc/fluent.conf` for Fluentd.

The following example shows the syntax required when specifying a custom configuration.

**Important**  
To specify a custom configuration file that's hosted in Amazon S3, ensure you have created a task execution IAM role with the proper permissions. 

```
{
  "containerDefinitions": [
    {
      "essential": true,
      "image": "906394416424.dkr.ecr.us-west-2.amazonaws.com/aws-for-fluent-bit:3",
      "name": "log_router",
      "firelensConfiguration": {
        "type": "fluentbit",
        "options": {
          "config-file-type": "s3 | file",
          "config-file-value": "arn:aws:s3:::amzn-s3-demo-bucket/fluent.conf | filepath"
        }
      }
    }
  ]
}
```

**Note**  
Tasks hosted on AWS Fargate only support the `file` configuration file type. However, you can use configuration files hosted in Amazon S3 on AWS Fargate by using the AWS for Fluent Bit init container. For more information, see [Init process for Fluent Bit on ECS, multi-config support](https://github.com/aws/aws-for-fluent-bit/blob/mainline/use_cases/init-process-for-fluent-bit/README.md) on GitHub.
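For illustration, a minimal custom Fluent Bit classic-mode file that could be referenced with the `file` config file type might add an extra output stanza; the log group name and stream prefix here are hypothetical examples:

```
[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            us-east-1
    log_group_name    /ecs/custom-logs
    log_stream_prefix app-
    auto_create_group true
```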

# Using non-AWS container images in Amazon ECS
<a name="private-auth"></a>

With private registry authentication, you store your registry credentials in AWS Secrets Manager and then reference them in your task definition. This provides a way to reference container images that exist in private registries outside of AWS that require authentication. This feature is supported by tasks hosted on Fargate, on Amazon EC2 instances, and on external instances using Amazon ECS Anywhere.

**Important**  
If your task definition references an image that's stored in Amazon ECR, this topic doesn't apply. For more information, see [Using Amazon ECR Images with Amazon ECS](https://docs.aws.amazon.com/AmazonECR/latest/userguide/ECR_on_ECS.html) in the *Amazon Elastic Container Registry User Guide*.

For tasks hosted on Amazon EC2 instances, this feature requires version `1.19.0` or later of the container agent. However, we recommend using the latest container agent version. For information about how to check your agent version and update to the latest version, see [Updating the Amazon ECS container agent](ecs-agent-update.md).

For tasks hosted on Fargate, this feature requires platform version `1.2.0` or later. For information, see [Fargate platform versions for Amazon ECS](platform-fargate.md).

Within your container definition, specify the `repositoryCredentials` object with the details of the secret that you created. The referenced secret can be from a different AWS Region or a different account than the task using it.

**Note**  
When using the Amazon ECS API, AWS CLI, or AWS SDK, if the secret exists in the same AWS Region as the task that you're launching, you can use either the full ARN or the name of the secret. If the secret exists in a different account, you must specify the full ARN of the secret. When using the AWS Management Console, you must always specify the full ARN of the secret.

The following is a snippet of a task definition that shows the required parameters:

Substitute the following parameters:
+ *private-repo* with the private repository host name 
+ *private-image* with the image name
+ *arn:aws:secretsmanager:region:aws\_account\_id:secret:secret\_name* with the secret Amazon Resource Name (ARN)

```
"containerDefinitions": [
    {
        "image": "private-repo/private-image",
        "repositoryCredentials": {
            "credentialsParameter": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name"
        }
    }
]
```

**Note**  
Another method of enabling private registry authentication uses Amazon ECS container agent environment variables to authenticate to private registries. This method is only supported for tasks hosted on Amazon EC2 instances. For more information, see [Configuring Amazon ECS container instances for private Docker images](private-auth-container-instances.md).

**To use private registry authentication**

1. The task definition must have a task execution role. This allows the container agent to pull the container image. For more information, see [Amazon ECS task execution IAM role](task_execution_IAM_role.md).

   To provide access to the secrets that contain your private registry credentials, add the following permissions as an inline policy to the task execution role. For more information, see [Adding and Removing IAM Policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).
   + `secretsmanager:GetSecretValue`—Required to retrieve the private registry credentials from Secrets Manager.
   + `kms:Decrypt`—Required only if your secret uses a custom KMS key and not the default key. The Amazon Resource Name (ARN) for your custom key must be added as a resource.

   The following is an example inline policy that adds the permissions.

------
#### [ JSON ]

****  

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "kms:Decrypt",
                   "secretsmanager:GetSecretValue"
               ],
               "Resource": [
                   "arn:aws:secretsmanager:us-east-1:111122223333:secret:secret_name",
                   "arn:aws:kms:us-east-1:111122223333:key/key_id"
               ]
           }
       ]
   }
   ```

------

1. Use AWS Secrets Manager to create a secret for your private registry credentials. For information about how to create a secret, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) in the *AWS Secrets Manager User Guide*.

   Enter your private registry credentials using the following format:

   ```
   {
     "username" : "privateRegistryUsername",
     "password" : "privateRegistryPassword"
   }
   ```

1. Register a task definition. For more information, see [Creating an Amazon ECS task definition using the console](create-task-definition.md).

# Restart individual containers in Amazon ECS tasks with container restart policies
<a name="container-restart-policy"></a>

You can enable a restart policy for each essential and non-essential container defined in your task definition, to overcome transient failures faster and maintain task availability. When you enable a restart policy for a container, Amazon ECS can restart the container if it exits, without needing to replace the task.

Restart policies are not enabled for containers by default. When you enable a restart policy for a container, you can specify exit codes that the container will not be restarted on. These can be exit codes that indicate success, like exit code `0`, that don't require a restart. You can also specify how long a container must run successfully before a restart can be attempted. For more information about these parameters, see [Restart policy](task_definition_parameters.md#container_definition_restart_policy). For an example task definition that specifies these values, see [Specifying a container restart policy in an Amazon ECS task definition](container-restart-policy-example.md).

You can use the Amazon ECS task metadata endpoint or CloudWatch Container Insights to monitor the number of times a container has restarted. For more information about the task metadata endpoint, see [Amazon ECS task metadata endpoint version 4](task-metadata-endpoint-v4.md) and [Amazon ECS task metadata endpoint version 4 for tasks on Fargate](task-metadata-endpoint-v4-fargate.md). For more information about Container Insights metrics for Amazon ECS, see [Amazon ECS Container Insights metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-metrics-ECS.html) in the *Amazon CloudWatch User Guide*.

Container restart policies are supported by tasks hosted on Fargate, Amazon EC2 instances, and external instances using Amazon ECS Anywhere.

## Considerations
<a name="container-restart-policy-considerations"></a>

Consider the following before enabling a restart policy for your container:
+ Restart policies aren't supported for Windows containers on Fargate.
+ For tasks hosted on Amazon EC2 instances, this feature requires version `1.86.0` or later of the container agent. However, we recommend using the latest container agent version. For information about how to check your agent version and update to the latest version, see [Updating the Amazon ECS container agent](ecs-agent-update.md).
+ If you're using EC2 with the `bridge` network mode, the `FLUENT_HOST` environment variable in your application container can become inaccurate after a restart of the FireLens log router container (the container with the `firelensConfiguration` object in its container definition). This is because `FLUENT_HOST` is a dynamic IP address and can change after a restart. Logging directly from the application container to the `FLUENT_HOST` IP address can start failing after the address changes. For more information about `FLUENT_HOST`, see [Configuring Amazon ECS logs for high throughput](firelens-docker-buffer-limit.md).
+ The Amazon ECS agent handles the container restart policies. If for some unexpected reason the Amazon ECS agent fails or is no longer running, the container won't be restarted.
+ The restart attempt period defined in your policy determines how long (in seconds) a container must run before Amazon ECS can attempt to restart it.

# Specifying a container restart policy in an Amazon ECS task definition
<a name="container-restart-policy-example"></a>

To specify a restart policy for a container in a task definition, within the container definition, specify the `restartPolicy` object. For more information about the `restartPolicy` object, see [Restart policy](task_definition_parameters.md#container_definition_restart_policy).

The following is a task definition for Linux containers on Fargate that sets up a web server. The container definition includes the `restartPolicy` object, with `enabled` set to `true` to enable a restart policy for the container. The container must run for 180 seconds before it can be restarted, and it won't be restarted if it exits with exit code `0`, which indicates success.

```
{
  "containerDefinitions": [
    {
      "command": [
        "/bin/sh -c \"echo '<html> <head> <title>Amazon ECS Sample App</title> <style>body {margin-top: 40px; background-color: #333;} </style> </head><body> <div style=color:white;text-align:center> <h1>Amazon ECS Sample App</h1> <h2>Congratulations!</h2> <p>Your application is now running on a container in Amazon ECS.</p> </div></body></html>' >  /usr/local/apache2/htdocs/index.html && httpd-foreground\""
      ],
      "entryPoint": ["sh", "-c"],
      "essential": true,
      "image": "public.ecr.aws/docker/library/httpd:2.4",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/fargate-task-definition",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "name": "sample-fargate-app",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp"
        }
      ],
      "restartPolicy": {
        "enabled": true,
        "ignoredExitCodes": [0],
        "restartAttemptPeriod": 180
      }
    }
  ],
  "cpu": "256",
  "executionRoleArn": "arn:aws:iam::012345678910:role/ecsTaskExecutionRole",
  "family": "fargate-task-definition",
  "memory": "512",
  "networkMode": "awsvpc",
  "runtimePlatform": {
    "operatingSystemFamily": "LINUX"
  },
  "requiresCompatibilities": ["FARGATE"]
}
```

After you have registered a task definition with the `restartPolicy` object in a container definition, you can run a task or create a service with that task definition. For more information, see [Running an application as an Amazon ECS task](standalone-task-create.md) and [Creating an Amazon ECS rolling update deployment](create-service-console-v2.md).

# Pass sensitive data to an Amazon ECS container
<a name="specifying-sensitive-data"></a>

You can safely pass sensitive data, such as database credentials, into your container.

Secrets, such as API keys and database credentials, are frequently used by applications to gain access to other systems. They often consist of a username and password, a certificate, or an API key. Access to these secrets should be restricted to specific IAM principals by using IAM, and the secrets should be injected into containers at runtime.

Secrets can be seamlessly injected into containers from AWS Secrets Manager and AWS Systems Manager Parameter Store. These secrets can be referenced in your task in any of the following ways.

1. They're referenced as environment variables that use the `secrets` container definition parameter.

1. They're referenced as `secretOptions` if your logging platform requires authentication. For more information, see [logging configuration options](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_LogConfiguration.html#API_LogConfiguration_Contents).

1. They're referenced as secrets pulled by images that use the `repositoryCredentials` container definition parameter if the registry where the container image is pulled from requires authentication. Use this method when pulling images from private registries outside of AWS. For more information, see [Private registry authentication for tasks](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/private-auth.html).
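For example, the first method above uses the `secrets` container definition parameter; the container name, environment variable name, and ARN below are placeholders:

```
"containerDefinitions": [
    {
        "name": "app",
        "secrets": [
            {
                "name": "DB_PASSWORD",
                "valueFrom": "arn:aws:secretsmanager:us-east-1:111122223333:secret:db_password"
            }
        ]
    }
]
```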

We recommend that you do the following when setting up secrets management.

## Use AWS Secrets Manager or AWS Systems Manager Parameter Store for storing secret materials
<a name="security-secrets-management-recommendations-storing-secret-materials"></a>

You should securely store API keys, database credentials, and other secret materials in Secrets Manager or as an encrypted parameter in Systems Manager Parameter Store. These services are similar because they're both managed key-value stores that use AWS KMS to encrypt sensitive data. Secrets Manager, however, also includes the ability to automatically rotate secrets, generate random secrets, and share secrets across accounts. To utilize these features, use Secrets Manager. Otherwise, use encrypted parameters in Systems Manager Parameter Store.

**Important**  
If your secret changes, you must force a new deployment or launch a new task to retrieve the latest secret value. For more information, see the following topics:  
Tasks - Stop the task, and then start it. For more information, see [Stopping an Amazon ECS task](standalone-task-stop.md) and [Running an application as an Amazon ECS task](standalone-task-create.md).
Service - Update the service and use the force new deployment option. For more information, see [Updating an Amazon ECS service](update-service-console-v2.md).

## Retrieve data from an encrypted Amazon S3 bucket
<a name="security-secrets-management-recommendations-encrypted-s3-buckets"></a>

You should store secrets in an encrypted Amazon S3 bucket and use task roles to restrict access to those secrets. This prevents the values of environment variables from inadvertently leaking in logs or being revealed by `docker inspect`. When you do this, your application must be written to read the secret from the Amazon S3 bucket. For information about encrypting the bucket, see [Setting default server-side encryption behavior for Amazon S3 buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-encryption.html).
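
Because the application itself fetches the object, a minimal sketch of that read might look like the following (Python with boto3; the function name, bucket, and key are hypothetical placeholders, and the task role is assumed to allow `s3:GetObject` on the object):

```
import json


def read_secret_from_s3(bucket, key):
    """Fetch and parse a JSON-formatted secret object from an encrypted S3 bucket.

    Hypothetical helper: bucket and key are placeholders, and the task role
    is assumed to grant s3:GetObject on the object.
    """
    import boto3  # imported lazily so this module loads without the SDK installed

    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)
```

Server-side decryption is transparent to the caller; the `GetObject` call returns plain text as long as the task role can use the bucket's KMS key.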

## Mount the secret to a volume using a sidecar container
<a name="security-secrets-management-recommendations-mount-secret-volumes"></a>

Because there's an elevated risk of data leakage with environment variables, you should run a sidecar container that reads your secrets from AWS Secrets Manager and writes them to a shared volume. This container can run and exit before the application container by using [Amazon ECS container ordering](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerDependency.html). The application container then mounts the volume where the secret was written. Like the Amazon S3 bucket method, your application must be written to read the secret from the shared volume. Because the volume is scoped to the task, it's automatically deleted after the task stops. For an example, see the [task-def.json](https://github.com/aws-samples/aws-secret-sidecar-injector/blob/master/ecs-task-def/task-def.json) project.

On Amazon EC2, the volume that the secret is written to can be encrypted with an AWS KMS customer managed key. On AWS Fargate, volume storage is automatically encrypted using a service managed key.
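
As a minimal sketch of this pattern (container names, images, and paths are illustrative), the ordering and the shared, task-scoped volume can be expressed in a task definition like this:

```
{
  "containerDefinitions": [
    {
      "name": "secrets-init",
      "image": "my-secret-fetcher:latest",
      "essential": false,
      "mountPoints": [{
        "sourceVolume": "secrets-vol",
        "containerPath": "/secrets"
      }]
    },
    {
      "name": "app",
      "image": "my-app:latest",
      "essential": true,
      "dependsOn": [{
        "containerName": "secrets-init",
        "condition": "SUCCESS"
      }],
      "mountPoints": [{
        "sourceVolume": "secrets-vol",
        "containerPath": "/secrets",
        "readOnly": true
      }]
    }
  ],
  "volumes": [{
    "name": "secrets-vol"
  }]
}
```

The `SUCCESS` condition requires the sidecar to be non-essential so that its exit doesn't stop the task, and a volume with no host configuration is scoped to the task.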

# Pass an individual environment variable to an Amazon ECS container
<a name="taskdef-envfiles"></a>

**Important**  
We recommend storing your sensitive data in either AWS Secrets Manager secrets or AWS Systems Manager Parameter Store parameters. For more information, see [Pass sensitive data to an Amazon ECS container](specifying-sensitive-data.md).  
Environment variables specified in the task definition are readable by all users and roles that are allowed the `DescribeTaskDefinition` action for the task definition.

You can pass environment variables to your containers in the following ways:
+ Individually using the `environment` container definition parameter. This maps to the `--env` option to [https://docs.docker.com/reference/cli/docker/container/run/](https://docs.docker.com/reference/cli/docker/container/run/).
+ In bulk, using the `environmentFiles` container definition parameter to list one or more files that contain the environment variables. The file must be hosted in Amazon S3. This maps to the `--env-file` option to [https://docs.docker.com/reference/cli/docker/container/run/](https://docs.docker.com/reference/cli/docker/container/run/).

The following is a snippet of a task definition showing how to specify individual environment variables.

```
{
    "family": "",
    "containerDefinitions": [
        {
            "name": "",
            "image": "",
            ...
            "environment": [
                {
                    "name": "variable",
                    "value": "value"
                }
            ],
            ...
        }
    ],
    ...
}
```

# Pass environment variables to an Amazon ECS container
<a name="use-environment-file"></a>

**Important**  
We recommend storing your sensitive data in either AWS Secrets Manager secrets or AWS Systems Manager Parameter Store parameters. For more information, see [Pass sensitive data to an Amazon ECS container](specifying-sensitive-data.md).  
Environment variable files are objects in Amazon S3 and all Amazon S3 security considerations apply.   
You can't use the `environmentFiles` parameter with Windows containers, including Windows containers on Fargate.

You can create an environment variable file and store it in Amazon S3 to pass environment variables to your container.

By specifying environment variables in a file, you can bulk inject environment variables. Within your container definition, specify the `environmentFiles` object with a list of Amazon S3 object ARNs for your environment variable files.

Amazon ECS doesn't enforce a size limit on the environment variables, but a large environment variable file might fill up the disk space. Each task that uses an environment variable file causes a copy of the file to be downloaded to disk. Amazon ECS removes the file as part of task cleanup.

For information about the supported environment variables, see [Advanced container definition parameters - Environment](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_environment).

Consider the following when specifying an environment variable file in a container definition.
+ For Amazon ECS tasks on Amazon EC2, your container instances require that the container agent is version `1.39.0` or later to use this feature. For information about how to check your agent version and update to the latest version, see [Updating the Amazon ECS container agent](ecs-agent-update.md).
+ For Amazon ECS tasks on AWS Fargate, your tasks must use platform version `1.4.0` or later (Linux) to use this feature. For more information, see [Fargate platform versions for Amazon ECS](platform-fargate.md).

  Verify that the variable is supported for the operating system platform. For more information, see [Container definitions](task_definition_parameters.md#container_definitions) and [Other task definition parameters](task_definition_parameters.md#other_task_definition_params).
+ The file must use the `.env` file extension and UTF-8 encoding.
+ To use this feature, your task execution role must have the additional permissions for Amazon S3. These permissions allow the container agent to pull the environment variable file from Amazon S3. For more information, see [Amazon ECS task execution IAM role](task_execution_IAM_role.md).
+ There is a limit of 10 files per task definition.
+ Each line in an environment file must contain an environment variable in `VARIABLE=VALUE` format. Spaces and quotation marks **are** included as part of the value in Amazon ECS files. Lines beginning with `#` are treated as comments and are ignored. For more information about the environment variable file syntax, see [Set environment variables (-e, --env, --env-file)](https://docs.docker.com/reference/cli/docker/container/run/#env) in the Docker documentation.

  The following is the appropriate syntax.

  ```
  #This is a comment and will be ignored
  VARIABLE=VALUE
  ENVIRONMENT=PRODUCTION
  ```
+ If there are environment variables specified using the `environment` parameter in a container definition, they take precedence over the variables contained within an environment file.
+ If multiple environment files are specified and they contain the same variable, they're processed in order of entry. This means that the first value of the variable is used and subsequent values of duplicate variables are ignored. We recommend that you use unique variable names.
+ If an environment file is specified as a container override, it's used, and any other environment files specified in the container definition are ignored.
+ The following rules apply to Fargate:
  + The file is handled like a native Docker env-file.
  + Blank environment variables referenced from a file stored in Amazon S3 don't appear in the container.
  + There is no support for shell escape handling.
  + The container entry point interprets the `VARIABLE` values.
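
As an illustrative model of the precedence rules above (this is not the ECS agent itself, just a sketch: `environment` entries win over env files, the first file's value wins for duplicates, `#` lines are ignored, and spaces and quotation marks are kept literally):

```
def merge_environment(explicit_env, env_file_texts):
    """Model how ECS resolves environment variables from files.

    explicit_env: dict from the `environment` container definition parameter.
    env_file_texts: file contents, in the order listed in `environmentFiles`.
    """
    merged = {}
    for text in env_file_texts:
        for line in text.splitlines():
            if not line or line.startswith("#"):  # comments and blanks are ignored
                continue
            name, _, value = line.partition("=")  # split on the first '='
            # spaces and quotation marks are kept as part of the value;
            # setdefault keeps the first file's value for duplicates
            merged.setdefault(name, value)
    merged.update(explicit_env)  # the `environment` parameter takes precedence
    return merged
```

For example, a variable defined both in `environment` and in a file resolves to the `environment` value, and a variable defined in two files resolves to the first file's value.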

## Example
<a name="environment-file-example"></a>

The following is a snippet of a task definition showing how to specify an environment variable file.

```
{
    "family": "",
    "containerDefinitions": [
        {
            "name": "",
            "image": "",
            ...
            "environmentFiles": [
                {
                    "value": "arn:aws:s3:::amzn-s3-demo-bucket/envfile_object_name.env",
                    "type": "s3"
                }
            ],
            ...
        }
    ],
    ...
}
```

# Pass Secrets Manager secrets programmatically in Amazon ECS
<a name="secrets-app-secrets-manager"></a>

Instead of hardcoding sensitive information in plain text in your application, you can use Secrets Manager to store the sensitive data.

We recommend this method of retrieving sensitive data because if the Secrets Manager secret is subsequently updated, the application automatically retrieves the latest version of the secret.

Create a secret in Secrets Manager. After you create a Secrets Manager secret, update your application code to retrieve the secret.

Review the following considerations before securing sensitive data in Secrets Manager.
+ Only secrets that store text data, which are secrets created with the `SecretString` parameter of the [CreateSecret](https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_CreateSecret.html) API, are supported. Secrets that store binary data, which are secrets created with the `SecretBinary` parameter of the [CreateSecret](https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_CreateSecret.html) API, are not supported.
+ Use interface VPC endpoints to enhance security controls. You must create the interface VPC endpoints for Secrets Manager. For information about the VPC endpoint, see [Create VPC endpoints](https://docs.aws.amazon.com/secretsmanager/latest/userguide/setup-create-vpc.html) in the *AWS Secrets Manager User Guide*.
+ The VPC your task uses must have DNS resolution enabled.
+ Your task definition must use a task role with the additional permissions for Secrets Manager. For more information, see [Amazon ECS task IAM role](task-iam-roles.md).

## Create the Secrets Manager secret
<a name="secrets-app-secrets-manager-create-secret"></a>

You can use the Secrets Manager console to create a secret for your sensitive data. For information about how to create secrets, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) in the *AWS Secrets Manager User Guide*.

## Update your application to programmatically retrieve Secrets Manager secrets
<a name="secrets-app-secrets-manager-update-app"></a>

You can retrieve secrets with a call to the Secrets Manager APIs directly from your application. For information, see [Retrieve secrets from AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/retrieving-secrets.html) in the *AWS Secrets Manager User Guide*.

To retrieve the sensitive data stored in the AWS Secrets Manager, see [Code examples for AWS Secrets Manager using AWS SDKs](https://docs.aws.amazon.com/code-library/latest/ug/secrets-manager_code_examples.html) in the *AWS SDK Code Examples Code Library*.
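
As a sketch of what such a retrieval looks like with boto3 (the function name and secret ID are hypothetical placeholders; the task role is assumed to allow `secretsmanager:GetSecretValue`, and the secret is assumed to be a JSON-formatted `SecretString`):

```
import json


def get_app_credentials(secret_id):
    """Fetch and parse a JSON SecretString at application runtime.

    Hypothetical helper: secret_id is a placeholder, and the task role is
    assumed to allow secretsmanager:GetSecretValue on the secret.
    """
    import boto3  # imported lazily so this module loads without the SDK installed

    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```

Because the application calls Secrets Manager on each retrieval, restarting or re-reading picks up a rotated secret without redeploying the task.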

# Pass Systems Manager Parameter Store secrets programmatically in Amazon ECS
<a name="secrets-app-ssm-paramstore"></a>

Systems Manager Parameter Store provides secure storage and management of secrets. You can store data such as passwords, database strings, EC2 instance IDs and AMI IDs, and license codes as parameter values, instead of hardcoding this information in your application. You can store values as plain text or encrypted data.

We recommend this method of retrieving sensitive data because if the Systems Manager Parameter Store parameter is subsequently updated, the application automatically retrieves the latest version.

Review the following considerations before securing sensitive data in Systems Manager Parameter Store.
+ Only secrets that store text data are supported. Secrets that store binary data are not supported.
+ Use interface VPC endpoints to enhance security controls.
+ The VPC your task uses must have DNS resolution enabled.
+ For tasks that use EC2, you must use the Amazon ECS agent configuration variable `ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true` to use this feature. You can add it to the `/etc/ecs/ecs.config` file during container instance creation or you can add it to an existing instance and then restart the ECS agent. For more information, see [Amazon ECS container agent configuration](ecs-agent-config.md).
+ Your task definition must use a task role with the additional permissions for Systems Manager Parameter Store. For more information, see [Amazon ECS task IAM role](task-iam-roles.md).

## Create the parameter
<a name="secrets-app-ssm-paramstore-create-secret"></a>

You can use the Systems Manager console to create a Systems Manager Parameter Store parameter for your sensitive data. For more information, see [Create a Systems Manager parameter (console)](https://docs.aws.amazon.com/systems-manager/latest/userguide/parameter-create-console.html) or [Create a Systems Manager parameter (AWS CLI)](https://docs.aws.amazon.com/systems-manager/latest/userguide/param-create-cli.html) in the *AWS Systems Manager User Guide*.

## Update your application to programmatically retrieve Systems Manager Parameter Store secrets
<a name="secrets-app-ssm-paramstore-update-app"></a>

To retrieve the sensitive data stored in the Systems Manager Parameter Store parameter, see [Code examples for Systems Manager using AWS SDKs](https://docs.aws.amazon.com/code-library/latest/ug/ssm_code_examples.html) in the *AWS SDK Code Examples Code Library*.
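
A comparable sketch with boto3 (the function and parameter names are hypothetical placeholders; the task role is assumed to allow `ssm:GetParameter`). Note that `WithDecryption=True` is needed for `SecureString` values to be returned in plain text:

```
def get_db_password(parameter_name):
    """Read a SecureString parameter at application runtime.

    Hypothetical helper: parameter_name is a placeholder, and the task role
    is assumed to allow ssm:GetParameter on the parameter.
    """
    import boto3  # imported lazily so this module loads without the SDK installed

    ssm = boto3.client("ssm")
    response = ssm.get_parameter(Name=parameter_name, WithDecryption=True)
    return response["Parameter"]["Value"]
```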

# Pass Secrets Manager secrets through Amazon ECS environment variables
<a name="secrets-envvar-secrets-manager"></a>

When you inject a secret as an environment variable, you can specify the full contents of a secret, a specific JSON key within a secret, or a specific version of a secret. This helps you control the sensitive data exposed to your container. For more information about secret versioning, see [What's in a Secrets Manager secret?](https://docs.aws.amazon.com/secretsmanager/latest/userguide/whats-in-a-secret.html#term_version) in the *AWS Secrets Manager User Guide*.

The following should be considered when using an environment variable to inject a Secrets Manager secret into a container.
+ Sensitive data is injected into your container when the container is initially started. If the secret is subsequently updated or rotated, the container doesn't receive the updated value automatically. You must either launch a new task or, if your task is part of a service, update the service and use the **Force new deployment** option to force the service to launch a fresh task.
+ The environment variables are visible to applications that run in the container, and in container logs and debugging tools.
+ For Amazon ECS tasks on AWS Fargate, consider the following:
  + To inject the full content of a secret as an environment variable or in a log configuration, you must use platform version `1.3.0` or later. For information, see [Fargate platform versions for Amazon ECS](platform-fargate.md).
  + To inject a specific JSON key or version of a secret as an environment variable or in a log configuration, you must use platform version `1.4.0` or later (Linux) or `1.0.0` or later (Windows). For information, see [Fargate platform versions for Amazon ECS](platform-fargate.md).
+ For Amazon ECS tasks on EC2, the following should be considered:
  + To inject a secret using a specific JSON key or version of a secret, your container instance must have version `1.37.0` or later of the container agent. However, we recommend using the latest container agent version. For information about checking your agent version and updating to the latest version, see [Updating the Amazon ECS container agent](ecs-agent-update.md).

    To inject the full contents of a secret as an environment variable or to inject a secret in a log configuration, your container instance must have version `1.22.0` or later of the container agent.
+ Use interface VPC endpoints to enhance security controls and connect to Secrets Manager through a private subnet. You must create the interface VPC endpoints for Secrets Manager. For information about the VPC endpoint, see [Create VPC endpoints](https://docs.aws.amazon.com/secretsmanager/latest/userguide/setup-create-vpc.html) in the *AWS Secrets Manager User Guide*. For more information about using Secrets Manager and Amazon VPC, see [How to connect to Secrets Manager service within an Amazon VPC](https://aws.amazon.com/blogs//security/how-to-connect-to-aws-secrets-manager-service-within-a-virtual-private-cloud/).
+ For Windows tasks that are configured to use the `awslogs` logging driver, you must also set the `ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE` environment variable on your container instance. Use the following syntax:

  ```
  <powershell>
  [Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE", $TRUE, "Machine")
  Initialize-ECSAgent -Cluster <cluster name> -EnableTaskIAMRole -LoggingDrivers '["json-file","awslogs"]'
  </powershell>
  ```
+ Your task definition must use a task execution role with the additional permissions for Secrets Manager. For more information, see [Amazon ECS task execution IAM role](task_execution_IAM_role.md).

## Create the AWS Secrets Manager secret
<a name="secrets-envvar-secrets-manager-create-secret"></a>

You can use the Secrets Manager console to create a secret for your sensitive data. For more information, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) in the *AWS Secrets Manager User Guide*.

## Add the environment variable to the container definition
<a name="secrets-envvar-secrets-manager-update-container-definition"></a>

Within your container definition, you can specify the following:
+ The `secrets` object containing the name of the environment variable to set in the container
+ The Amazon Resource Name (ARN) of the Secrets Manager secret
+ Additional parameters that contain the sensitive data to present to the container

The following example shows the full syntax that must be specified for the Secrets Manager secret.

```
arn:aws:secretsmanager:region:aws_account_id:secret:secret-name:json-key:version-stage:version-id
```

The following section describes the additional parameters. These parameters are optional, but if you do not use them, you must include the colons `:` to use the default values. Examples are provided below for more context.

`json-key`  
Specifies the name of the key in a key-value pair with the value that you want to set as the environment variable value. Only values in JSON format are supported. If you do not specify a JSON key, then the full contents of the secret is used.

`version-stage`  
Specifies the staging label of the version of a secret that you want to use. If a version staging label is specified, you cannot specify a version ID. If no version stage is specified, the default behavior is to retrieve the secret with the `AWSCURRENT` staging label.  
Staging labels are used to keep track of different versions of a secret when they are either updated or rotated. Each version of a secret has one or more staging labels and an ID.

`version-id`  
Specifies the unique identifier of the version of a secret that you want to use. If a version ID is specified, you cannot specify a version staging label. If no version ID is specified, the default behavior is to retrieve the secret with the `AWSCURRENT` staging label.  
Version IDs are used to keep track of different versions of a secret when they are either updated or rotated. Each version of a secret has an ID. For more information, see [Key Terms and Concepts for AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/terms-concepts.html#term_secret) in the *AWS Secrets Manager User Guide*.

### Example container definitions
<a name="secrets-examples"></a>

The following examples show ways in which you can reference Secrets Manager secrets in your container definitions.

**Example referencing a full secret**  
The following is a snippet of a task definition showing the format when referencing the full text of a Secrets Manager secret.  

```
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-AbCdEf"
    }]
  }]
}
```
To access the value of this secret from within the container, reference `$environment_variable_name`.

**Example referencing full secrets**  
The following is a snippet of a task definition showing the format when referencing the full text of multiple Secrets Manager secrets.  

```
{
  "containerDefinitions": [{
     "secrets": [
      {
        "name": "environment_variable_name1",
         "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-AbCdEf"
      },
      {
        "name": "environment_variable_name2",
         "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-abcdef"
      },
      {
        "name": "environment_variable_name3",
        "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-ABCDEF"
      }
    ]
  }]
}
```
To access the values of these secrets from within the container, reference `$environment_variable_name1`, `$environment_variable_name2`, and `$environment_variable_name3`.

**Example referencing a specific key within a secret**  
The following shows an example output from a [get-secret-value](https://docs.aws.amazon.com/cli/latest/reference/secretsmanager/get-secret-value.html) command that displays the contents of a secret along with the version staging label and version ID associated with it.  

```
{
    "ARN": "arn:aws:secretsmanager:region:aws_account_id:secret:appauthexample-AbCdEf",
    "Name": "appauthexample",
    "VersionId": "871d9eca-18aa-46a9-8785-981ddEXAMPLE",
    "SecretString": "{\"username1\":\"password1\",\"username2\":\"password2\",\"username3\":\"password3\"}",
    "VersionStages": [
        "AWSCURRENT"
    ],
    "CreatedDate": 1581968848.921
}
```
Reference a specific key from the previous output in a container definition by specifying the key name at the end of the ARN.  

```
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:appauthexample-AbCdEf:username1::"
    }]
  }]
}
```

**Example referencing a specific secret version**  
The following shows an example output from a [describe-secret](https://docs.aws.amazon.com/cli/latest/reference/secretsmanager/describe-secret.html) command that displays the metadata for a secret, including the version IDs and staging labels for all versions of the secret.  

```
{
    "ARN": "arn:aws:secretsmanager:region:aws_account_id:secret:appauthexample-AbCdEf",
    "Name": "appauthexample",
    "Description": "Example of a secret containing application authorization data.",
    "RotationEnabled": false,
    "LastChangedDate": 1581968848.926,
    "LastAccessedDate": 1581897600.0,
    "Tags": [],
    "VersionIdsToStages": {
        "871d9eca-18aa-46a9-8785-981ddEXAMPLE": [
            "AWSCURRENT"
        ],
        "9d4cb84b-ad69-40c0-a0ab-cead3EXAMPLE": [
            "AWSPREVIOUS"
        ]
    }
}
```
Reference a specific version staging label from the previous output in a container definition by specifying the staging label in the ARN.  

```
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:appauthexample-AbCdEf::AWSPREVIOUS:"
    }]
  }]
}
```
Reference a specific version ID from the previous output in a container definition by specifying the version ID in the ARN.  

```
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:appauthexample-AbCdEf:::9d4cb84b-ad69-40c0-a0ab-cead3EXAMPLE"
    }]
  }]
}
```

**Example referencing a specific key and version staging label of a secret**  
The following shows how to reference both a specific key within a secret and a specific version staging label.  

```
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:appauthexample-AbCdEf:username1:AWSPREVIOUS:"
    }]
  }]
}
```
To specify a specific key and version ID, use the following syntax.  

```
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:appauthexample-AbCdEf:username1::9d4cb84b-ad69-40c0-a0ab-cead3EXAMPLE"
    }]
  }]
}
```

For information about how to create a task definition with the secret specified in an environment variable, see [Creating an Amazon ECS task definition using the console](create-task-definition.md). 

# Pass Systems Manager parameters through Amazon ECS environment variables
<a name="secrets-envvar-ssm-paramstore"></a>

Amazon ECS allows you to inject sensitive data into your containers by storing your sensitive data in AWS Systems Manager Parameter Store parameters and then referencing them in your container definition.

Consider the following when using an environment variable to inject a Systems Manager secret into a container.
+ Sensitive data is injected into your container when the container is initially started. If the secret is subsequently updated or rotated, the container doesn't receive the updated value automatically. You must either launch a new task or, if your task is part of a service, update the service and use the **Force new deployment** option to force the service to launch a fresh task.
+ For Amazon ECS tasks on AWS Fargate, the following should be considered:
  + To inject the full content of a secret as an environment variable or in a log configuration, you must use platform version `1.3.0` or later. For information, see [Fargate platform versions for Amazon ECS](platform-fargate.md).
  + To inject a specific JSON key or version of a secret as an environment variable or in a log configuration, you must use platform version `1.4.0` or later (Linux) or `1.0.0` or later (Windows). For information, see [Fargate platform versions for Amazon ECS](platform-fargate.md).
+ For Amazon ECS tasks on EC2, the following should be considered:
  + To inject a secret using a specific JSON key or version of a secret, your container instance must have version `1.37.0` or later of the container agent. However, we recommend using the latest container agent version. For information about checking your agent version and updating to the latest version, see [Updating the Amazon ECS container agent](ecs-agent-update.md).

    To inject the full contents of a secret as an environment variable or to inject a secret in a log configuration, your container instance must have version `1.22.0` or later of the container agent.
+ Use interface VPC endpoints to enhance security controls. You must create the interface VPC endpoints for Systems Manager. For information about the VPC endpoint, see [Improve the security of EC2 instances by using VPC endpoints for Systems Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/setup-create-vpc.html) in the *AWS Systems Manager User Guide*.
+ Your task definition must use a task execution role with the additional permissions for Systems Manager Parameter Store. For more information, see [Amazon ECS task execution IAM role](task_execution_IAM_role.md).
+ For Windows tasks that are configured to use the `awslogs` logging driver, you must also set the `ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE` environment variable on your container instance. Use the following syntax:

  ```
  <powershell>
  [Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE", $TRUE, "Machine")
  Initialize-ECSAgent -Cluster <cluster name> -EnableTaskIAMRole -LoggingDrivers '["json-file","awslogs"]'
  </powershell>
  ```

## Create the Systems Manager parameter
<a name="secrets-envvar-ssm-paramstore-create-parameter"></a>

You can use the Systems Manager console to create a Systems Manager Parameter Store parameter for your sensitive data. For more information, see [Create a Systems Manager parameter (console)](https://docs.aws.amazon.com/systems-manager/latest/userguide/parameter-create-console.html) or [Create a Systems Manager parameter (AWS CLI)](https://docs.aws.amazon.com/systems-manager/latest/userguide/param-create-cli.html) in the *AWS Systems Manager User Guide*.

## Add the environment variable to the container definition
<a name="secrets-ssm-paramstore-update-container-definition"></a>

Within your container definition in the task definition, specify `secrets` with the name of the environment variable to set in the container and the full ARN of the Systems Manager Parameter Store parameter containing the sensitive data to present to the container. For more information, see [secrets](task_definition_parameters.md#ContainerDefinition-secrets).

The following is a snippet of a task definition showing the format when referencing a Systems Manager Parameter Store parameter. If the Systems Manager Parameter Store parameter exists in the same Region as the task you are launching, then you can use either the full ARN or name of the parameter. If the parameter exists in a different Region, then specify the full ARN.

```
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:ssm:region:aws_account_id:parameter/parameter_name"
    }]
  }]
}
```

For information about how to create a task definition with the secret specified in an environment variable, see [Creating an Amazon ECS task definition using the console](create-task-definition.md).

## Update your application to programmatically retrieve Systems Manager Parameter Store secrets
<a name="secrets-ssm-paramstore-update-app"></a>

To retrieve the sensitive data stored in the Systems Manager Parameter Store parameter, see [Code examples for Systems Manager using AWS SDKs](https://docs.aws.amazon.com/code-library/latest/ug/ssm_code_examples.html) in the *AWS SDK Code Examples Code Library*.

# Pass secrets for Amazon ECS logging configuration
<a name="secrets-logconfig"></a>

You can use the `secretOptions` parameter in `logConfiguration` to pass sensitive data used for logging.

You can store the secret in Secrets Manager or Systems Manager.

## Use Secrets Manager
<a name="secrets-logconfig-secrets-manager"></a>

Within your container definition, when specifying a `logConfiguration`, you can specify `secretOptions` with the name of the log driver option to set in the container and the full ARN of the Secrets Manager secret containing the sensitive data to present to the container. For more information about creating secrets, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html).

The following is a snippet of a task definition showing the format when referencing a Secrets Manager secret.

```
{
  "containerDefinitions": [{
    "logConfiguration": {
      "logDriver": "splunk",
      "options": {
        "splunk-url": "https://your_splunk_instance:8088"
      },
      "secretOptions": [{
        "name": "splunk-token",
        "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-AbCdEf"
      }]
    }
  }]
}
```
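Conceptually, at task start the resolved secret values are merged with the plain `options` before they are handed to the log driver. The sketch below illustrates that merge; it is a simplified model for explanation, not the ECS agent's actual implementation:

```python
def resolve_log_options(options, secret_options, resolve):
    """Combine plain log driver options with secretOptions.

    `resolve` maps a secret's ARN (valueFrom) to its plaintext
    value -- standing in for the Secrets Manager lookup that
    happens at task start.
    """
    merged = dict(options)
    for s in secret_options:
        merged[s["name"]] = resolve(s["valueFrom"])
    return merged
```

The log driver ultimately sees one flat set of options, with the secret-backed entries filled in.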


## Use Systems Manager
<a name="secrets-logconfig-ssm-paramstore"></a>

You can inject sensitive data in a log configuration. Within your container definition, when specifying a `logConfiguration`, you can specify `secretOptions` with the name of the log driver option to set in the container and the full ARN of the Systems Manager Parameter Store parameter containing the sensitive data to present to the container.

**Important**  
If the Systems Manager Parameter Store parameter exists in the same Region as the task you are launching, then you can use either the full ARN or name of the parameter. If the parameter exists in a different Region, then specify the full ARN.

The following is a snippet of a task definition showing the format when referencing a Systems Manager Parameter Store parameter.

```
{
  "containerDefinitions": [{
    "logConfiguration": {
      "logDriver": "fluentd",
      "options": {
        "tag": "fluentd demo"
      },
      "secretOptions": [{
        "name": "fluentd-address",
        "valueFrom": "arn:aws:ssm:region:aws_account_id:parameter/parameter_name"
      }]
    }
  }]
}
```

# Specifying sensitive data using Secrets Manager secrets in Amazon ECS
<a name="specifying-sensitive-data-tutorial"></a>

Amazon ECS allows you to inject sensitive data into your containers by storing your sensitive data in AWS Secrets Manager secrets and then referencing them in your container definition. For more information, see [Pass sensitive data to an Amazon ECS container](specifying-sensitive-data.md).

Learn how to create a Secrets Manager secret, reference the secret in an Amazon ECS task definition, and then verify that it worked by printing the environment variable that contains the contents of the secret inside the container.

## Prerequisites
<a name="specifying-sensitive-data-tutorial-prereqs"></a>

This tutorial assumes that the following prerequisites have been completed:
+ The steps in [Set up to use Amazon ECS](get-set-up-for-amazon-ecs.md) have been completed.
+ Your user has the required IAM permissions to create the Secrets Manager and Amazon ECS resources.

## Step 1: Create a Secrets Manager secret
<a name="specifying-sensitive-data-tutorial-create-secret"></a>

You can use the Secrets Manager console to create a secret for your sensitive data. In this tutorial, you create a basic secret that stores a username and password to reference later in a container. For more information, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) in the *AWS Secrets Manager User Guide*.

The **key/value pairs to be stored in this secret** become the value of the environment variable in your container at the end of the tutorial.

Save the **Secret ARN** to reference in your task execution IAM policy and task definition in later steps.

## Step 2: Add the secrets permissions to the task execution role
<a name="specifying-sensitive-data-tutorial-update-iam"></a>

In order for Amazon ECS to retrieve the sensitive data from your Secrets Manager secret, the task execution role must have the required secrets permissions. For more information, see [Secrets Manager or Systems Manager permissions](task_execution_IAM_role.md#task-execution-secrets).
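As a minimal sketch, the inline policy attached to the task execution role (the **ECSSecretsTutorial** policy removed in the cleanup step) could look like the following. The resource ARN is a placeholder for your secret's ARN; if your secret is encrypted with a customer managed KMS key, `kms:Decrypt` on that key is also required:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-AbCdEf"
      ]
    }
  ]
}
```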

## Step 3: Create a task definition
<a name="specifying-sensitive-data-tutorial-create-taskdef"></a>

You can use the Amazon ECS console to create a task definition that references a Secrets Manager secret.

**To create a task definition that specifies a secret**

1. Open the console at [https://console.aws.amazon.com/ecs/v2](https://console.aws.amazon.com/ecs/v2).

1. In the navigation pane, choose **Task definitions**.

1. Choose **Create new task definition**, **Create new task definition with JSON**.

1. In the JSON editor box, enter the following task definition JSON text, ensuring that you specify the full ARN of the Secrets Manager secret that you created in step 1 and the task execution role that you updated in step 2.

   ```
   {
       "executionRoleArn": "arn:aws:iam::aws_account_id:role/ecsTaskExecutionRole",
       "containerDefinitions": [
           {
               "entryPoint": [
                   "sh",
                   "-c"
               ],
               "portMappings": [
                   {
                       "hostPort": 80,
                       "protocol": "tcp",
                       "containerPort": 80
                   }
               ],
               "command": [
                   "/bin/sh -c \"echo '<html> <head> <title>Amazon ECS Sample App</title> <style>body {margin-top: 40px; background-color: #333;} </style> </head><body> <div style=color:white;text-align:center> <h1>Amazon ECS Sample App</h1> <h2>Congratulations!</h2> <p>Your application is now running on a container in Amazon ECS.</p> </div></body></html>' >  /usr/local/apache2/htdocs/index.html && httpd-foreground\""
               ],
               "cpu": 10,
               "secrets": [
                   {
                       "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:username_value",
                       "name": "username_value"
                   }
               ],
               "memory": 300,
               "image": "public.ecr.aws/docker/library/httpd:2.4",
               "essential": true,
               "name": "ecs-secrets-container"
           }
       ],
       "family": "ecs-secrets-tutorial"
   }
   ```

1. Choose **Create**.
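The `secrets` array in the JSON above follows a simple pattern. As an illustration only (not part of the tutorial), a hypothetical helper could generate it from a mapping of environment-variable names to secret or parameter ARNs:

```python
def build_secrets(env_to_arn):
    """Build the `secrets` array for a container definition from a
    mapping of environment-variable name -> secret or parameter ARN."""
    return [{"name": name, "valueFrom": arn}
            for name, arn in env_to_arn.items()]
```

The resulting list can be dropped straight into a container definition before the task definition is registered.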

## Step 4: Create a cluster
<a name="specifying-sensitive-data-tutorial-create-cluster"></a>

You can use the Amazon ECS console to create a cluster containing a container instance to run the task on. If you have an existing cluster with at least one registered container instance that has the available resources to run one instance of the task definition created for this tutorial, you can skip to the next step.

For this tutorial, create a cluster with one `t2.micro` container instance that uses the Amazon ECS-optimized Amazon Linux 2 AMI.

For information about how to create a cluster for EC2, see [Creating an Amazon ECS cluster for Amazon EC2 workloads](create-ec2-cluster-console-v2.md).

## Step 5: Run a task
<a name="specifying-sensitive-data-tutorial-run-task"></a>

You can use the Amazon ECS console to run a task using the task definition that you created. For this tutorial, run a task on EC2 using the cluster that you created in the previous step.

For information about how to run a task, see [Running an application as an Amazon ECS task](standalone-task-create.md).

## Step 6: Verify
<a name="specifying-sensitive-data-tutorial-verify"></a>

You can use the following steps to verify that all of the steps were completed successfully and that the environment variable was created properly in your container.

**To verify that the environment variable was created**

1. Find the public IP or DNS address for your container instance.

   1. Open the console at [https://console.aws.amazon.com/ecs/v2](https://console.aws.amazon.com/ecs/v2).

   1. In the navigation pane, choose **Clusters**, and then choose the cluster you created.

   1. Choose **Infrastructure**, and then choose the container instance.

   1. Record the **Public IP** or **Public DNS** for your instance.

1. If you are using a macOS or Linux computer, connect to your instance with the following command, substituting the path to your private key and the public address for your instance:

   ```
   $ ssh -i /path/to/my-key-pair.pem ec2-user@ec2-198-51-100-1.compute-1.amazonaws.com
   ```

   For more information about using a Windows computer, see [Connect to your Linux instance using PuTTY](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connect-linux-inst-from-windows.html) in the *Amazon EC2 User Guide*.
**Important**  
If you encounter issues while connecting to your instance, see [Troubleshooting Connecting to Your Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesConnecting.html) in the *Amazon EC2 User Guide*.

1. List the containers running on the instance. Note the container ID for the `ecs-secrets-tutorial` container.

   ```
   docker ps
   ```

1. Connect to the `ecs-secrets-tutorial` container using the container ID from the output of the previous step.

   ```
   docker exec -it container_ID /bin/bash
   ```

1. Use the `echo` command to print the value of the environment variable.

   ```
   echo $username_value
   ```

   If the tutorial was successful, you should see the following output:

   ```
   password_value
   ```
**Note**  
Alternatively, you can list all environment variables in your container using the `env` (or `printenv`) command.
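Inside your application code, the injected secret is read like any other environment variable. A minimal sketch, assuming the tutorial's `username_value` variable name and a fail-fast check that is this example's own addition:

```python
import os

def read_secret_env(name):
    """Read an injected secret from the container environment,
    failing fast if the variable was never set."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"environment variable {name} is not set")
    return value
```

Failing fast on a missing variable surfaces a misconfigured `secrets` entry at startup instead of deep inside the application.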

## Step 7: Clean up
<a name="specifying-sensitive-data-tutorial-cleanup"></a>

When you are finished with this tutorial, you should clean up the associated resources to avoid incurring charges for unused resources.

**To clean up the resources**

1. Open the console at [https://console.aws.amazon.com/ecs/v2](https://console.aws.amazon.com/ecs/v2).

1. In the navigation pane, choose **Clusters**.

1. On the **Clusters** page, choose the cluster.

1. Choose **Delete Cluster**. 

1. In the confirmation box, enter **delete *cluster name***, and then choose **Delete**.

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**. 

1. Search the list of roles for `ecsTaskExecutionRole` and select it.

1. Choose **Permissions**, then choose the **X** next to **ECSSecretsTutorial**. Choose **Remove**.

1. Open the Secrets Manager console at [https://console.aws.amazon.com/secretsmanager/](https://console.aws.amazon.com/secretsmanager/).

1. Select the **username_value** secret you created and choose **Actions**, **Delete secret**.