Optimize AWS App2Container generated Docker images - AWS Prescriptive Guidance

Optimize AWS App2Container generated Docker images

Created by Varun Sharma (AWS)

Environment: PoC or pilot

Technologies: Containers & microservices; Modernization; DevOps

AWS services: Amazon ECS

Summary

AWS App2Container is a command line tool that helps transform existing applications running on premises or on virtual machines into containers, without needing code changes.

Based on application type, App2Container takes a conservative approach to identify dependencies. For process mode, all non-system files on the application server are included in the container image. In such cases, a fairly large image might be generated.

This pattern provides an approach for optimizing the container images generated by App2Container. It applies to all Java applications that App2Container discovers in process mode. The workflow defined in the pattern is designed to be run on the application server.

Prerequisites and limitations

Prerequisites

  • An active AWS account

  • A Java application running on an application server on a Linux server

  • App2Container installed and set up, with all prerequisites met, on the Linux server

Architecture

Source technology stack  

  • A Java application running on a Linux server

Target technology stack  

  • A Docker image generated by App2Container

Target architecture flow 

[Diagram: AWS App2Container process for containerizing a Java application on a Linux system]

  1. Discover the applications that are running on the application server, and analyze the applications.

  2. Containerize the applications.

  3. Evaluate the size of the Docker image. If the image is too large, continue to step 4.

  4. Use the shell script (attached) to identify large files.

  5. Update the appExcludedFiles and appSpecificFiles lists in the analysis.json file.

Tools

  • AWS App2Container – AWS App2Container (A2C) is a command line tool to help you lift and shift applications that run in your on-premises data center or on virtual machines, so that they run in containers that are managed by Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS).

Code 

The optimizeImage.sh shell script and an example analysis.json file are attached.

The optimizeImage.sh file is a utility script for reviewing the contents of the App2Container generated file, ContainerFiles.tar. The review identifies files or subdirectories that are large and can be excluded. The script is a wrapper for the following tar command.

tar -Ptvf <path> | tr -s ' ' | cut -d ' ' -f3,6 |
  awk '$2 ~ /<filetype>$/' |
  awk '$2 ~ /^<toplevel>/' |
  cut -f1-<depth> -d'/' |
  awk '{ if ($1 >= <size>) arr[$2] += $1 }
       END { for (key in arr) { if (<verbose>) printf("%-50s\t%-50s\n", key, arr[key]); else printf("%s,\n", key) } }' |
  sort -k2 -nr

In the tar command, the script uses the following values:

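  • path – The path to ContainerFiles.tar

  • filetype – The file type (file name ending) to match

  • toplevel – The top-level directory to match

  • depth – The depth of the absolute path at which file sizes are grouped

  • size – The minimum size, in bytes, that a file must have to be counted

  • verbose – A nonzero value prints each path with its total size; zero prints only the path, followed by a comma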

The script does the following:

  1. It uses tar -Ptvf to list the files without extracting them.

  2. It filters the files by file type, starting with the top-level directory.

  3. Based on the depth, it generates the absolute path as an index.

  4. It uses that path as an index and stores the total size of each subdirectory.

  5. It prints the size of the subdirectory.

You can also replace the values manually in the tar command.
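As a rough illustration of substituting the values by hand, the following sketch wraps a simplified version of the pipeline in a function that reads a tar -tv style listing from standard input, so the logic can be tried without a real archive. The function name summarize_listing and its argument order are hypothetical and are not part of optimizeImage.sh. The sketch omits the file type filter and matches the top-level directory with index() instead of a regular expression, so slashes in the path need no escaping. The depth argument is the raw cut field count: 2 keeps the size field plus one path component.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of optimizeImage.sh): aggregate file sizes per
# directory from a `tar -tv` style listing read from standard input.
# Arguments: <toplevel> <cut-depth> <min-size-bytes>
summarize_listing() {
  local toplevel="$1" depth="$2" minsize="$3"
  tr -s ' ' |                                        # squeeze repeated spaces
    cut -d ' ' -f3,6 |                               # keep "size path"
    awk -v top="$toplevel" 'index($2, top) == 1' |   # path starts with toplevel
    cut -d '/' -f1-"$depth" |                        # truncate path to the depth
    awk -v min="$minsize" '$1 >= min { total[$2] += $1 }
         END { for (dir in total) printf("%s %.0f\n", dir, total[dir]) }' |
    sort -k2 -nr                                     # largest directory first
}
```

With a real archive, the listing would come from tar, for example: tar -Ptvf <path> | summarize_listing /var 3 1000000. As in the original pipeline, file names that contain spaces are truncated at the first space.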

Epics

Task | Description | Skills required
Discover the on-premises Java applications.

To discover all applications running on the application server, run the following command.

sudo app2container inventory 
AWS DevOps
Analyze the discovered applications.

To analyze each application by using the application-id that was obtained in the inventory stage, run the following command.

sudo app2container analyze --application-id <java-app-id>
AWS DevOps
Containerize the analyzed applications.

To containerize an application, run the following command.

sudo app2container containerize --application-id <application-id>

The command generates the Docker image along with a tar bundle in the workspace location.

If the Docker image is too large, proceed to the next step.

AWS DevOps
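The pattern leaves the meaning of "too large" to your judgment. The following is a minimal sketch of making that check repeatable, assuming a threshold that you choose. The helper name image_too_large and the 1 GiB limit in the usage comment are assumptions for illustration and are not part of App2Container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) when the image size exceeds the limit.
# Arguments: <size-in-bytes> <limit-in-bytes>
image_too_large() {
  local size_bytes="$1" limit_bytes="$2"
  [ "$size_bytes" -gt "$limit_bytes" ]
}

# Example usage (requires Docker; <image id> is a placeholder for your image):
#   size="$(docker image inspect --format '{{.Size}}' <image id>)"
#   if image_too_large "$size" $((1024 * 1024 * 1024)); then
#     echo "Image exceeds 1 GiB; consider optimizing it."
#   fi
```

docker image inspect reports the size in bytes, which matches the units that the tar listing uses in the later steps.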
Task | Description | Skills required
Identify the Artifacts tar file size.

Locate the ContainerFiles.tar file in {workspace}/{java-app-id}/Artifacts, where workspace is the App2Container workspace and java-app-id is the application ID. Then run the following command to report its total size.

./optimizeImage.sh -p /{workspace}/{java-app-id}/Artifacts/ContainerFiles.tar -d 0 -t / -v

This is the total size of the tar file before optimization.

AWS DevOps
List the subdirectories under the / directory and their sizes.

To identify the sizes of the major subdirectories under the / top-level directory, run the following command.

./optimizeImage.sh -p /{workspace}/{java-app-id}/Artifacts/ContainerFiles.tar -d 1 -t / -s 1000000 -v

Example output:

/var     554144711
/usr     2097300819
/tmp     18579660
/root    43645397
/opt     222320534
/home    65212518
/etc     11357677
AWS DevOps
Identify large subdirectories under the / directory.

For each major subdirectory that is listed in the previous command, identify the sizes of its subdirectories. Use -d to increase the depth and -t to indicate the top-level directory.

For example, use /var as the top-level directory. Under /var, identify all the large subdirectories and their sizes.

./optimizeImage.sh -p /{workspace}/{java-app-id}/Artifacts/ContainerFiles.tar -d 2 -t /var -s 1000000 -v

Repeat this process for each subdirectory listed in the previous step (for example, /usr, /tmp, /opt, and /home).

AWS DevOps
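Repeating the command for each top-level directory can be scripted. The following sketch loops over a list of directories and calls the script with the same options shown above. The function name drill_down and its argument order are hypothetical; only the -p, -d, -t, -s, and -v options come from this pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the size report at depth 2 for several top-level directories.
# Arguments: <path to optimizeImage.sh> <path to ContainerFiles.tar> <directory>...
drill_down() {
  local script="$1" tar_path="$2" dir
  shift 2
  for dir in "$@"; do
    echo "== $dir =="
    # Same options as the manual command: depth 2, files of at least 1,000,000 bytes.
    "$script" -p "$tar_path" -d 2 -t "$dir" -s 1000000 -v
  done
}

# Example usage (paths are placeholders):
#   drill_down ./optimizeImage.sh /{workspace}/{java-app-id}/Artifacts/ContainerFiles.tar /var /usr /tmp /opt /home
```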
Analyze the large folder in each subdirectory under the / directory.

For each subdirectory that is listed in the previous step, identify any folders that are required to run the application.

For example, using the subdirectories from the previous step, list all the subdirectories in the /var directory and their sizes. Identify any subdirectories that are needed by the application.

/var/tmp      237285851
/var/lib      24489984
/var/cache    237285851

To exclude subdirectories that are not needed by the application, in the analysis.json file, add those subdirectories to the appExcludedFiles section under containerParameters.

An example analysis.json file is attached.

AWS DevOps
Identify required files in the excluded subdirectories.

For each subdirectory that you added to the appExcludedFiles list, identify any files in that subdirectory that the application requires. In the analysis.json file, add those files or subdirectories to the appSpecificFiles section under containerParameters.

For example, if the /usr/lib directory is added to the exclude list, but /usr/lib/jvm is needed by the application, add /usr/lib/jvm to the appSpecificFiles section.

AWS DevOps
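You can edit analysis.json in a text editor. If you prefer to script the change, the following sketch sets both lists with jq. It assumes that jq is installed and that appExcludedFiles and appSpecificFiles are JSON arrays of paths under containerParameters, as described above. The function name set_image_file_lists is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (requires jq): replace the two path lists in analysis.json.
# Arguments: <analysis.json> <excluded paths as JSON array> <specific paths as JSON array>
set_image_file_lists() {
  local file="$1" excluded="$2" specific="$3" tmp
  tmp="$(mktemp)" || return 1
  # Write to a temporary file first so that a jq error leaves the original intact.
  if jq --argjson ex "$excluded" --argjson sp "$specific" \
        '.containerParameters.appExcludedFiles = $ex
         | .containerParameters.appSpecificFiles = $sp' \
        "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example usage, with the /usr/lib case from this pattern:
#   set_image_file_lists analysis.json '["/usr/lib"]' '["/usr/lib/jvm"]'
```

The function replaces both lists instead of appending to them, so include any existing entries that you want to keep. Back up analysis.json before you modify it.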
Task | Description | Skills required
Containerize the analyzed application.

To containerize the application, run the following command.

sudo app2container containerize --application-id <application-id>

The command generates the Docker image along with a tar bundle in the workspace location.

AWS DevOps
Identify the Artifacts tar file size.

Locate the ContainerFiles.tar file in {workspace}/{java-app-id}/Artifacts, where workspace is the App2Container workspace and java-app-id is the application ID. Then run the following command to report its total size.

./optimizeImage.sh -p /{workspace}/{java-app-id}/Artifacts/ContainerFiles.tar -d 0 -t / -v

This is the total size of the tar file after optimization. 

AWS DevOps
Run the Docker image.

To verify that the image starts without errors, run the Docker image locally using the following commands.

To identify the image ID, use docker images | grep <java-app-id>.

To run the container, use docker run -d <image id>.

AWS DevOps
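Starting the container does not by itself show that the application survived the exclusions, because a missing file can cause the process to exit shortly after startup. The following sketch starts the image, waits, and then checks whether the container is still running. The function name verify_image_starts and the default wait of 10 seconds are assumptions. A running container is only a basic signal, so test the application itself as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start the image detached, wait, then report whether it is still running.
# Arguments: <image id> [wait-seconds, default 10]
verify_image_starts() {
  local image="$1" wait_seconds="${2:-10}" cid running
  cid="$(docker run -d "$image")" || return 1
  sleep "$wait_seconds"
  running="$(docker inspect --format '{{.State.Running}}' "$cid")"
  if [ "$running" = "true" ]; then
    echo "container $cid is running"
  else
    # Print recent log lines to help identify a file that was excluded by mistake.
    echo "container $cid exited; recent logs:" >&2
    docker logs --tail 20 "$cid" >&2
    return 1
  fi
}

# Example usage (<image id> is a placeholder):
#   verify_image_starts <image id> 30
```

If the container exits, the log output often names the missing path, which you can then add to appSpecificFiles before you containerize again.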

Related resources

Attachments

To access additional content that is associated with this document, unzip the following file: attachment.zip