
Migrate inference workload from x86 to AWS Graviton

AWS Graviton is a family of ARM-based processors designed by AWS. They are more energy efficient than comparable x86-based processors and offer strong price performance. Amazon SageMaker AI offers Graviton-based instances so that you can use these processors for your inference workloads.

You can migrate your existing inference workloads from x86-based instances to Graviton-based instances by using either ARM-compatible container images or multi-architecture container images. This guide assumes that you are using either AWS Deep Learning Containers images or your own ARM-compatible container images. For more information on building your own images, see Building your image.

At a high level, migrating an inference workload from x86-based instances to Graviton-based instances is a four-step process:

  1. Push container images to Amazon Elastic Container Registry (Amazon ECR), an AWS managed container registry.

  2. Create a SageMaker AI Model.

  3. Create an endpoint configuration.

  4. Create an endpoint.

The following sections describe each of these steps in more detail. Replace the placeholder text in the code examples with your own values.

Push container images to Amazon ECR

You can push your container images to Amazon ECR with the Docker CLI after authenticating to your registry with the AWS CLI. When using an ARM-compatible image, first verify that it supports the ARM architecture:

docker inspect deep-learning-container-uri

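The image must be available locally (for example, after docker pull) for docker inspect to find it. If the output contains "Architecture": "arm64", the image supports the ARM architecture, and you can push it to Amazon ECR with the docker push command. For more information, see Pushing a Docker image.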
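Because docker inspect prints JSON, the architecture check can be automated, for example in a build script that runs before docker push. The following is a minimal sketch; the function names are hypothetical, and it assumes you capture the docker inspect output yourself (for example with subprocess.run(["docker", "inspect", uri], capture_output=True, text=True, check=True).stdout).

```python
import json
from typing import List


def image_architectures(inspect_output: str) -> List[str]:
    """Return the Architecture field of every image in `docker inspect` output."""
    # `docker inspect` prints a JSON array with one object per inspected image.
    images = json.loads(inspect_output)
    return [image.get("Architecture", "unknown") for image in images]


def is_arm64(inspect_output: str) -> bool:
    """True only if at least one image was inspected and every image reports arm64."""
    archs = image_architectures(inspect_output)
    return bool(archs) and all(arch == "arm64" for arch in archs)


if __name__ == "__main__":
    # Hypothetical, trimmed-down output; real output contains many more fields.
    sample = '[{"Id": "sha256:abc", "Architecture": "arm64", "Os": "linux"}]'
    print(is_arm64(sample))  # True
```

Failing the build when is_arm64 returns False keeps x86-only images from reaching a Graviton-based endpoint, where they would not start.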

A multi-architecture container image is a set of container images for different architectures or operating systems that you refer to by a single manifest name. If you use multi-architecture container images, you must push both the individual images and a manifest list to Amazon ECR. A manifest list nests other image manifests and identifies each one by its architecture, operating system, and other platform attributes. The following example creates a manifest list and pushes it to Amazon ECR.

  1. Create a manifest list.

    docker manifest create aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository \
        aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository:amd64 \
        aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository:arm64

  2. Annotate the manifest list so that it correctly identifies which image is for which architecture.

    docker manifest annotate --arch arm64 aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository \
        aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository:arm64

  3. Push the manifest.

    docker manifest push aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository
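The manifest steps above can also be scripted so that the registry URI is built in one place, which avoids mistakes such as putting the account ID where the Region belongs. This is a hypothetical helper, not an AWS or Docker API: it only builds the docker CLI argument lists, and it annotates every architecture rather than only arm64. The account ID and Region in the usage block are placeholders.

```python
from typing import Dict, List, Optional


def ecr_uri(account_id: str, region: str, repository: str, tag: Optional[str] = None) -> str:
    """Build a private Amazon ECR image URI, optionally with a tag."""
    # Format: <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>[:<tag>]
    uri = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}"
    return f"{uri}:{tag}" if tag else uri


def manifest_commands(account_id: str, region: str, repository: str,
                      arch_tags: Dict[str, str]) -> List[List[str]]:
    """Return docker CLI argument lists that create, annotate, and push a manifest list.

    arch_tags maps each architecture (for example "arm64") to the tag of the
    single-architecture image that is already pushed to the repository.
    """
    manifest = ecr_uri(account_id, region, repository)
    images = {arch: ecr_uri(account_id, region, repository, tag) for arch, tag in arch_tags.items()}
    commands = [["docker", "manifest", "create", manifest, *images.values()]]
    # Annotate every entry so each image is explicitly labeled with its architecture.
    for arch, image in images.items():
        commands.append(["docker", "manifest", "annotate", "--arch", arch, manifest, image])
    commands.append(["docker", "manifest", "push", manifest])
    return commands


if __name__ == "__main__":
    # 111122223333 and us-west-2 are placeholder values.
    for cmd in manifest_commands("111122223333", "us-west-2", "my-repository",
                                 {"amd64": "amd64", "arm64": "arm64"}):
        print(" ".join(cmd))  # or run it: subprocess.run(cmd, check=True)
```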

For more information on creating and pushing manifest lists to Amazon ECR, see Introducing multi-architecture container images for Amazon ECR and Pushing a multi-architecture image.
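After pushing, you can confirm which platforms a manifest list covers by running docker manifest inspect on it, which prints the manifest list as JSON. The sketch below parses that JSON; the function names are hypothetical, and it assumes the Docker manifest list or OCI image index layout, where each entry in "manifests" carries a "platform" object.

```python
import json
from typing import Iterable, Set, Tuple

Platform = Tuple[str, str]  # (os, architecture)


def manifest_platforms(manifest_json: str) -> Set[Platform]:
    """Return the (os, architecture) pairs listed in a manifest list or OCI image index."""
    document = json.loads(manifest_json)
    # A single-architecture manifest has no "manifests" array, so it yields an empty set.
    return {
        (entry["platform"].get("os", ""), entry["platform"].get("architecture", ""))
        for entry in document.get("manifests", [])
        if "platform" in entry
    }


def supports(manifest_json: str,
             required: Iterable[Platform] = (("linux", "amd64"), ("linux", "arm64"))) -> bool:
    """True if every required platform appears in the manifest list."""
    return set(required) <= manifest_platforms(manifest_json)


if __name__ == "__main__":
    # Trimmed, hypothetical output of `docker manifest inspect <manifest-list-uri>`.
    sample = json.dumps({
        "schemaVersion": 2,
        "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
        "manifests": [
            {"digest": "sha256:aaa", "platform": {"architecture": "amd64", "os": "linux"}},
            {"digest": "sha256:bbb", "platform": {"architecture": "arm64", "os": "linux"}},
        ],
    })
    print(supports(sample))  # True
```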

Create a SageMaker AI Model

Create a SageMaker AI Model by calling the CreateModel API.

import boto3
from sagemaker import get_execution_role

aws_region = "aws-region"
sagemaker_client = boto3.client("sagemaker", region_name=aws_region)

# get_execution_role() works inside SageMaker AI notebooks and Studio; elsewhere,
# set role to the ARN of an IAM role that SageMaker AI can assume.
role = get_execution_role()

sagemaker_client.create_model(
    ModelName="model-name",
    PrimaryContainer={
        "Image": "deep-learning-container-uri",  # ARM-compatible or multi-architecture image
        "ModelDataUrl": "model-s3-location",  # S3 URI of the model artifacts
        "Environment": {
            "SAGEMAKER_PROGRAM": "inference.py",  # inference entry-point script
            "SAGEMAKER_SUBMIT_DIRECTORY": "inference-script-s3-location",
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",  # 20 = logging.INFO
            "SAGEMAKER_REGION": aws_region,
        },
    },
    ExecutionRoleArn=role,
)
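Before calling CreateModel, it can help to sanity-check that the Image value is a well-formed private Amazon ECR URI in the Region you pass as aws_region, assuming, as is typical, that you keep the image in the same Region as the model. The check below is a hypothetical helper, not part of the SageMaker AI API, and its pattern covers only standard *.amazonaws.com registry hosts.

```python
import re

# Hypothetical pattern for private Amazon ECR URIs in the standard partition:
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>[:tag | @digest]
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/(?P<repository>[^:@]+)"
)


def check_image_region(image_uri: str, aws_region: str) -> str:
    """Return the repository name; raise ValueError if the URI is malformed or names another Region."""
    match = ECR_URI.match(image_uri)
    if match is None:
        raise ValueError(f"not a private Amazon ECR image URI: {image_uri!r}")
    if match["region"] != aws_region:
        # Also catches an account ID typed into the Region slot of the hostname.
        raise ValueError(f"image is in {match['region']!r}, but aws_region is {aws_region!r}")
    return match["repository"]


if __name__ == "__main__":
    # Placeholder account ID, Region, and repository.
    print(check_image_region("111122223333.dkr.ecr.us-west-2.amazonaws.com/my-repository:arm64", "us-west-2"))
```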

Create an endpoint configuration

Create an endpoint configuration by calling the CreateEndpointConfig API and specify a Graviton-based instance type. For a list of Graviton-based instances, see Compute optimized instances.

sagemaker_client.create_endpoint_config(
    EndpointConfigName="endpoint-config-name",
    ProductionVariants=[
        {
            "VariantName": "variant-name",
            "ModelName": "model-name",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c7g.xlarge",  # Graviton-based instance
        }
    ],
)
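If you generate endpoint configurations programmatically, a guard can stop an x86 instance type from slipping into a Graviton migration. The sketch below uses a naming heuristic, not an official list: Graviton instance families carry a "g" right after the generation number (c7g, m6gd, c7gn, ...), while GPU families such as ml.g5 put the "g" before it. The helper names are hypothetical; confirm instance availability against the SageMaker AI instance documentation.

```python
import re
from typing import Any, Dict

# Heuristic: ml.<family letters><generation>g<optional suffix letters>.<size>
GRAVITON_FAMILY = re.compile(r"^ml\.[a-z]+\d+g[a-z]*\.")


def is_graviton_instance(instance_type: str) -> bool:
    """Guess whether a SageMaker AI instance type is Graviton-based, from its name alone."""
    return GRAVITON_FAMILY.match(instance_type) is not None


def graviton_variant(variant_name: str, model_name: str, instance_type: str,
                     count: int = 1) -> Dict[str, Any]:
    """Build one ProductionVariants entry, refusing instance types that do not look Graviton-based."""
    if not is_graviton_instance(instance_type):
        raise ValueError(f"{instance_type!r} does not look like a Graviton-based instance type")
    return {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InitialInstanceCount": count,
        "InstanceType": instance_type,
    }
```

You would then pass ProductionVariants=[graviton_variant("variant-name", "model-name", "ml.c7g.xlarge")] to create_endpoint_config.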

Create an endpoint

Create an endpoint by calling the CreateEndpoint API.

sagemaker_client.create_endpoint(
    EndpointName="endpoint-name",
    EndpointConfigName="endpoint-config-name",
)
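CreateEndpoint returns while the endpoint is still in the Creating state; it accepts traffic only once its status is InService. The sketch below waits with the boto3 endpoint_in_service waiter and then sends one request through the SageMaker Runtime InvokeEndpoint API. The function name is hypothetical, and the JSON request and response format is an assumption that depends on your inference.py.

```python
import json
from typing import Any


def invoke_when_in_service(sagemaker_client: Any, runtime_client: Any,
                           endpoint_name: str, payload: Any) -> Any:
    """Wait until the endpoint is InService, then send one JSON request and decode the JSON reply."""
    # Built-in boto3 waiter: polls DescribeEndpoint and raises a WaiterError
    # if the endpoint fails or the wait times out.
    sagemaker_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: inference.py accepts JSON
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # Body is a streaming object; read it fully before decoding.
    return json.loads(response["Body"].read())
```

For example, call invoke_when_in_service(sagemaker_client, boto3.client("sagemaker-runtime", region_name=aws_region), "endpoint-name", payload), where payload is whatever your inference script expects. Passing both clients in keeps the function easy to test without AWS credentials.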