Select your cookie preferences

We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.

If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”

Auto scaling prerequisites

Focus mode
Auto scaling prerequisites - Amazon SageMaker AI

Before you can use auto scaling, you must have already created an Amazon SageMaker AI model endpoint. You can have multiple model versions for the same endpoint. Each model is referred to as a production (model) variant. For more information about deploying a model endpoint, see Deploy the Model to SageMaker AI Hosting Services.

To activate auto scaling for a model, you can use the SageMaker AI console, the AWS Command Line Interface (AWS CLI), or an AWS SDK through the Application Auto Scaling API.

  • If this is your first time configuring scaling for a model, we recommend you Configure model auto scaling with the console.

  • When using the AWS CLI or the Application Auto Scaling API, the flow is to register the model as a scalable target, define the scaling policy, and then apply it. On the SageMaker AI console, under Inference in the navigation pane, choose Endpoints. Find your model's endpoint name and then choose it to find the variant name. You must specify both the endpoint name and the variant name to activate auto scaling for a model.

Auto scaling is made possible by a combination of the Amazon SageMaker AI, Amazon CloudWatch, and Application Auto Scaling APIs. For information about the minimum required permissions, see Application Auto Scaling identity-based policy examples in the Application Auto Scaling User Guide.

The SagemakerFullAccessPolicy IAM policy has all the IAM permissions required to perform auto scaling. For more information about SageMaker AI IAM permissions, see How to use SageMaker AI execution roles.

If you manage your own permission policy, you must include the following permissions:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "sagemaker:DescribeEndpoint", "sagemaker:DescribeEndpointConfig", "sagemaker:UpdateEndpointWeightsAndCapacities" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "application-autoscaling:*" ], "Resource": "*" }, { "Effect": "Allow", "Action": "iam:CreateServiceLinkedRole", "Resource": "arn:aws:iam::*:role/aws-service-role/sagemaker.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint", "Condition": { "StringLike": { "iam:AWSServiceName": "sagemaker.application-autoscaling.amazonaws.com" } } }, { "Effect": "Allow", "Action": [ "cloudwatch:PutMetricAlarm", "cloudwatch:DescribeAlarms", "cloudwatch:DeleteAlarms" ], "Resource": "*" } ] }

Service-linked role

Auto scaling uses the AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint service-linked role. This service-linked role grants Application Auto Scaling permission to describe the alarms for your policies, to monitor current capacity levels, and to scale the target resource. This role is created for you automatically. For automatic role creation to succeed, you must have permission for the iam:CreateServiceLinkedRole action. For more information, see Service-linked roles in the Application Auto Scaling User Guide.

PrivacySite termsCookie preferences
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.