Auto scaling prerequisites - Amazon SageMaker AI

Auto scaling prerequisites

Before you can use auto scaling, you must have already created an Amazon SageMaker AI model endpoint. You can have multiple model versions for the same endpoint. Each model is referred to as a production (model) variant. For more information about deploying a model endpoint, see Deploy the Model to SageMaker AI Hosting Services.

To activate auto scaling for a model, you can use the SageMaker AI console, the AWS Command Line Interface (AWS CLI), or an AWS SDK through the Application Auto Scaling API.

  • If this is your first time configuring scaling for a model, we recommend you Configure model auto scaling with the console.

  • When using the AWS CLI or the Application Auto Scaling API, the flow is to register the model as a scalable target, define the scaling policy, and then apply it. You must specify both the endpoint name and the variant name to activate auto scaling for a model. To find them, open the SageMaker AI console and, under Inference in the navigation pane, choose Endpoints. Find your model's endpoint name, and then choose it to see the variant name.
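The register, define, and apply flow described above can be sketched with the AWS SDK for Python (boto3). This is an illustrative sketch, not the documented procedure: the helper function names are hypothetical, and the capacity bounds, policy name, and target value are placeholder assumptions. It assumes boto3 is installed and credentials are configured before `enable_auto_scaling` is called. The request builders are plain functions so you can inspect the parameters without calling AWS.

```python
SERVICE_NAMESPACE = "sagemaker"
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def resource_id(endpoint_name, variant_name):
    """Build the Application Auto Scaling resource ID for a production variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def scalable_target_request(endpoint_name, variant_name, min_capacity, max_capacity):
    """Parameters for registering the variant as a scalable target."""
    return {
        "ServiceNamespace": SERVICE_NAMESPACE,
        "ResourceId": resource_id(endpoint_name, variant_name),
        "ScalableDimension": SCALABLE_DIMENSION,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def target_tracking_policy_request(endpoint_name, variant_name, policy_name, target_value):
    """Parameters for a target tracking policy on invocations per instance."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": SERVICE_NAMESPACE,
        "ResourceId": resource_id(endpoint_name, variant_name),
        "ScalableDimension": SCALABLE_DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_value,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }


def enable_auto_scaling(endpoint_name, variant_name):
    """Register the variant, then define and apply a scaling policy (placeholder values)."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    client = boto3.client("application-autoscaling")
    # Step 1: register the variant as a scalable target (1 to 4 instances is an assumption).
    client.register_scalable_target(
        **scalable_target_request(endpoint_name, variant_name, 1, 4)
    )
    # Steps 2 and 3: put_scaling_policy both defines the policy and applies it to the target.
    client.put_scaling_policy(
        **target_tracking_policy_request(
            endpoint_name, variant_name, "example-invocations-policy", 70.0
        )
    )
```

For example, `enable_auto_scaling("my-endpoint", "AllTraffic")` would use the hypothetical endpoint name `my-endpoint` and variant name `AllTraffic`. Substitute the names shown for your endpoint in the console.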

Auto scaling is made possible by a combination of the Amazon SageMaker AI, Amazon CloudWatch, and Application Auto Scaling APIs. For information about the minimum required permissions, see Application Auto Scaling identity-based policy examples in the Application Auto Scaling User Guide.

The AmazonSageMakerFullAccess managed IAM policy has all the IAM permissions required to perform auto scaling. For more information about SageMaker AI IAM permissions, see How to use SageMaker AI execution roles.

If you manage your own permission policy, you must include the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:DescribeEndpoint",
                "sagemaker:DescribeEndpointConfig",
                "sagemaker:UpdateEndpointWeightsAndCapacities"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "application-autoscaling:*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "arn:aws:iam::*:role/aws-service-role/sagemaker.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint",
            "Condition": {
                "StringLike": {
                    "iam:AWSServiceName": "sagemaker.application-autoscaling.amazonaws.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricAlarm",
                "cloudwatch:DescribeAlarms",
                "cloudwatch:DeleteAlarms"
            ],
            "Resource": "*"
        }
    ]
}
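If you maintain a custom policy, a quick local check can catch a missing action before you attempt to configure scaling. The following is a rough sketch under stated assumptions: the function names are hypothetical, the two `application-autoscaling` actions stand in for the wildcard in the policy above, and the check looks only at `Allow` statements. It ignores `Resource`, `Condition`, and explicit `Deny` statements, so it is not a substitute for IAM policy evaluation.

```python
from fnmatch import fnmatchcase

# Actions taken from the policy above. The two application-autoscaling entries are
# an assumed representative subset of what "application-autoscaling:*" grants.
REQUIRED_ACTIONS = [
    "sagemaker:DescribeEndpoint",
    "sagemaker:DescribeEndpointConfig",
    "sagemaker:UpdateEndpointWeightsAndCapacities",
    "application-autoscaling:RegisterScalableTarget",
    "application-autoscaling:PutScalingPolicy",
    "iam:CreateServiceLinkedRole",
    "cloudwatch:PutMetricAlarm",
    "cloudwatch:DescribeAlarms",
    "cloudwatch:DeleteAlarms",
]


def allowed_patterns(policy):
    """Collect lower-cased action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as an object
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        patterns.extend(action.lower() for action in actions)
    return patterns


def missing_actions(policy, required=REQUIRED_ACTIONS):
    """Return the required actions that no Allow statement matches."""
    patterns = allowed_patterns(policy)
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in patterns)
    ]
```

Load the policy document with `json.load` and pass the resulting dictionary to `missing_actions`. An empty list means every listed action is matched by at least one `Allow` statement.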

Service-linked role

Auto scaling uses the AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint service-linked role. This service-linked role grants Application Auto Scaling permission to describe the alarms for your policies, to monitor current capacity levels, and to scale the target resource. This role is created for you automatically. For automatic role creation to succeed, you must have permission for the iam:CreateServiceLinkedRole action. For more information, see Service-linked roles in the Application Auto Scaling User Guide.
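To confirm whether the service-linked role already exists in an account, you can look it up by name. The sketch below is illustrative: the function names are hypothetical, the ARN format follows the `Resource` pattern in the policy above and assumes the standard `aws` partition, and `service_linked_role_exists` assumes boto3 is installed and the caller has the `iam:GetRole` permission.

```python
ROLE_NAME = "AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint"
SERVICE_NAME = "sagemaker.application-autoscaling.amazonaws.com"


def service_linked_role_arn(account_id):
    """Expected ARN of the service-linked role (assumes the 'aws' partition)."""
    return f"arn:aws:iam::{account_id}:role/aws-service-role/{SERVICE_NAME}/{ROLE_NAME}"


def service_linked_role_exists():
    """Return True if the service-linked role is present in the current account."""
    import boto3  # imported lazily so the ARN helper works without boto3 installed

    iam = boto3.client("iam")
    try:
        iam.get_role(RoleName=ROLE_NAME)
    except iam.exceptions.NoSuchEntityException:
        # The role has not been created yet; creating it automatically
        # requires the iam:CreateServiceLinkedRole permission.
        return False
    return True
```

If the function returns `False`, the role is created the first time scaling is configured, provided the caller is allowed to perform `iam:CreateServiceLinkedRole`.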