Register a model
Before you add a scaling policy to your model, you must first register the model for auto scaling and define its scaling limits.
The following procedures cover how to register a model (production variant) for auto scaling using the AWS Command Line Interface (AWS CLI) or Application Auto Scaling API.
Register a model (AWS CLI)
To register your production variant, use the register-scalable-target command with the following parameters:
- --service-namespace: Set this value to sagemaker.
- --resource-id: The resource identifier for the model (specifically, the production variant). For this parameter, the resource type is endpoint and the unique identifier is the name of the production variant. For example, endpoint/my-endpoint/variant/my-variant.
- --scalable-dimension: Set this value to sagemaker:variant:DesiredInstanceCount.
- --min-capacity: The minimum number of instances. This value must be set to at least 1 and must be equal to or less than the value specified for max-capacity.
- --max-capacity: The maximum number of instances. This value must be set to at least 1 and must be equal to or greater than the value specified for min-capacity.
The following example shows how to register a variant named my-variant, running on the my-endpoint endpoint, that can be dynamically scaled to have one to eight instances.
aws application-autoscaling register-scalable-target \
    --service-namespace sagemaker \
    --resource-id endpoint/my-endpoint/variant/my-variant \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount \
    --min-capacity 1 \
    --max-capacity 8
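When the command is generated by a script, the resource ID format and the capacity bounds described above can be checked before the AWS CLI is invoked. The following Python sketch is one possible way to do that. The helper names build_resource_id and build_register_command are hypothetical and are not part of any AWS tool; the sketch only assembles the argument list and does not contact AWS.

```python
import shlex


def build_resource_id(endpoint_name: str, variant_name: str) -> str:
    """Return a resource ID of the form endpoint/<endpoint>/variant/<variant>."""
    if not endpoint_name or not variant_name:
        raise ValueError("endpoint_name and variant_name must be non-empty")
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def build_register_command(endpoint_name: str, variant_name: str,
                           min_capacity: int, max_capacity: int) -> list:
    """Assemble the register-scalable-target argument list (hypothetical helper)."""
    # Bounds as stated in this section: both values at least 1, min <= max.
    if min_capacity < 1 or max_capacity < 1:
        raise ValueError("min_capacity and max_capacity must be at least 1")
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return [
        "aws", "application-autoscaling", "register-scalable-target",
        "--service-namespace", "sagemaker",
        "--resource-id", build_resource_id(endpoint_name, variant_name),
        "--scalable-dimension", "sagemaker:variant:DesiredInstanceCount",
        "--min-capacity", str(min_capacity),
        "--max-capacity", str(max_capacity),
    ]


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(build_register_command("my-endpoint", "my-variant", 1, 8)))
```

Returning a list rather than a single string keeps each argument separate, so the result can be passed to subprocess.run without shell quoting concerns.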
Register a model (Application Auto Scaling API)
To register your model with Application Auto Scaling, use the RegisterScalableTarget Application Auto Scaling API action with the following parameters:
- ServiceNamespace: Set this value to sagemaker.
- ResourceId: The resource identifier for the production variant. For this parameter, the resource type is endpoint and the unique identifier is the name of the variant. For example, endpoint/my-endpoint/variant/my-variant.
- ScalableDimension: Set this value to sagemaker:variant:DesiredInstanceCount.
- MinCapacity: The minimum number of instances. This value must be set to at least 1 and must be equal to or less than the value specified for MaxCapacity.
- MaxCapacity: The maximum number of instances. This value must be set to at least 1 and must be equal to or greater than the value specified for MinCapacity.
The following example shows how to register a variant named my-variant, running on the my-endpoint endpoint, that can be dynamically scaled to use one to eight instances.
POST / HTTP/1.1
Host: application-autoscaling.us-east-2.amazonaws.com
Accept-Encoding: identity
X-Amz-Target: AnyScaleFrontendService.RegisterScalableTarget
X-Amz-Date: 20230506T182145Z
User-Agent: aws-cli/2.0.0 Python/3.7.5 Windows/10 botocore/2.0.0dev4
Content-Type: application/x-amz-json-1.1
Authorization: AUTHPARAMS

{
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/my-endpoint/variant/my-variant",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 8
}
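In practice the same action is usually called through an AWS SDK rather than as a hand-built HTTP request. The following is a minimal sketch using the AWS SDK for Python (boto3), assuming boto3 is installed and credentials are already configured. The function names build_register_request and register_variant are hypothetical; the us-east-2 default is taken from the Host header in the example request, and the request body mirrors the JSON shown there.

```python
def build_register_request(endpoint_name: str, variant_name: str,
                           min_capacity: int, max_capacity: int) -> dict:
    """Build the RegisterScalableTarget parameters (hypothetical helper)."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def register_variant(request: dict, region_name: str = "us-east-2") -> dict:
    """Send the request to Application Auto Scaling (requires AWS credentials)."""
    # Imported here so the request builder can be used without the SDK installed.
    import boto3

    client = boto3.client("application-autoscaling", region_name=region_name)
    return client.register_scalable_target(**request)


if __name__ == "__main__":
    request = build_register_request("my-endpoint", "my-variant", 1, 8)
    # Only prints the parameters; call register_variant(request) to register.
    print(request)
```

Keeping the request builder separate from the network call makes it possible to inspect or log the parameters before anything is sent.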