GetScalingConfigurationRecommendation
Starts an Amazon SageMaker Inference Recommender autoscaling recommendation job. Returns recommendations for autoscaling policies that you can apply to your SageMaker endpoint.
Request Syntax
{
"EndpointName": "string",
"InferenceRecommendationsJobName": "string",
"RecommendationId": "string",
"ScalingPolicyObjective": {
"MaxInvocationsPerMinute": number,
"MinInvocationsPerMinute": number
},
"TargetCpuUtilizationPerCore": number
}
Request Parameters
For information about the parameters that are common to all actions, see Common Parameters.
The request accepts the following data in JSON format.
- EndpointName
-
The name of an endpoint benchmarked during a previously completed inference recommendation job. This name should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.
Specify either this field or the RecommendationId field.
Type: String
Length Constraints: Maximum length of 63.
Pattern:
^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
Required: No
- InferenceRecommendationsJobName
-
The name of a previously completed Inference Recommender job.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 64.
Pattern:
^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}
Required: Yes
- RecommendationId
-
The recommendation ID of a previously completed inference recommendation. This ID should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.
Specify either this field or the EndpointName field.
Type: String
Required: No
- ScalingPolicyObjective
-
An object where you specify the anticipated traffic pattern for an endpoint.
Type: ScalingPolicyObjective object
Required: No
- TargetCpuUtilizationPerCore
-
The target CPU utilization per core, as a percentage, that you want an instance to reach before autoscaling. The default value is 50%.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 100.
Required: No
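The constraints above can be checked on the client side before a request is sent. The following is a minimal Python sketch, not part of the API itself: the function name build_request and its argument names are hypothetical, the "specify either" wording is read here as "exactly one of the two", and fullmatch() is used so that the whole name must match the documented pattern (a stricter reading than the unanchored pattern).

```python
import re

# Patterns copied from the parameter descriptions above.
ENDPOINT_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
JOB_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}")


def build_request(job_name, endpoint_name=None, recommendation_id=None,
                  target_cpu=None, min_invocations=None, max_invocations=None):
    """Return a request body dict; raise ValueError on a constraint violation."""
    if not (1 <= len(job_name) <= 64) or not JOB_NAME_RE.fullmatch(job_name):
        raise ValueError("InferenceRecommendationsJobName violates its constraints")

    # Assumption: "specify either ... or ..." means exactly one of the two fields.
    if (endpoint_name is None) == (recommendation_id is None):
        raise ValueError("Specify exactly one of EndpointName or RecommendationId")

    request = {"InferenceRecommendationsJobName": job_name}
    if endpoint_name is not None:
        if len(endpoint_name) > 63 or not ENDPOINT_NAME_RE.fullmatch(endpoint_name):
            raise ValueError("EndpointName violates its constraints")
        request["EndpointName"] = endpoint_name
    else:
        request["RecommendationId"] = recommendation_id

    if target_cpu is not None:
        if not (1 <= target_cpu <= 100):
            raise ValueError("TargetCpuUtilizationPerCore must be between 1 and 100")
        request["TargetCpuUtilizationPerCore"] = target_cpu

    # Only include ScalingPolicyObjective when at least one bound was given.
    objective = {}
    if min_invocations is not None:
        objective["MinInvocationsPerMinute"] = min_invocations
    if max_invocations is not None:
        objective["MaxInvocationsPerMinute"] = max_invocations
    if objective:
        request["ScalingPolicyObjective"] = objective
    return request
```

With the AWS SDK for Python, the resulting dict could then be passed as keyword arguments to the SageMaker client's get_scaling_configuration_recommendation method; the job and endpoint names in any such call are your own values.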
Response Syntax
{
"DynamicScalingConfiguration": {
"MaxCapacity": number,
"MinCapacity": number,
"ScaleInCooldown": number,
"ScaleOutCooldown": number,
"ScalingPolicies": [
{ ... }
]
},
"EndpointName": "string",
"InferenceRecommendationsJobName": "string",
"Metric": {
"InvocationsPerInstance": number,
"ModelLatency": number
},
"RecommendationId": "string",
"ScalingPolicyObjective": {
"MaxInvocationsPerMinute": number,
"MinInvocationsPerMinute": number
},
"TargetCpuUtilizationPerCore": number
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
- DynamicScalingConfiguration
-
An object with the recommended values for you to specify when creating an autoscaling policy.
Type: DynamicScalingConfiguration object
- EndpointName
-
The name of an endpoint benchmarked during a previously completed Inference Recommender job.
Type: String
Length Constraints: Maximum length of 63.
Pattern:
^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
- InferenceRecommendationsJobName
-
The name of a previously completed Inference Recommender job.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 64.
Pattern:
^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}
- Metric
-
An object with a list of metrics that were benchmarked during the previously completed Inference Recommender job.
Type: ScalingPolicyMetric object
- RecommendationId
-
The recommendation ID of a previously completed inference recommendation.
Type: String
- ScalingPolicyObjective
-
An object representing the anticipated traffic pattern for an endpoint that you specified in the request.
Type: ScalingPolicyObjective object
- TargetCpuUtilizationPerCore
-
The target CPU utilization per core, as a percentage, that you want an instance to reach before autoscaling, as specified in the request. The default value is 50%.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 100.
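As an illustration of how the response elements fit together, the following Python sketch flattens a parsed response into the values typically needed when creating an autoscaling policy. The function name summarize_recommendation and the output key names are hypothetical, and the sketch assumes only the field names shown in the response syntax above. Missing fields are returned as None rather than guessed.

```python
def summarize_recommendation(response):
    """Flatten selected fields of a parsed response dict into one level."""
    scaling = response.get("DynamicScalingConfiguration", {})
    metric = response.get("Metric", {})
    return {
        "endpoint_name": response.get("EndpointName"),
        "recommendation_id": response.get("RecommendationId"),
        "min_capacity": scaling.get("MinCapacity"),
        "max_capacity": scaling.get("MaxCapacity"),
        "scale_in_cooldown": scaling.get("ScaleInCooldown"),
        "scale_out_cooldown": scaling.get("ScaleOutCooldown"),
        # Number of recommended policies; their contents are not inspected here.
        "policy_count": len(scaling.get("ScalingPolicies", [])),
        "invocations_per_instance": metric.get("InvocationsPerInstance"),
        "model_latency": metric.get("ModelLatency"),
        "target_cpu": response.get("TargetCpuUtilizationPerCore"),
    }


# Hypothetical values, for illustration only; not output from a real job.
example = {
    "DynamicScalingConfiguration": {"MinCapacity": 1, "MaxCapacity": 4},
    "EndpointName": "ep-1",
    "Metric": {"InvocationsPerInstance": 300, "ModelLatency": 25},
    "TargetCpuUtilizationPerCore": 50,
}
summary = summarize_recommendation(example)
print(summary["min_capacity"], summary["max_capacity"])
```

Using .get() throughout keeps the sketch tolerant of responses that omit optional fields.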
Errors
For information about the errors that are common to all actions, see Common Errors.
- ResourceNotFound
-
The resource being accessed was not found.
HTTP Status Code: 400
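One way a caller might handle this error is sketched below in Python. It assumes a client object that exposes a get_scaling_configuration_recommendation method and raises exceptions carrying a parsed error under a response attribute, which is how the AWS SDK for Python reports service errors. The wrapper name get_recommendation_or_none is hypothetical, and returning None for a missing resource is a design choice of this sketch, not behavior of the API.

```python
def get_recommendation_or_none(client, **request):
    """Call the API through `client`; return None on ResourceNotFound."""
    try:
        return client.get_scaling_configuration_recommendation(**request)
    except Exception as exc:
        # Look for an SDK-style parsed error; anything else is re-raised as-is.
        error = (getattr(exc, "response", None) or {}).get("Error", {})
        if error.get("Code") == "ResourceNotFound":
            return None
        raise
```

Any other error, including the common errors, propagates unchanged so that the caller can decide how to respond.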
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: