CreateAIRecommendationJob
Creates a recommendation job that generates intelligent optimization recommendations for generative AI inference deployments. The job analyzes your model, workload configuration, and performance targets to recommend optimal instance types, model optimization techniques (such as quantization and speculative decoding), and deployment configurations.
Request Syntax
{
   "AIRecommendationJobName": "string",
   "AIWorkloadConfigIdentifier": "string",
   "ComputeSpec": {
      "CapacityReservationConfig": {
         "CapacityReservationPreference": "string",
         "MlReservationArns": [ "string" ]
      },
      "InstanceTypes": [ "string" ]
   },
   "InferenceSpecification": {
      "Framework": "string"
   },
   "ModelSource": { ... },
   "OptimizeModel": boolean,
   "OutputConfig": {
      "ModelPackageGroupIdentifier": "string",
      "S3OutputLocation": "string"
   },
   "PerformanceTarget": {
      "Constraints": [
         {
            "Metric": "string"
         }
      ]
   },
   "RoleArn": "string",
   "Tags": [
      {
         "Key": "string",
         "Value": "string"
      }
   ]
}
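As a rough illustration of the syntax above, the following Python sketch assembles a request body containing only the parameters marked as required in this reference. The helper name build_request and all example values are hypothetical. ModelSource is passed through unchanged because its members are elided in the syntax, and using S3OutputLocation as the only OutputConfig member is an assumption.

```python
import json


def build_request(job_name, workload_config, model_source, s3_output, role_arn,
                  metrics=("ttft-ms",)):
    """Assemble a CreateAIRecommendationJob request body as a plain dict.

    Hypothetical helper: it only mirrors the documented JSON shape and
    does not call any AWS service.
    """
    return {
        "AIRecommendationJobName": job_name,
        "AIWorkloadConfigIdentifier": workload_config,
        # ModelSource is a union whose members are elided in the syntax,
        # so the caller supplies it as-is.
        "ModelSource": model_source,
        # Assumption: only the S3 location is set; the model package group
        # is described as optional.
        "OutputConfig": {"S3OutputLocation": s3_output},
        "PerformanceTarget": {
            "Constraints": [{"Metric": metric} for metric in metrics],
        },
        "RoleArn": role_arn,
    }


if __name__ == "__main__":
    body = build_request(
        job_name="demo-job",                      # hypothetical name
        workload_config="demo-workload-config",   # hypothetical identifier
        model_source={},                          # placeholder, see AIModelSource
        s3_output="s3://example-bucket/results/", # hypothetical bucket
        role_arn="arn:aws:iam::123456789012:role/ExampleRole",
    )
    print(json.dumps(body, indent=2))
```

Optional parameters such as ComputeSpec, InferenceSpecification, OptimizeModel, and Tags can be added to the returned dict in the same way.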
Request Parameters
For information about the parameters that are common to all actions, see Common Parameters.
The request accepts the following data in JSON format.
- AIRecommendationJobName
-
The name of the AI recommendation job. The name must be unique within your AWS account in the current AWS Region.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 63.
Pattern:
[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
Required: Yes
- AIWorkloadConfigIdentifier
-
The name or Amazon Resource Name (ARN) of the AI workload configuration to use for this recommendation job.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 256.
Pattern:
(arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:[a-z\-]*/)?([a-zA-Z0-9]([a-zA-Z0-9\-]){0,62})(?<!-)
Required: Yes
- ComputeSpec
-
The compute resource specification for the recommendation job. You can specify up to 3 instance types to consider, and optionally provide capacity reservation configuration.
Type: AIRecommendationComputeSpec object
Required: No
- InferenceSpecification
-
The inference framework configuration. Specify the framework (such as LMI or vLLM) for the recommendation job.
Type: AIRecommendationInferenceSpecification object
Required: No
- ModelSource
-
The source of the model to optimize. Specify the Amazon S3 location of the model artifacts.
Type: AIModelSource object
Note: This object is a Union. Only one member of this object can be specified or returned.
Required: Yes
- OptimizeModel
-
Whether to allow model optimization techniques such as quantization, speculative decoding, and kernel tuning. The default is true.
Type: Boolean
Required: No
- OutputConfig
-
The output configuration for the recommendation job, including the Amazon S3 location for results and an optional model package group where the optimized model is registered.
Type: AIRecommendationOutputConfig object
Required: Yes
- PerformanceTarget
-
The performance targets for the recommendation job. Specify constraints on metrics such as time to first token (ttft-ms), throughput, or cost.
Type: AIRecommendationPerformanceTarget object
Required: Yes
- RoleArn
-
The Amazon Resource Name (ARN) of an IAM role that enables Amazon SageMaker AI to perform tasks on your behalf.
Type: String
Length Constraints: Minimum length of 20. Maximum length of 2048.
Pattern:
arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
- Tags
-
The metadata that you apply to AWS resources to help you categorize and organize them.
Type: Array of Tag objects
Array Members: Minimum number of 0 items. Maximum number of 50 items.
Required: No
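The length, pattern, and count constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the request body is a plain Python dict shaped like the request syntax. The function name validate_request is hypothetical, and the service remains the authority on what is accepted.

```python
import re

# Patterns copied from the parameter descriptions in this reference.
JOB_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
WORKLOAD_ID_RE = re.compile(
    r"(arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:[a-z\-]*/)?"
    r"([a-zA-Z0-9]([a-zA-Z0-9\-]){0,62})(?<!-)"
)
ROLE_ARN_RE = re.compile(r"arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+")

# Parameters marked "Required: Yes" in this reference.
REQUIRED = (
    "AIRecommendationJobName",
    "AIWorkloadConfigIdentifier",
    "ModelSource",
    "OutputConfig",
    "PerformanceTarget",
    "RoleArn",
)


def validate_request(body):
    """Return a list of problems found; an empty list means none were found."""
    errors = [f"missing required parameter: {k}" for k in REQUIRED if k not in body]

    def check(key, pattern, min_len, max_len):
        value = body.get(key)
        if value is None:
            return  # a missing required key is already reported above
        ok = (
            isinstance(value, str)
            and min_len <= len(value) <= max_len
            and pattern.fullmatch(value) is not None
        )
        if not ok:
            errors.append(f"{key} violates the documented length or pattern")

    check("AIRecommendationJobName", JOB_NAME_RE, 1, 63)
    check("AIWorkloadConfigIdentifier", WORKLOAD_ID_RE, 1, 256)
    check("RoleArn", ROLE_ARN_RE, 20, 2048)

    # ComputeSpec accepts up to 3 instance types; Tags accepts up to 50 items.
    if len(body.get("ComputeSpec", {}).get("InstanceTypes", [])) > 3:
        errors.append("ComputeSpec.InstanceTypes lists more than 3 instance types")
    if len(body.get("Tags", [])) > 50:
        errors.append("Tags has more than 50 items")
    return errors
```

The sketch checks only top-level constraints stated on this page. Nested objects such as ModelSource and PerformanceTarget have their own constraints, which are described in their respective data type references.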
Response Syntax
{
   "AIRecommendationJobArn": "string"
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
- AIRecommendationJobArn
-
The Amazon Resource Name (ARN) of the created recommendation job.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 256.
Pattern:
arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:ai-recommendation-job/[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
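The returned ARN follows the pattern shown above, so its components can be extracted with a regular expression. The sketch below adds named groups to the documented pattern. The function name parse_job_arn and the example ARN are hypothetical.

```python
import re

# The documented ARN pattern, with named groups added for convenience.
JOB_ARN_RE = re.compile(
    r"arn:(?P<partition>aws[a-z\-]*):sagemaker:(?P<region>[a-z0-9\-]*):"
    r"(?P<account>[0-9]{12}):ai-recommendation-job/"
    r"(?P<name>[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})"
)


def parse_job_arn(arn):
    """Split an AIRecommendationJobArn into partition, region, account, and name."""
    match = JOB_ARN_RE.fullmatch(arn)
    if match is None:
        raise ValueError(f"not a recommendation job ARN: {arn!r}")
    return match.groupdict()


if __name__ == "__main__":
    # Hypothetical ARN, shaped like the AIRecommendationJobArn response element.
    parts = parse_job_arn(
        "arn:aws:sagemaker:us-west-2:123456789012:ai-recommendation-job/demo-job"
    )
    print(parts["region"], parts["name"])
```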
Errors
For information about the errors that are common to all actions, see Common Error Types.
- ResourceInUse
-
Resource being accessed is in use.
HTTP Status Code: 400
- ResourceLimitExceeded
-
You have exceeded a SageMaker resource limit. For example, you might have too many training jobs created.
HTTP Status Code: 400
- ResourceNotFound
-
Resource being accessed is not found.
HTTP Status Code: 400
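One way a caller might handle the errors listed above is to map each error code to a short hint. The sketch below assumes the error arrives as a dict with an "Error" entry holding "Code" and "Message", which is the shape botocore exposes on ClientError.response. The function name describe_error and the hint wording are hypothetical, and the suggested causes are assumptions rather than documented behavior.

```python
# Hints keyed by the error codes listed in this reference. The first sentence of
# each hint restates the documented meaning; the rest is an assumption.
DOCUMENTED_ERRORS = {
    "ResourceInUse": (
        "The resource is in use. Job names must be unique per account and "
        "Region, so a name collision is one possible cause."
    ),
    "ResourceLimitExceeded": (
        "A SageMaker resource limit was exceeded. Reviewing existing jobs "
        "or quotas may help."
    ),
    "ResourceNotFound": (
        "A referenced resource was not found. The workload configuration "
        "identifier is one value worth checking."
    ),
}


def describe_error(error_response):
    """Return (code, hint) for an error dict shaped like ClientError.response."""
    error = error_response.get("Error", {})
    code = error.get("Code", "Unknown")
    # Fall back to the service-provided message for codes not listed here.
    hint = DOCUMENTED_ERRORS.get(code, error.get("Message", "No details available."))
    return code, hint


if __name__ == "__main__":
    # Hypothetical error payload for illustration only.
    sample = {"Error": {"Code": "ResourceNotFound", "Message": "example message"}}
    code, hint = describe_error(sample)
    print(f"{code}: {hint}")
```

All three documented errors return HTTP status code 400, so the error code, not the status, is what distinguishes them.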
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: