CreateAIRecommendationJob

Creates a recommendation job that generates intelligent optimization recommendations for generative AI inference deployments. The job analyzes your model, workload configuration, and performance targets to recommend optimal instance types, model optimization techniques (such as quantization and speculative decoding), and deployment configurations.

Request Syntax


{
   "AIRecommendationJobName": "string",
   "AIWorkloadConfigIdentifier": "string",
   "ComputeSpec": { 
      "CapacityReservationConfig": { 
         "CapacityReservationPreference": "string",
         "MlReservationArns": [ "string" ]
      },
      "InstanceTypes": [ "string" ]
   },
   "InferenceSpecification": { 
      "Framework": "string"
   },
   "ModelSource": { ... },
   "OptimizeModel": boolean,
   "OutputConfig": { 
      "MlflowConfig": { 
         "MlflowExperimentName": "string",
         "MlflowResourceArn": "string",
         "MlflowRunName": "string"
      },
      "ModelPackageGroupIdentifier": "string",
      "S3OutputLocation": "string"
   },
   "PerformanceTarget": { 
      "Constraints": [ 
         { 
            "Metric": "string"
         }
      ]
   },
   "RoleArn": "string",
   "Tags": [ 
      { 
         "Key": "string",
         "Value": "string"
      }
   ]
}
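As a rough illustration of the syntax above, the following Python sketch assembles a request body containing only the fields marked as required in this reference, plus an optional ComputeSpec. The helper name build_request and all argument values are hypothetical. The members of ModelSource are not shown in the syntax above, so the caller must supply that object. Placing only S3OutputLocation inside OutputConfig is also an assumption.

```python
import json


def build_request(job_name, workload_config, model_source, s3_output,
                  role_arn, metric="ttft-ms", instance_types=None):
    """Assemble a CreateAIRecommendationJob request body as a plain dict.

    Hypothetical helper: it only arranges fields as shown in the request
    syntax and performs no validation.
    """
    request = {
        "AIRecommendationJobName": job_name,
        "AIWorkloadConfigIdentifier": workload_config,
        # ModelSource is a union; the caller supplies exactly one member.
        "ModelSource": model_source,
        # Assumption: the S3 output location alone is enough here.
        "OutputConfig": {"S3OutputLocation": s3_output},
        "PerformanceTarget": {"Constraints": [{"Metric": metric}]},
        "RoleArn": role_arn,
    }
    if instance_types:
        request["ComputeSpec"] = {"InstanceTypes": list(instance_types)}
    return request


def to_json(request):
    """Serialize the request body to the JSON text sent to the service."""
    return json.dumps(request, indent=3)
```

The resulting dict can be serialized with to_json or passed as keyword arguments to an SDK client that supports this action.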

Request Parameters

For information about the parameters that are common to all actions, see Common Parameters.

The request accepts the following data in JSON format.

AIRecommendationJobName

The name of the AI recommendation job. The name must be unique within your AWS account in the current AWS Region.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 63.

Pattern: [a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}

Required: Yes
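A client-side check of the job name can catch invalid values before the request is sent. The sketch below applies the length constraint and pattern listed above; the function name is_valid_job_name is hypothetical. The length check is separate because the pattern alone allows runs of hyphens that could push a name past 63 characters.

```python
import re

# Pattern copied from the AIRecommendationJobName constraints above.
JOB_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def is_valid_job_name(name: str) -> bool:
    """Return True if name satisfies the documented length and pattern."""
    if not 1 <= len(name) <= 63:
        return False
    # fullmatch requires the whole string to match, not just a prefix.
    return JOB_NAME_RE.fullmatch(name) is not None
```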

AIWorkloadConfigIdentifier

The name or Amazon Resource Name (ARN) of the AI workload configuration to use for this recommendation job.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 256.

Pattern: (arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:[a-z\-]*/)?([a-zA-Z0-9]([a-zA-Z0-9\-]){0,62})(?<!-)

Required: Yes
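Because the identifier accepts either a bare name or a full ARN, it can help to check which form a value takes. This sketch uses the pattern above as written; the function name and the resource type in the usage example are assumptions made for illustration.

```python
import re

# Pattern copied from the AIWorkloadConfigIdentifier constraints above.
# Group 1 is the optional ARN prefix, group 2 is the configuration name,
# and the trailing lookbehind rejects values that end in a hyphen.
WORKLOAD_CONFIG_RE = re.compile(
    r"(arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:[a-z\-]*/)?"
    r"([a-zA-Z0-9]([a-zA-Z0-9\-]){0,62})(?<!-)"
)


def parse_workload_config_identifier(identifier: str) -> dict:
    """Split an identifier into its form (name or ARN) and its name part."""
    if not 1 <= len(identifier) <= 256:
        raise ValueError("identifier must be 1 to 256 characters long")
    match = WORKLOAD_CONFIG_RE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"invalid workload config identifier: {identifier!r}")
    return {"is_arn": match.group(1) is not None, "name": match.group(2)}
```

For example, parse_workload_config_identifier("my-config") reports a bare name, while a value such as "arn:aws:sagemaker:us-west-2:123456789012:ai-workload-config/my-config" (the resource type here is a guess, not taken from this reference) reports an ARN with the same name part.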

ComputeSpec

The compute resource specification for the recommendation job. You can specify up to 3 instance types to consider, and optionally provide capacity reservation configuration.

Type: AIRecommendationComputeSpec object

Required: No
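The limit of 3 instance types can be enforced before the request is built. The following is a minimal sketch; build_compute_spec is a hypothetical helper, and the capacity reservation object is passed through unchanged because its allowed values are not described in this section.

```python
MAX_INSTANCE_TYPES = 3  # limit stated in the ComputeSpec description


def build_compute_spec(instance_types, capacity_reservation_config=None):
    """Return a ComputeSpec dict, rejecting more than 3 instance types."""
    types = list(instance_types)
    if len(types) > MAX_INSTANCE_TYPES:
        raise ValueError(
            f"at most {MAX_INSTANCE_TYPES} instance types are allowed, "
            f"got {len(types)}"
        )
    spec = {"InstanceTypes": types}
    if capacity_reservation_config is not None:
        # Passed through as-is; this sketch does not validate its members.
        spec["CapacityReservationConfig"] = capacity_reservation_config
    return spec
```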

InferenceSpecification

The inference framework configuration. Specify the framework (such as LMI or vLLM) for the recommendation job.

Type: AIRecommendationInferenceSpecification object

Required: No

ModelSource

The source of the model to optimize. Specify the Amazon S3 location of the model artifacts.

Type: AIModelSource object

Note: This object is a Union. Only one member of this object can be specified or returned.

Required: Yes

OptimizeModel

Whether to allow model optimization techniques such as quantization, speculative decoding, and kernel tuning. The default is true.

Type: Boolean

Required: No

OutputConfig

The output configuration for the recommendation job, including the Amazon S3 location for results and an optional model package group where the optimized model is registered.

Type: AIRecommendationOutputConfig object

Required: Yes

PerformanceTarget

The performance targets for the recommendation job. Specify constraints on metrics such as time to first token (ttft-ms), throughput, or cost.

Type: AIRecommendationPerformanceTarget object

Required: Yes

RoleArn

The Amazon Resource Name (ARN) of an IAM role that enables Amazon SageMaker AI to perform tasks on your behalf.

Type: String

Length Constraints: Minimum length of 20. Maximum length of 2048.

Pattern: arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes
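The role ARN constraints above can also be checked locally. This sketch only tests the documented length and pattern; it does not confirm that the role exists or that it grants the permissions the job needs. The function name is hypothetical.

```python
import re

# Pattern copied from the RoleArn constraints above.
ROLE_ARN_RE = re.compile(
    r"arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+"
)


def is_valid_role_arn(role_arn: str) -> bool:
    """Return True if role_arn satisfies the documented length and pattern."""
    if not 20 <= len(role_arn) <= 2048:
        return False
    return ROLE_ARN_RE.fullmatch(role_arn) is not None
```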

Tags

The metadata that you apply to AWS resources to help you categorize and organize them.

Type: Array of Tag objects

Array Members: Minimum number of 0 items. Maximum number of 50 items.

Required: No
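Tags are often kept as a simple mapping in application code, while the request expects an array of Key and Value objects. A minimal conversion sketch follows, assuming a hypothetical helper named to_tag_list that also enforces the 50-item limit above.

```python
MAX_TAGS = 50  # array limit stated in the Tags constraints


def to_tag_list(tags: dict) -> list:
    """Convert {"key": "value"} pairs into the Tags array shape."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags are allowed, got {len(tags)}")
    return [{"Key": key, "Value": value} for key, value in tags.items()]
```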

Response Syntax


{
   "AIRecommendationJobArn": "string"
}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

AIRecommendationJobArn

The Amazon Resource Name (ARN) of the created recommendation job.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 256.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:ai-recommendation-job/[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
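The returned ARN embeds the Region, account ID, and job name, which can be extracted for logging or later lookups. The sketch below uses the pattern above with named groups added for readability; parse_job_arn is a hypothetical helper, and the response is assumed to be the parsed JSON body as a dict.

```python
import re

# Same pattern as above, with named groups added for the parts of interest.
JOB_ARN_RE = re.compile(
    r"arn:aws[a-z\-]*:sagemaker:(?P<region>[a-z0-9\-]*):"
    r"(?P<account>[0-9]{12}):ai-recommendation-job/"
    r"(?P<name>[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})"
)


def parse_job_arn(response: dict) -> dict:
    """Extract region, account ID, and job name from the response ARN."""
    arn = response["AIRecommendationJobArn"]
    match = JOB_ARN_RE.fullmatch(arn)
    if match is None:
        raise ValueError(f"unexpected job ARN format: {arn!r}")
    return {
        "region": match.group("region"),
        "account": match.group("account"),
        "name": match.group("name"),
    }
```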

Errors

For information about the errors that are common to all actions, see Common Error Types.

ResourceInUse

The resource being accessed is in use.

HTTP Status Code: 400

ResourceLimitExceeded

You have exceeded a SageMaker resource limit. For example, you might have created too many training jobs.

HTTP Status Code: 400

ResourceNotFound

The resource being accessed was not found.

HTTP Status Code: 400