ListInferenceRecommendationsJobSteps
Returns a list of the subtasks for an Inference Recommender job.
The supported subtasks are benchmarks, which evaluate the performance of your model on different instance types.
Request Syntax
{
   "JobName": "string",
   "MaxResults": number,
   "NextToken": "string",
   "Status": "string",
   "StepType": "string"
}
Request Parameters
For information about the parameters that are common to all actions, see Common Parameters.
The request accepts the following data in JSON format.
- JobName
-
The name for the Inference Recommender job.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 64.
Pattern:
^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}
Required: Yes
- MaxResults
-
The maximum number of results to return.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 100.
Required: No
- NextToken
-
A token that you can specify to return more results from the list. Specify this field if you have a token that was returned from a previous request.
Type: String
Length Constraints: Maximum length of 8192.
Pattern:
.*
Required: No
- Status
-
A filter to return benchmarks of a specified status. If this field is left empty, then all benchmarks are returned.
Type: String
Valid Values:
PENDING | IN_PROGRESS | COMPLETED | FAILED | STOPPING | STOPPED | DELETING | DELETED
Required: No
- StepType
-
A filter to return details about the specified type of subtask.
BENCHMARK
: Evaluate the performance of your model on different instance types.
Type: String
Valid Values:
BENCHMARK
Required: No
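The JobName pattern above can be checked client-side before the request is sent, avoiding a round trip for an obviously invalid name. This is an illustrative sketch (the helper name is ours, not part of the API):

```python
import re

# Pattern from the JobName constraints above: 1-64 characters,
# alphanumeric segments optionally separated by hyphens, and the
# name must start and end with an alphanumeric character.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}$")

def is_valid_job_name(name: str) -> bool:
    """Return True if `name` satisfies the JobName constraints."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_job_name("my-job-1")` passes, while a name that starts with a hyphen or exceeds 64 characters fails.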
Response Syntax
{
"NextToken": "string",
"Steps": [
{
"InferenceBenchmark": {
"EndpointConfiguration": {
"EndpointName": "string",
"InitialInstanceCount": number,
"InstanceType": "string",
"ServerlessConfig": {
"MaxConcurrency": number,
"MemorySizeInMB": number,
"ProvisionedConcurrency": number
},
"VariantName": "string"
},
"EndpointMetrics": {
"MaxInvocations": number,
"ModelLatency": number
},
"FailureReason": "string",
"InvocationEndTime": number,
"InvocationStartTime": number,
"Metrics": {
"CostPerHour": number,
"CostPerInference": number,
"CpuUtilization": number,
"MaxInvocations": number,
"MemoryUtilization": number,
"ModelLatency": number,
"ModelSetupTime": number
},
"ModelConfiguration": {
"CompilationJobName": "string",
"EnvironmentParameters": [
{
"Key": "string",
"Value": "string",
"ValueType": "string"
}
],
"InferenceSpecificationName": "string"
}
},
"JobName": "string",
"Status": "string",
"StepType": "string"
}
]
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
- NextToken
-
A token that you can specify in your next request to return more results from the list.
Type: String
Length Constraints: Maximum length of 8192.
Pattern:
.*
- Steps
-
A list of all subtask details in Inference Recommender.
Type: Array of InferenceRecommendationsJobStep objects
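The NextToken fields above drive pagination: pass the token from one response into the next request until no token is returned. A minimal sketch, where `list_steps` is a stand-in for whatever SDK call issues the request (it must accept JobName/NextToken keyword arguments and return a dict shaped like the Response Syntax above):

```python
def collect_all_steps(list_steps, job_name):
    """Collect every subtask across all pages of results.

    `list_steps` is a hypothetical stand-in for the SDK method that
    calls ListInferenceRecommendationsJobSteps.
    """
    steps, token = [], None
    while True:
        kwargs = {"JobName": job_name}
        if token:
            kwargs["NextToken"] = token
        response = list_steps(**kwargs)
        steps.extend(response.get("Steps", []))
        token = response.get("NextToken")
        if not token:  # no token means this was the last page
            return steps
```

The same loop works with a Status or StepType filter by adding those keys to `kwargs`.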
Errors
For information about the errors that are common to all actions, see Common Errors.
- ResourceNotFound
-
The resource being accessed was not found.
HTTP Status Code: 400
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: