CreateInferenceProfile
Creates an application inference profile to track metrics and costs when invoking a model. To create an application inference profile for a foundation model in one region, specify the ARN of the model in that region. To create an application inference profile for a foundation model across multiple regions, specify the ARN of the system-defined inference profile that contains the regions that you want to route requests to. For more information, see Increase throughput and resilience with cross-region inference in Amazon Bedrock in the Amazon Bedrock User Guide.
Request Syntax
POST /inference-profiles HTTP/1.1
Content-type: application/json
{
"clientRequestToken": "string",
"description": "string",
"inferenceProfileName": "string",
"modelSource": { ... },
"tags": [
{
"key": "string",
"value": "string"
}
]
}
URI Request Parameters
The request does not use any URI parameters.
Request Body
The request accepts the following data in JSON format.
- clientRequestToken
-
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 256.
Pattern:
^[a-zA-Z0-9](-*[a-zA-Z0-9])*$
Required: No
- description
-
A description for the inference profile.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^([0-9a-zA-Z:.][ _-]?)+$
Required: No
- inferenceProfileName
-
A name for the inference profile.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 64.
Pattern:
^([0-9a-zA-Z][ _-]?)+$
Required: Yes
- modelSource
-
The foundation model or system-defined inference profile that the inference profile will track metrics and costs for.
Type: InferenceProfileModelSource object
Note: This object is a Union. Only one member of this object can be specified or returned.
Required: Yes
- tags
-
An array of objects, each of which contains a tag and its value. For more information, see Tagging resources in the Amazon Bedrock User Guide.
Type: Array of Tag objects
Array Members: Minimum number of 0 items. Maximum number of 200 items.
Required: No
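The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name build_request and the choice to pass tags as a dict are assumptions made for illustration, and the copyFrom member of modelSource is taken from the sample requests in the Examples section. The patterns and length limits are copied from the field descriptions above.

```python
import re
import uuid

# Patterns and length limits copied from the field descriptions above.
TOKEN_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")
DESCRIPTION_RE = re.compile(r"^([0-9a-zA-Z:.][ _-]?)+$")
NAME_RE = re.compile(r"^([0-9a-zA-Z][ _-]?)+$")


def _check(field, value, pattern, max_len):
    """Raise ValueError if value breaks the documented length or pattern."""
    if not 1 <= len(value) <= max_len or not pattern.fullmatch(value):
        raise ValueError(f"{field} violates the documented constraints: {value!r}")


def build_request(name, copy_from, description=None, tags=None, token=None):
    """Return a request body as a dict (hypothetical helper, not an SDK call)."""
    _check("inferenceProfileName", name, NAME_RE, 64)
    # A UUID string matches the token pattern and is a common idempotency choice.
    token = token or str(uuid.uuid4())
    _check("clientRequestToken", token, TOKEN_RE, 256)
    body = {
        "clientRequestToken": token,
        "inferenceProfileName": name,
        "modelSource": {"copyFrom": copy_from},
    }
    if description is not None:
        _check("description", description, DESCRIPTION_RE, 200)
        body["description"] = description
    if tags:
        if len(tags) > 200:
            raise ValueError("tags accepts at most 200 items")
        body["tags"] = [{"key": k, "value": v} for k, v in tags.items()]
    return body
```

Reusing the same token value when resending a request is what lets the service recognize the duplicate, so a caller would normally generate the token once and keep it for any retries.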
Response Syntax
HTTP/1.1 201
Content-type: application/json
{
"inferenceProfileArn": "string",
"status": "string"
}
Response Elements
If the action is successful, the service sends back an HTTP 201 response.
The following data is returned in JSON format by the service.
- inferenceProfileArn
-
The ARN of the inference profile that you created.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 2048.
Pattern:
^arn:aws(|-us-gov|-cn|-iso|-iso-b):bedrock:(|[0-9a-z-]{0,20}):(|[0-9]{12}):(inference-profile|application-inference-profile)/[a-zA-Z0-9-:.]+$
- status
-
The status of the inference profile. ACTIVE means that the inference profile is ready to be used.
Type: String
Valid Values:
ACTIVE
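A caller may want to split the returned ARN into its parts, for example to confirm that it names an application inference profile rather than a system-defined one. The sketch below assumes a hypothetical helper named parse_profile_arn. Its regular expression is the documented pattern with named groups added and the hyphen moved to the end of the final character class, which matches the same strings.

```python
import re

# The documented ARN pattern, with named groups added for readability.
ARN_RE = re.compile(
    r"^arn:aws(?P<partition>|-us-gov|-cn|-iso|-iso-b):bedrock:"
    r"(?P<region>|[0-9a-z-]{0,20}):(?P<account>|[0-9]{12}):"
    r"(?P<kind>inference-profile|application-inference-profile)/"
    r"(?P<profile_id>[a-zA-Z0-9:.-]+)$"
)


def parse_profile_arn(arn):
    """Split an inference profile ARN into its components (hypothetical helper)."""
    if not 1 <= len(arn) <= 2048:
        raise ValueError("ARN length must be between 1 and 2048 characters")
    match = ARN_RE.fullmatch(arn)
    if match is None:
        raise ValueError(f"not an inference profile ARN: {arn!r}")
    return match.groupdict()


def is_ready(response):
    """Return True when a response dict reports the documented ACTIVE status."""
    return response.get("status") == "ACTIVE"
```

The kind group is the part that distinguishes the two resource types the pattern allows: inference-profile and application-inference-profile.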
Errors
For information about the errors that are common to all actions, see Common Errors.
- AccessDeniedException
-
The request is denied because of missing access permissions.
HTTP Status Code: 403
- ConflictException
-
An error occurred because of a conflict while performing an operation.
HTTP Status Code: 400
- InternalServerException
-
An internal server error occurred. Retry your request.
HTTP Status Code: 500
- ResourceNotFoundException
-
The specified resource Amazon Resource Name (ARN) was not found. Check the ARN and try your request again.
HTTP Status Code: 404
- ServiceQuotaExceededException
-
The number of requests exceeds the service quota. Resubmit your request later.
HTTP Status Code: 400
- ThrottlingException
-
The number of requests exceeds the limit. Resubmit your request later.
HTTP Status Code: 429
- TooManyTagsException
-
The request contains more tags than can be associated with a resource (50 tags per resource). The maximum number of tags includes both existing tags and those included in your current request.
HTTP Status Code: 400
- ValidationException
-
Input validation failed. Check your request parameters and retry the request.
HTTP Status Code: 400
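The errors above differ in whether repeating the same request can help. The following sketch shows one way a caller might retry only the transient ones with exponential backoff and jitter. It rests on assumptions: ApiError is a hypothetical exception type standing in for whatever your HTTP client or SDK raises, and treating only InternalServerException and ThrottlingException as retryable is a design choice made here, not something this reference prescribes.

```python
import random
import time

# HTTP status codes as listed in the Errors section above.
ERROR_STATUS = {
    "AccessDeniedException": 403,
    "ConflictException": 400,
    "InternalServerException": 500,
    "ResourceNotFoundException": 404,
    "ServiceQuotaExceededException": 400,
    "ThrottlingException": 429,
    "TooManyTagsException": 400,
    "ValidationException": 400,
}

# Assumption: only these two are treated as transient in this sketch.
RETRYABLE = {"InternalServerException", "ThrottlingException"}


class ApiError(Exception):
    """Hypothetical error type carrying the exception name from the service."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def call_with_retries(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry transient errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except ApiError as err:
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            # Full jitter: wait a random time up to base_delay * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Sending the same clientRequestToken on every attempt keeps the retries safe, because a request whose token matches a previous one is ignored without an error.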
Examples
Create an application inference profile from a foundation model
Run the following example to create an application inference profile from the Anthropic Claude 3 Sonnet model in your current region:
Sample Request
POST /inference-profiles HTTP/1.1
Content-type: application/json
{
"inferenceProfileName": "USClaudeSonnetApplicationIP",
"modelSource": {
"copyFrom": "anthropic.claude-3-sonnet-20240229-v1:0"
},
"tags": [
{
"key": "projectId",
"value": "abcdef123456"
}
]
}
Create an application inference profile from a cross-region (system-defined) inference profile
Run the following example to create an application inference profile from the US Anthropic Claude 3 Sonnet inference profile:
Sample Request
POST /inference-profiles HTTP/1.1
Content-type: application/json
{
"inferenceProfileName": "USClaudeSonnetApplicationIP",
"modelSource": {
"copyFrom": "us.anthropic.claude-3-sonnet-20240229-v1:0"
},
"tags": [
{
"key": "projectId",
"value": "abcdef123456"
}
]
}
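The sample requests above can also be issued through an SDK instead of raw HTTP. The sketch below assumes a client object that exposes this operation as create_inference_profile with keyword arguments named after the request body fields, which is how the boto3 bedrock client is understood to present it. The function name create_application_profile and the StubClient class are hypothetical, and the stub returns a made-up ARN so the sketch runs without AWS credentials.

```python
def create_application_profile(client, name, copy_from, tags=None):
    """Call CreateInferenceProfile through an SDK-style client.

    `client` is assumed to behave like boto3.client("bedrock").
    Returns the (inferenceProfileArn, status) pair from the response.
    """
    kwargs = {
        "inferenceProfileName": name,
        "modelSource": {"copyFrom": copy_from},
    }
    if tags:
        kwargs["tags"] = [{"key": k, "value": v} for k, v in tags.items()]
    response = client.create_inference_profile(**kwargs)
    return response["inferenceProfileArn"], response["status"]


class StubClient:
    """Offline stand-in for a real client; the ARN it returns is invented."""

    def create_inference_profile(self, **kwargs):
        self.last_request = kwargs
        return {
            "inferenceProfileArn": (
                "arn:aws:bedrock:us-east-1:123456789012:"
                "application-inference-profile/example"
            ),
            "status": "ACTIVE",
        }


if __name__ == "__main__":
    arn, status = create_application_profile(
        StubClient(),
        "USClaudeSonnetApplicationIP",
        "us.anthropic.claude-3-sonnet-20240229-v1:0",
        tags={"projectId": "abcdef123456"},
    )
    print(arn, status)
```

Passing the client in as a parameter keeps the function testable with a stub and leaves credential and region configuration to the caller.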
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: