CreateInferenceProfile
Creates an application inference profile to track metrics and costs when invoking a model. To create an application inference profile for a foundation model in one region, specify the ARN of the model in that region. To create an application inference profile for a foundation model across multiple regions, specify the ARN of the system-defined inference profile that contains the regions that you want to route requests to. For more information, see Increase throughput and resilience with cross-region inference in Amazon Bedrock in the Amazon Bedrock User Guide.
Request Syntax
POST /inference-profiles HTTP/1.1
Content-type: application/json
{
   "clientRequestToken": "string",
   "description": "string",
   "inferenceProfileName": "string",
   "modelSource": { ... },
   "tags": [
      {
         "key": "string",
         "value": "string"
      }
   ]
}
URI Request Parameters
The request does not use any URI parameters.
Request Body
The request accepts the following data in JSON format.
- clientRequestToken
-
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 256.
Pattern:
^[a-zA-Z0-9](-*[a-zA-Z0-9])*$
Required: No
- description
-
A description for the inference profile.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^([0-9a-zA-Z:.][ _-]?)+$
Required: No
- inferenceProfileName
-
A name for the inference profile.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 64.
Pattern:
^([0-9a-zA-Z][ _-]?)+$
Required: Yes
- modelSource
-
The foundation model or system-defined inference profile that the inference profile will track metrics and costs for.
Type: InferenceProfileModelSource object
Note: This object is a Union. Only one member of this object can be specified or returned.
Required: Yes
- tags
-
An array of objects, each of which contains a tag and its value. For more information, see Tagging resources in the Amazon Bedrock User Guide.
Type: Array of Tag objects
Array Members: Minimum number of 0 items. Maximum number of 200 items.
Required: No
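The length and pattern constraints above can be checked client-side before sending a request. The following Python sketch is not part of the Amazon Bedrock API; the helper function and sample values are illustrative, with the rules copied from the parameter descriptions above:

```python
import re

# Documented constraints for CreateInferenceProfile request fields:
# (minimum length, maximum length, pattern), as listed above.
CONSTRAINTS = {
    "clientRequestToken": (1, 256, r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"),
    "description": (1, 200, r"^([0-9a-zA-Z:.][ _-]?)+$"),
    "inferenceProfileName": (1, 64, r"^([0-9a-zA-Z][ _-]?)+$"),
}

def validate_field(name, value):
    """Return True if value satisfies the documented length and pattern rules."""
    min_len, max_len, pattern = CONSTRAINTS[name]
    return min_len <= len(value) <= max_len and re.match(pattern, value) is not None

# A request body shaped like the request syntax above (sample values).
body = {
    "inferenceProfileName": "USClaudeSonnetApplicationIP",
    "description": "Tracks costs for project abcdef123456",
    "modelSource": {"copyFrom": "anthropic.claude-3-sonnet-20240229-v1:0"},
    "tags": [{"key": "projectId", "value": "abcdef123456"}],
}

for field in ("inferenceProfileName", "description"):
    assert validate_field(field, body[field]), f"{field} violates documented constraints"
```

Validating locally avoids a round trip that would end in a ValidationException (HTTP 400).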
Response Syntax
HTTP/1.1 201
Content-type: application/json
{
   "inferenceProfileArn": "string",
   "status": "string"
}
Response Elements
If the action is successful, the service sends back an HTTP 201 response.
The following data is returned in JSON format by the service.
- inferenceProfileArn
-
The ARN of the inference profile that you created.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 2048.
Pattern:
^arn:aws(|-us-gov|-cn|-iso|-iso-b):bedrock:(|[0-9a-z-]{0,20}):(|[0-9]{12}):(inference-profile|application-inference-profile)/[a-zA-Z0-9-:.]+$
- status
-
The status of the inference profile. ACTIVE means that the inference profile is ready to be used.
Type: String
Valid Values:
ACTIVE
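The returned ARN can be sanity-checked against the documented pattern. A minimal Python sketch follows; the account ID and application profile ID below are made-up placeholders, not real resources:

```python
import re

# ARN pattern for the returned inferenceProfileArn, as documented above.
ARN_PATTERN = (
    r"^arn:aws(|-us-gov|-cn|-iso|-iso-b):bedrock:(|[0-9a-z-]{0,20}):(|[0-9]{12}):"
    r"(inference-profile|application-inference-profile)/[a-zA-Z0-9-:.]+$"
)

# Hypothetical ARNs for illustration only.
application_arn = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "application-inference-profile/abc123def456"
)
system_arn = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "inference-profile/us.anthropic.claude-3-sonnet-20240229-v1:0"
)

assert re.match(ARN_PATTERN, application_arn)
assert re.match(ARN_PATTERN, system_arn)
```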
Errors
For information about the errors that are common to all actions, see Common Errors.
- AccessDeniedException
-
The request is denied because of missing access permissions.
HTTP Status Code: 403
- ConflictException
-
An error occurred because of a conflict while performing an operation.
HTTP Status Code: 400
- InternalServerException
-
An internal server error occurred. Retry your request.
HTTP Status Code: 500
- ResourceNotFoundException
-
The specified resource Amazon Resource Name (ARN) was not found. Check the Amazon Resource Name (ARN) and try your request again.
HTTP Status Code: 404
- ServiceQuotaExceededException
-
The number of requests exceeds the service quota. Resubmit your request later.
HTTP Status Code: 400
- ThrottlingException
-
The number of requests exceeds the limit. Resubmit your request later.
HTTP Status Code: 429
- TooManyTagsException
-
The request contains more tags than can be associated with a resource (50 tags per resource). The maximum number of tags includes both existing tags and those included in your current request.
HTTP Status Code: 400
- ValidationException
-
Input validation failed. Check your request parameters and retry the request.
HTTP Status Code: 400
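Because ThrottlingException (429) and ServiceQuotaExceededException ask you to resubmit your request later, callers typically retry with exponential backoff. The following SDK-agnostic Python sketch is illustrative; the ThrottlingError class and fake_call function are hypothetical stand-ins for your HTTP client, not Amazon Bedrock APIs:

```python
import random
import time

class ThrottlingError(Exception):
    """Hypothetical stand-in for an HTTP 429 ThrottlingException response."""

def with_backoff(call, max_attempts=5, base_delay=0.5):
    """Retry a callable on throttling, doubling the delay cap each attempt with jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter: sleep up to base_delay * 2^attempt.
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))

# Example: a fake call that throttles twice, then succeeds.
attempts = {"n": 0}
def fake_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottlingError()
    return {"inferenceProfileArn": "arn:...", "status": "ACTIVE"}

result = with_backoff(fake_call, base_delay=0.01)
```

Jittered backoff spreads retries out so that many throttled clients do not all retry at the same instant.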
Examples
Create an application inference profile from a foundation model
Run the following example to create an application inference profile from the Anthropic Claude 3 Sonnet model in your current region:
Sample Request
POST /inference-profiles HTTP/1.1
Content-type: application/json
{
   "inferenceProfileName": "USClaudeSonnetApplicationIP",
   "modelSource": {
      "copyFrom": "anthropic.claude-3-sonnet-20240229-v1:0"
   },
   "tags": [
      {
         "key": "projectId",
         "value": "abcdef123456"
      }
   ]
}
Create an application inference profile from a system-defined inference profile
Run the following example to create an application inference profile from the US Anthropic Claude 3 Sonnet inference profile:
Sample Request
POST /inference-profiles HTTP/1.1
Content-type: application/json
{
   "inferenceProfileName": "USClaudeSonnetApplicationIP",
   "modelSource": {
      "copyFrom": "us.anthropic.claude-3-sonnet-20240229-v1:0"
   },
   "tags": [
      {
         "key": "projectId",
         "value": "abcdef123456"
      }
   ]
}
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: