Set up a model invocation resource using inference profiles

Inference profiles are a resource in Amazon Bedrock that define a model and one or more Regions to which the inference profile can route model invocation requests. You can use inference profiles for the following tasks:

Track usage metrics – Set up CloudWatch logs and submit model invocation requests with an application inference profile to collect usage metrics for model invocation. You can examine these metrics when you view information about the inference profile and use them to inform your decisions. For more information about how to set up CloudWatch logs, see Monitor model invocation using CloudWatch Logs.
Use tags to monitor costs – Attach tags to an application inference profile to track costs when you submit on-demand model invocation requests. For more information on how to use tags for cost allocation, see Organizing and tracking costs using AWS cost allocation tags in the AWS Billing user guide.
Cross-region inference – Increase your throughput by using an inference profile that includes multiple AWS Regions. The inference profile will distribute model invocation requests across these regions to increase throughput and performance. For more information about cross-region inference, see Increase throughput with cross-region inference.

Amazon Bedrock offers the following types of inference profiles:

Cross region (system-defined) inference profiles – Inference profiles that are predefined in Amazon Bedrock and include multiple Regions to which requests for a model can be routed.
Application inference profiles – Inference profiles that a user creates to track costs and model usage. You can create an inference profile that routes model invocation requests to one Region or to multiple Regions:
- To create an inference profile that tracks costs and usage for a model in one Region, specify the foundation model in the Region to which you want the inference profile to route requests.
- To create an inference profile that tracks costs and usage for a model across multiple Regions, specify the cross region (system-defined) inference profile that defines the model and Regions to which you want the inference profile to route requests.

You can use inference profiles with the following features to route requests to multiple Regions and to track usage and cost for invocation requests made with these features:

Model inference – Use an inference profile when running model invocation by choosing an inference profile in a playground in the Amazon Bedrock console, or by specifying the ARN of the inference profile when calling the InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream operations. For more information, see Submit prompts and generate responses with model inference.
Knowledge base vector embedding and response generation – Use an inference profile when generating a response after querying a knowledge base or when parsing non-textual information in a data source. For more information, see Test your knowledge base with queries and responses and Parsing options for your data source.
Model evaluation – You can submit an inference profile as a model to evaluate when submitting a model evaluation job. For more information, see Evaluate the performance of Amazon Bedrock resources.
Prompt management – You can use an inference profile when generating a response for a prompt you created in Prompt management. For more information, see Construct and store reusable prompts with Prompt management in Amazon Bedrock
Flows – You can use an inference profile when generating a response for a prompt you define inline in a prompt node in a flow. For more information, see Build an end-to-end generative AI workflow with Amazon Bedrock Flows.

The price for using an inference profile is calculated based on the price of the model in the region from which you call the inference profile. For information about pricing, see Amazon Bedrock pricing.

For more details about the throughput that a cross-region inference profile can offer, see Increase throughput with cross-region inference.

Topics

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Code examples

Supported Regions and models