Set up a model invocation resource using inference profiles - Amazon Bedrock

Set up a model invocation resource using inference profiles

Inference profiles are a resource in Amazon Bedrock that define a model and one or more Regions to which the inference profile can route model invocation requests. You can use inference profiles for the following tasks:

  • Track usage metrics – Set up CloudWatch logs and submit model invocation requests with an application inference profile to collect usage metrics for model invocation. You can examine these metrics when you view information about the inference profile and use them to inform your decisions. For more information about how to set up CloudWatch logs, see Monitor model invocation using CloudWatch Logs.

  • Use tags to monitor costs – Attach tags to an application inference profile to track costs when you submit on-demand model invocation requests. For more information on how to use tags for cost allocation, see Organizing and tracking costs using AWS cost allocation tags in the AWS Billing user guide.

  • Cross-region inference – Increase your throughput by using an inference profile that includes multiple AWS Regions. The inference profile will distribute model invocation requests across these regions to enhance resilience during peak utilization bursts. For more information about cross-region inference, see Improve resilience with cross-region inference.

Amazon Bedrock offers the following types of inference profiles:

  • Cross region (system-defined) inference profiles – Inference profiles that are predefined in Amazon Bedrock and include multiple Regions to which requests for a model can be routed. You can do the following with a cross region inference profile:

    • To invoke a model across multiple Regions, specify the ID or Amazon Resource Name (ARN) of a cross region inference profile when sending a model invocation request.

  • Application inference profiles – Inference profiles that a user creates to track costs and model usage. You can create an inference profile that routes model invocation requests to one Region or to multiple Regions:

    • To create an inference profile that tracks costs and usage for a model in one Region, specify the foundation model in the Region to which you want the inference profile to route requests.

    • To create an inference profile that tracks costs and usage for a model across multiple Regions, specify the cross region (system-defined) inference profile that defines the model and Regions to which you want the inference profile to route requests.

You can use inference profiles with the following features to route requests to multiple Regions and to track usage and cost for invocation requests made with these features:

The price for using an inference profile is calculated based on the price of the model in the region from which you call the inference profile. For information about pricing, see Amazon Bedrock pricing.

For more details about the throughput that a cross-region inference profile can offer, see Improve resilience with cross-region inference.