Select your cookie preferences

We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.

If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”

Increase throughput with cross-region inference

Focus mode
Increase throughput with cross-region inference - Amazon Bedrock

When running model inference in on-demand mode, your requests might be restricted by service quotas or during peak usage times. Cross-region inference enables you to seamlessly manage unplanned traffic bursts by utilizing compute across different AWS Regions. With cross-region inference, you can distribute traffic across multiple AWS Regions, enabling higher throughput.

You can also increase throughput for a model by purchasing Provisioned Throughput. Inference profiles currently don't support Provisioned Throughput.

To see the Regions and models with which you can use inference profiles to run cross-region inference, refer to Supported Regions and models for inference profiles.

Cross-region (system-defined) inference profiles are named after the model that they support and defined by the Regions that they support. To understand how a cross-region inference profile handles your requests, review the following definitions:

  • Source Region – The Region from which you make the API request that specifies the inference profile.

  • Destination Region – A Region to which the Amazon Bedrock service can route the request from your source Region.

You invoke a cross-region inference profile from a source Region and the Amazon Bedrock service routes your request to any of the destination Regions defined in the inference profile.

Note

Some inference profiles route to different destination Regions depending on the source Region from which you call it. For example, if you call us.anthropic.claude-3-haiku-20240307-v1:0 from US East (Ohio), it can route requests to us-east-1, us-east-2, or us-west-2, but if you call it from US West (Oregon), it can route requests to only us-east-1 and us-west-2.

To check the source and destination Regions for an inference profile, you can do one of the following:

Note

Inference profiles are immutable, meaning that we don't add new Regions to an existing inference profile. However, we might create new inference profiles that incorporate new Regions. You can update your systems to use these inference profiles by changing the IDs in your setup to the new ones.

Note the following information about cross-region inference:

  • There's no additional routing cost for using cross-region inference. The price is calculated based on the region from which you call an inference profile. For information about pricing, see Amazon Bedrock pricing.

  • When using cross-region inference, your throughput can reach up to double the default quotas in the region that the inference profile is in. The increase in throughput only applies to invocation performed via inference profiles, the regular quota still applies if you opt for in-region model invocation request. For example, if you invoke the US Anthropic Claude 3 Sonnet inference profile in us-east-1, your throughput can reach up to 1,000 requests per minute and 2,000,000 tokens per minute. To see the default quotas for on-demand throughput, refer to the Runtime quotas section in Quotas for Amazon Bedrock or use the Service Quotas console.

  • Cross-region inference requests are kept within the regions that are part of the inference profile that was used. For example, a request made with an EU inference profile is kept within EU regions.

Use a cross-region (system-defined) inference profile

To use cross-region inference, you include an inference profile when running model inference in the following ways:

To learn how to use an inference profile to send model invocation requests across Regions, see Use an inference profile in model invocation.

To learn more about cross-region inference, see Getting started with cross-region inference in Amazon Bedrock.

PrivacySite termsCookie preferences
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.