Increase throughput for resiliency and processing power - Amazon Bedrock

Increase throughput for resiliency and processing power

Throughput is defined by the number and rate of inputs and outputs that a model processes and returns. When you invoke a model in Amazon Bedrock or use a resource in Amazon Bedrock that invokes a model, the throughput of the model is subject to quotas. Quotas depend on the model and the Region and include the following values:

Amazon Bedrock offers the following types of throughput:

  • On-demand throughput – The standard option for throughput. Involves invoking a model in a specific AWS Region. The quotas are defined in Amazon Bedrock endpoints and quotas in the AWS General Reference.

  • On-demand cross-region inference – Involves invoking an inference profile, which is an abstraction over an on-demand pool of resources from configured AWS Regions. An inference profile can route your inference request originating from your source region to another region configured in the pool. Use of cross-region inference increases throughput and improves resiliency by dynamically routing model invocation requests across the regions defined in the inference profile. Routing factors in user traffic, demand and utilization of resources. For more information, see Improve resilience with cross-region inference.

  • Provisioned Throughput – Involves purchasing a dedicated level of throughput for a model in a specific AWS Region. Provisioned Throughput quotas depend on the number of model units that you purchase. For more information, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock.

Select a topic to learn more about the options you have for increasing your throughput: