Amazon Bedrock endpoints and quotas - AWS General Reference

Amazon Bedrock endpoints and quotas

The following are the service endpoints and service quotas for this service. To connect programmatically to an AWS service, you use an endpoint. In addition to the standard AWS endpoints, some AWS services offer FIPS endpoints in selected Regions. For more information, see AWS service endpoints. Service quotas, also referred to as limits, are the maximum number of service resources or operations for your AWS account. For more information, see AWS service quotas.

Service endpoints

Amazon Bedrock control plane APIs

The following table provides a list of Region-specific endpoints that Amazon Bedrock supports for managing, training, and deploying models. Use these endpoints for Amazon Bedrock API operations.

Region Name Region Endpoint Protocol
US East (Ohio) us-east-2 bedrock.us-east-2.amazonaws.com HTTPS
bedrock-fips.us-east-2.amazonaws.com HTTPS
US East (N. Virginia) us-east-1 bedrock.us-east-1.amazonaws.com HTTPS
bedrock-fips.us-east-1.amazonaws.com HTTPS
US West (Oregon) us-west-2 bedrock.us-west-2.amazonaws.com HTTPS
bedrock-fips.us-west-2.amazonaws.com HTTPS
Asia Pacific (Mumbai) ap-south-1 bedrock.ap-south-1.amazonaws.com HTTPS
Asia Pacific (Seoul) ap-northeast-2 bedrock.ap-northeast-2.amazonaws.com HTTPS
Asia Pacific (Singapore) ap-southeast-1 bedrock.ap-southeast-1.amazonaws.com HTTPS
Asia Pacific (Sydney) ap-southeast-2 bedrock.ap-southeast-2.amazonaws.com HTTPS
Asia Pacific (Tokyo) ap-northeast-1 bedrock.ap-northeast-1.amazonaws.com HTTPS
Canada (Central) ca-central-1 bedrock.ca-central-1.amazonaws.com HTTPS
bedrock-fips.ca-central-1.amazonaws.com HTTPS
Europe (Frankfurt) eu-central-1 bedrock.eu-central-1.amazonaws.com HTTPS
Europe (Ireland) eu-west-1 bedrock.eu-west-1.amazonaws.com HTTPS
Europe (London) eu-west-2 bedrock.eu-west-2.amazonaws.com HTTPS
Europe (Paris) eu-west-3 bedrock.eu-west-3.amazonaws.com HTTPS
Europe (Zurich) eu-central-2 bedrock.eu-central-2.amazonaws.com HTTPS
South America (São Paulo) sa-east-1 bedrock.sa-east-1.amazonaws.com HTTPS
AWS GovCloud (US-East) us-gov-east-1 bedrock.us-gov-east-1.amazonaws.com HTTPS
bedrock-fips.us-gov-east-1.amazonaws.com HTTPS
AWS GovCloud (US-West) us-gov-west-1 bedrock.us-gov-west-1.amazonaws.com HTTPS
bedrock-fips.us-gov-west-1.amazonaws.com HTTPS
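Every endpoint in the table above follows the standard AWS naming pattern: a service prefix (`bedrock`, or `bedrock-fips` in Regions that offer a FIPS endpoint), the Region code, and `amazonaws.com`. The helper below is a minimal sketch of that pattern; the function name is illustrative and not part of any AWS SDK.

```python
def bedrock_endpoint(region: str, fips: bool = False) -> str:
    """Build the Amazon Bedrock control-plane endpoint hostname for a Region.

    Pattern from the table above: bedrock.<region>.amazonaws.com, or
    bedrock-fips.<region>.amazonaws.com where a FIPS endpoint is offered.
    """
    prefix = "bedrock-fips" if fips else "bedrock"
    return f"{prefix}.{region}.amazonaws.com"
```

For example, `bedrock_endpoint("us-east-1")` yields `bedrock.us-east-1.amazonaws.com`, matching the US East (N. Virginia) row. Note that FIPS endpoints exist only in the Regions where the table lists one; the helper does not validate that.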

Amazon Bedrock runtime APIs

The following table provides a list of Region-specific endpoints that Amazon Bedrock supports for making inference requests for models hosted in Amazon Bedrock. Use these endpoints for Amazon Bedrock Runtime API operations.

Region Name Region Endpoint Protocol
US East (Ohio) us-east-2 bedrock-runtime.us-east-2.amazonaws.com HTTPS
bedrock-runtime-fips.us-east-2.amazonaws.com HTTPS
US East (N. Virginia) us-east-1 bedrock-runtime.us-east-1.amazonaws.com HTTPS
bedrock-runtime-fips.us-east-1.amazonaws.com HTTPS
US West (Oregon) us-west-2 bedrock-runtime.us-west-2.amazonaws.com HTTPS
bedrock-runtime-fips.us-west-2.amazonaws.com HTTPS
Asia Pacific (Mumbai) ap-south-1 bedrock-runtime.ap-south-1.amazonaws.com HTTPS
Asia Pacific (Seoul) ap-northeast-2 bedrock-runtime.ap-northeast-2.amazonaws.com HTTPS
Asia Pacific (Singapore) ap-southeast-1 bedrock-runtime.ap-southeast-1.amazonaws.com HTTPS
Asia Pacific (Sydney) ap-southeast-2 bedrock-runtime.ap-southeast-2.amazonaws.com HTTPS
Asia Pacific (Tokyo) ap-northeast-1 bedrock-runtime.ap-northeast-1.amazonaws.com HTTPS
Canada (Central) ca-central-1 bedrock-runtime.ca-central-1.amazonaws.com HTTPS
bedrock-runtime-fips.ca-central-1.amazonaws.com HTTPS
Europe (Frankfurt) eu-central-1 bedrock-runtime.eu-central-1.amazonaws.com HTTPS
Europe (Ireland) eu-west-1 bedrock-runtime.eu-west-1.amazonaws.com HTTPS
Europe (London) eu-west-2 bedrock-runtime.eu-west-2.amazonaws.com HTTPS
Europe (Paris) eu-west-3 bedrock-runtime.eu-west-3.amazonaws.com HTTPS
Europe (Zurich) eu-central-2 bedrock-runtime.eu-central-2.amazonaws.com HTTPS
South America (São Paulo) sa-east-1 bedrock-runtime.sa-east-1.amazonaws.com HTTPS
AWS GovCloud (US-East) us-gov-east-1 bedrock-runtime.us-gov-east-1.amazonaws.com HTTPS
bedrock-runtime-fips.us-gov-east-1.amazonaws.com HTTPS
AWS GovCloud (US-West) us-gov-west-1 bedrock-runtime.us-gov-west-1.amazonaws.com HTTPS
bedrock-runtime-fips.us-gov-west-1.amazonaws.com HTTPS
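Only a subset of the Regions above offer a FIPS runtime endpoint. A client that prefers FIPS but must still work everywhere can fall back to the standard endpoint, as in this sketch (the set literal and function name are illustrative, derived from the table above):

```python
# Regions where Amazon Bedrock Runtime lists a FIPS endpoint in the table above.
RUNTIME_FIPS_REGIONS = {
    "us-east-1", "us-east-2", "us-west-2", "ca-central-1",
    "us-gov-east-1", "us-gov-west-1",
}

def runtime_endpoint(region: str, prefer_fips: bool = False) -> str:
    """Return the Bedrock Runtime endpoint hostname for a Region, using the
    FIPS variant when requested and available, else the standard endpoint."""
    if prefer_fips and region in RUNTIME_FIPS_REGIONS:
        return f"bedrock-runtime-fips.{region}.amazonaws.com"
    return f"bedrock-runtime.{region}.amazonaws.com"
```

The returned hostname (prefixed with `https://`) can be passed to an SDK client override such as boto3's `endpoint_url` parameter when the default endpoint resolution needs to be pinned.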

Agents for Amazon Bedrock build-time APIs

The following table provides a list of Region-specific endpoints that Agents for Amazon Bedrock supports for creating and managing agents and knowledge bases. Use these endpoints for Agents for Amazon Bedrock API operations.

Region Name Region Endpoint Protocol
US East (N. Virginia) us-east-1 bedrock-agent.us-east-1.amazonaws.com HTTPS
bedrock-agent-fips.us-east-1.amazonaws.com HTTPS
US West (Oregon) us-west-2 bedrock-agent.us-west-2.amazonaws.com HTTPS
bedrock-agent-fips.us-west-2.amazonaws.com HTTPS
Asia Pacific (Singapore) ap-southeast-1 bedrock-agent.ap-southeast-1.amazonaws.com HTTPS
Asia Pacific (Sydney) ap-southeast-2 bedrock-agent.ap-southeast-2.amazonaws.com HTTPS
Asia Pacific (Tokyo) ap-northeast-1 bedrock-agent.ap-northeast-1.amazonaws.com HTTPS
Canada (Central) ca-central-1 bedrock-agent.ca-central-1.amazonaws.com HTTPS
Europe (Frankfurt) eu-central-1 bedrock-agent.eu-central-1.amazonaws.com HTTPS
Europe (Ireland) eu-west-1 bedrock-agent.eu-west-1.amazonaws.com HTTPS
Europe (London) eu-west-2 bedrock-agent.eu-west-2.amazonaws.com HTTPS
Europe (Paris) eu-west-3 bedrock-agent.eu-west-3.amazonaws.com HTTPS
Asia Pacific (Mumbai) ap-south-1 bedrock-agent.ap-south-1.amazonaws.com HTTPS
South America (São Paulo) sa-east-1 bedrock-agent.sa-east-1.amazonaws.com HTTPS

Agents for Amazon Bedrock runtime APIs

The following table provides a list of Region-specific endpoints that Agents for Amazon Bedrock supports for invoking agents and querying knowledge bases. Use these endpoints for Agents for Amazon Bedrock Runtime API operations.

Region Name Region Endpoint Protocol
US East (N. Virginia) us-east-1 bedrock-agent-runtime.us-east-1.amazonaws.com HTTPS
bedrock-agent-runtime-fips.us-east-1.amazonaws.com HTTPS
US West (Oregon) us-west-2 bedrock-agent-runtime.us-west-2.amazonaws.com HTTPS
bedrock-agent-runtime-fips.us-west-2.amazonaws.com HTTPS
Asia Pacific (Singapore) ap-southeast-1 bedrock-agent-runtime.ap-southeast-1.amazonaws.com HTTPS
Asia Pacific (Sydney) ap-southeast-2 bedrock-agent-runtime.ap-southeast-2.amazonaws.com HTTPS
Asia Pacific (Tokyo) ap-northeast-1 bedrock-agent-runtime.ap-northeast-1.amazonaws.com HTTPS
Canada (Central) ca-central-1 bedrock-agent-runtime.ca-central-1.amazonaws.com HTTPS
Europe (Frankfurt) eu-central-1 bedrock-agent-runtime.eu-central-1.amazonaws.com HTTPS
Europe (Paris) eu-west-3 bedrock-agent-runtime.eu-west-3.amazonaws.com HTTPS
Europe (Ireland) eu-west-1 bedrock-agent-runtime.eu-west-1.amazonaws.com HTTPS
Europe (London) eu-west-2 bedrock-agent-runtime.eu-west-2.amazonaws.com HTTPS
Asia Pacific (Mumbai) ap-south-1 bedrock-agent-runtime.ap-south-1.amazonaws.com HTTPS
South America (São Paulo) sa-east-1 bedrock-agent-runtime.sa-east-1.amazonaws.com HTTPS
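Across the four tables on this page, each API family has its own endpoint prefix, and those prefixes match the service names used by the AWS CLI and boto3 (`bedrock`, `bedrock-runtime`, `bedrock-agent`, `bedrock-agent-runtime`). The mapping and URL builder below are an illustrative sketch, not part of any SDK:

```python
# Endpoint prefix for each Amazon Bedrock API family documented on this page.
API_FAMILY_PREFIX = {
    "control_plane": "bedrock",             # model management, training, deployment
    "runtime": "bedrock-runtime",           # model inference
    "agents_build_time": "bedrock-agent",   # creating/managing agents and knowledge bases
    "agents_runtime": "bedrock-agent-runtime",  # invoking agents, querying knowledge bases
}

def endpoint_url(family: str, region: str) -> str:
    """Full HTTPS URL for the standard (non-FIPS) endpoint of an API family."""
    return f"https://{API_FAMILY_PREFIX[family]}.{region}.amazonaws.com"
```

Picking the wrong family is a common source of confusion: for instance, `InvokeAgent` goes to the agents runtime endpoint, not the build-time one.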

Service quotas

For instructions on how to request a quota increase, both for quotas whose Adjustable value is marked as Yes and those marked as No, see Request an increase for Amazon Bedrock quotas. The following table shows a list of quotas for Amazon Bedrock:

Name Default Adjustable Description
APIs per Agent Each supported Region: 11 Yes The maximum number of APIs that you can add to an Agent.
Action groups per Agent Each supported Region: 20 Yes The maximum number of action groups that you can add to an Agent.
Agent nodes per flow Each supported Region: 10 No The maximum number of agent nodes.
Agents per account Each supported Region: 50 Yes The maximum number of Agents in one account.
AssociateAgentKnowledgeBase requests per second Each supported Region: 6 No The maximum number of AssociateAgentKnowledgeBase API requests per second.
Associated aliases per Agent Each supported Region: 10 No The maximum number of aliases that you can associate with an Agent.
Associated knowledge bases per Agent Each supported Region: 2 Yes The maximum number of knowledge bases that you can associate with an Agent.
Batch inference input file size Each supported Region: 1,073,741,824 Yes The maximum size of a single file (in bytes) submitted for batch inference.
Batch inference job size Each supported Region: 5,368,709,120 Yes The maximum cumulative size of all input files (in bytes) included in the batch inference job.
Characters in Agent instructions us-west-2: 4,000; ap-northeast-1: 4,000; ap-southeast-1: 4,000; ap-southeast-2: 4,000; eu-west-2: 4,000; Each of the other supported Regions: 8,000 Yes The maximum number of characters in the instructions for an Agent.
Collector nodes per flow Each supported Region: 1 No The maximum number of collector nodes.
Concurrent ingestion jobs per account Each supported Region: 5 No The maximum number of ingestion jobs that can be running at the same time in an account.
Concurrent ingestion jobs per data source Each supported Region: 1 No The maximum number of ingestion jobs that can be running at the same time for a data source.
Concurrent ingestion jobs per knowledge base Each supported Region: 1 No The maximum number of ingestion jobs that can be running at the same time for a knowledge base.
Concurrent model import jobs Each supported Region: 1 No The maximum number of model import jobs that are concurrently in progress.
Condition nodes per flow Each supported Region: 5 No The maximum number of condition nodes.
Conditions per condition node Each supported Region: 5 No The maximum number of conditions per condition node.
Contextual grounding query length in text units Each supported Region: 1 No The maximum length, in text units, of the query for contextual grounding.
Contextual grounding response length in text units Each supported Region: 5 No The maximum length, in text units, of the response for contextual grounding.
Contextual grounding source length in text units us-east-1: 100; us-west-2: 100; Each of the other supported Regions: 50 No The maximum length, in text units, of the grounding source for contextual grounding.
CreateAgent requests per second Each supported Region: 6 No The maximum number of CreateAgent API requests per second.
CreateAgentActionGroup requests per second Each supported Region: 12 No The maximum number of CreateAgentActionGroup API requests per second.
CreateAgentAlias requests per second Each supported Region: 2 No The maximum number of CreateAgentAlias API requests per second.
CreateDataSource requests per second Each supported Region: 2 No The maximum number of CreateDataSource API requests per second.
CreateFlow requests per second Each supported Region: 2 No The maximum number of CreateFlow requests per second.
CreateFlowAlias requests per second Each supported Region: 2 No The maximum number of CreateFlowAlias requests per second.
CreateFlowVersion requests per second Each supported Region: 2 No The maximum number of CreateFlowVersion requests per second.
CreateKnowledgeBase requests per second Each supported Region: 2 No The maximum number of CreateKnowledgeBase API requests per second.
CreatePrompt requests per second Each supported Region: 2 No The maximum number of CreatePrompt requests per second.
CreatePromptVersion requests per second Each supported Region: 2 No The maximum number of CreatePromptVersion requests per second.
Cross-Region InvokeModel requests per minute for Anthropic Claude 3.5 Haiku Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3.5 Haiku.
Cross-Region InvokeModel tokens per minute for Anthropic Claude 3.5 Haiku Each supported Region: 4,000,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3.5 Haiku.
Custom models per account Each supported Region: 100 Yes The maximum number of custom models in an account.
Data sources per knowledge base Each supported Region: 5 No The maximum number of data sources per knowledge base.
DeleteAgent requests per second Each supported Region: 2 No The maximum number of DeleteAgent API requests per second.
DeleteAgentActionGroup requests per second Each supported Region: 2 No The maximum number of DeleteAgentActionGroup API requests per second.
DeleteAgentAlias requests per second Each supported Region: 2 No The maximum number of DeleteAgentAlias API requests per second.
DeleteAgentVersion requests per second Each supported Region: 2 No The maximum number of DeleteAgentVersion API requests per second.
DeleteDataSource requests per second Each supported Region: 2 No The maximum number of DeleteDataSource API requests per second.
DeleteFlow requests per second Each supported Region: 2 No The maximum number of DeleteFlow requests per second.
DeleteFlowAlias requests per second Each supported Region: 2 No The maximum number of DeleteFlowAlias requests per second.
DeleteFlowVersion requests per second Each supported Region: 2 No The maximum number of DeleteFlowVersion requests per second.
DeleteKnowledgeBase requests per second Each supported Region: 2 No The maximum number of DeleteKnowledgeBase API requests per second.
DeletePrompt requests per second Each supported Region: 2 No The maximum number of DeletePrompt requests per second.
DisassociateAgentKnowledgeBase requests per second Each supported Region: 4 No The maximum number of DisassociateAgentKnowledgeBase API requests per second.
Enabled action groups per agent Each supported Region: 11 Yes The maximum number of action groups that you can enable in an Agent.
Endpoints per inference profile Each supported Region: 5 No The maximum number of endpoints in an inference profile. An endpoint is defined by a model and the region that the invocation requests to the model are sent to.
Example phrases per Topic Each supported Region: 5 No The maximum number of topic examples that can be included per topic.
Files to add or update per ingestion job Each supported Region: 5,000,000 No The maximum number of new and updated files that can be ingested per ingestion job.
Files to delete per ingestion job Each supported Region: 5,000,000 No The maximum number of files that can be deleted per ingestion job.
Flow aliases per flow Each supported Region: 10 No The maximum number of flow aliases.
Flow versions per flow Each supported Region: 10 No The maximum number of flow versions.
Flows per account Each supported Region: 100 Yes The maximum number of flows per account.
GetAgent requests per second Each supported Region: 15 No The maximum number of GetAgent API requests per second.
GetAgentActionGroup requests per second Each supported Region: 20 No The maximum number of GetAgentActionGroup API requests per second.
GetAgentAlias requests per second Each supported Region: 10 No The maximum number of GetAgentAlias API requests per second.
GetAgentKnowledgeBase requests per second Each supported Region: 15 No The maximum number of GetAgentKnowledgeBase API requests per second.
GetAgentVersion requests per second Each supported Region: 10 No The maximum number of GetAgentVersion API requests per second.
GetDataSource requests per second Each supported Region: 10 No The maximum number of GetDataSource API requests per second.
GetFlow requests per second Each supported Region: 10 No The maximum number of GetFlow requests per second.
GetFlowAlias requests per second Each supported Region: 10 No The maximum number of GetFlowAlias requests per second.
GetFlowVersion requests per second Each supported Region: 10 No The maximum number of GetFlowVersion requests per second.
GetIngestionJob requests per second Each supported Region: 10 No The maximum number of GetIngestionJob API requests per second.
GetKnowledgeBase requests per second Each supported Region: 10 No The maximum number of GetKnowledgeBase API requests per second.
GetPrompt requests per second Each supported Region: 10 No The maximum number of GetPrompt requests per second.
Guardrails per account Each supported Region: 100 No The maximum number of guardrails in an account.
Imported models per account Each supported Region: 3 Yes The maximum number of imported models in an account.
Inference profiles per account Each supported Region: 1,000 Yes The maximum number of inference profiles in an account.
Ingestion job file size Each supported Region: 50 No The maximum size (in MB) of a file in an ingestion job.
Ingestion job size Each supported Region: 100 No The maximum size (in GB) of an ingestion job.
Input nodes per flow Each supported Region: 1 No The maximum number of flow input nodes.
Iterator nodes per flow Each supported Region: 1 No The maximum number of iterator nodes.
Knowledge base nodes per flow Each supported Region: 10 No The maximum number of knowledge base nodes.
Knowledge bases per account Each supported Region: 100 No The maximum number of knowledge bases per account.
Lambda function nodes per flow Each supported Region: 10 No The maximum number of Lambda function nodes.
Lex nodes per flow Each supported Region: 5 No The maximum number of Lex nodes.
ListAgentActionGroups requests per second Each supported Region: 10 No The maximum number of ListAgentActionGroups API requests per second.
ListAgentAliases requests per second Each supported Region: 10 No The maximum number of ListAgentAliases API requests per second.
ListAgentKnowledgeBases requests per second Each supported Region: 10 No The maximum number of ListAgentKnowledgeBases API requests per second.
ListAgentVersions requests per second Each supported Region: 10 No The maximum number of ListAgentVersions API requests per second.
ListAgents requests per second Each supported Region: 10 No The maximum number of ListAgents API requests per second.
ListDataSources requests per second Each supported Region: 10 No The maximum number of ListDataSources API requests per second.
ListFlowAliases requests per second Each supported Region: 10 No The maximum number of ListFlowAliases requests per second.
ListFlowVersions requests per second Each supported Region: 10 No The maximum number of ListFlowVersions requests per second.
ListFlows requests per second Each supported Region: 10 No The maximum number of ListFlows requests per second.
ListIngestionJobs requests per second Each supported Region: 10 No The maximum number of ListIngestionJobs API requests per second.
ListKnowledgeBases requests per second Each supported Region: 10 No The maximum number of ListKnowledgeBases API requests per second.
ListPrompts requests per second Each supported Region: 10 No The maximum number of ListPrompts requests per second.
Model units no-commitment Provisioned Throughputs across base models Each supported Region: 2 Yes The maximum number of model units that can be distributed across no-commitment Provisioned Throughputs for base models.
Model units no-commitment Provisioned Throughputs across custom models Each supported Region: 2 Yes The maximum number of model units that can be distributed across no-commitment Provisioned Throughputs for custom models.
Model units per provisioned model for AI21 Labs Jurassic-2 Mid Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for AI21 Labs Jurassic-2 Mid.
Model units per provisioned model for AI21 Labs Jurassic-2 Ultra Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for AI21 Labs Jurassic-2 Ultra.
Model units per provisioned model for Amazon Titan Embeddings G1 - Text Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Embeddings G1 - Text.
Model units per provisioned model for Amazon Titan Image Generator G1 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Image Generator G1.
Model units per provisioned model for Amazon Titan Image Generator G2 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Image Generator G2.
Model units per provisioned model for Amazon Titan Lite V1 4K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text Lite V1 4K.
Model units per provisioned model for Amazon Titan Multimodal Embeddings G1 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Multimodal Embeddings G1.
Model units per provisioned model for Amazon Titan Text Embeddings V2 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text Embeddings V2.
Model units per provisioned model for Amazon Titan Text G1 - Express 8K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text G1 - Express 8K.
Model units per provisioned model for Amazon Titan Text Premier V1 32K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text Premier V1 32K.
Model units per provisioned model for Anthropic Claude 3 Haiku 200K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Haiku 200K.
Model units per provisioned model for Anthropic Claude 3 Haiku 48K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Haiku 48K.
Model units per provisioned model for Anthropic Claude 3 Sonnet 200K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Sonnet 200K.
Model units per provisioned model for Anthropic Claude 3 Sonnet 28K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Sonnet 28K.
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 18K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3.5 Sonnet 18K.
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 200K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3.5 Sonnet 200K.
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 51K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3.5 Sonnet 51K.
Model units per provisioned model for Anthropic Claude Instant V1 100K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude Instant V1 100K.
Model units per provisioned model for Anthropic Claude V2 100K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2 100K.
Model units per provisioned model for Anthropic Claude V2 18K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2 18K.
Model units per provisioned model for Anthropic Claude V2.1 18K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2.1 18K.
Model units per provisioned model for Anthropic Claude V2.1 200K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2.1 200K.
Model units per provisioned model for Cohere Command Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Command.
Model units per provisioned model for Cohere Command Light Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Command Light.
Model units per provisioned model for Cohere Command R Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Command R 128k.
Model units per provisioned model for Cohere Command R Plus Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Command R Plus 128k.
Model units per provisioned model for Cohere Embed English Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Embed English.
Model units per provisioned model for Cohere Embed Multilingual Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Embed Multilingual.
Model units per provisioned model for Meta Llama 2 13B Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 13B.
Model units per provisioned model for Meta Llama 2 70B Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 70B.
Model units per provisioned model for Meta Llama 2 Chat 13B Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 Chat 13B.
Model units per provisioned model for Meta Llama 2 Chat 70B Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 Chat 70B.
Model units per provisioned model for Meta Llama 3 70B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3 70B Instruct.
Model units per provisioned model for Meta Llama 3 8B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3 8B Instruct.
Model units per provisioned model for Meta Llama 3.1 70B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.1 70B Instruct.
Model units per provisioned model for Meta Llama 3.1 8B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.1 8B Instruct.
Model units per provisioned model for Meta Llama 3.2 1B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.2 1B Instruct.
Model units per provisioned model for Meta Llama 3.2 3B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.2 3B Instruct.
Model units per provisioned model for Mistral Large 2407 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Mistral Large 2407.
Model units per provisioned model for Mistral Small Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Mistral Small.
Model units per provisioned model for Stability.ai Stable Diffusion XL 0.8 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Stability.ai Stable Diffusion XL 0.8.
Model units per provisioned model for Stability.ai Stable Diffusion XL 1.0 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Stability.ai Stable Diffusion XL 1.0.
Number of concurrent automatic model evaluation jobs Each supported Region: 20 No The maximum number of automatic model evaluation jobs that you can specify at one time in this account in the current Region.
Number of concurrent model evaluation jobs that use human workers Each supported Region: 10 No The maximum number of model evaluation jobs that use human workers you can specify at one time in this account in the current Region.
Number of custom metrics Each supported Region: 10 No The maximum number of custom metrics that you can specify in a model evaluation job that uses human workers.
Number of custom prompt datasets in a human-based model evaluation job Each supported Region: 1 No The maximum number of custom prompt datasets that you can specify in a human-based model evaluation job in this account in the current Region.
Number of datasets per job Each supported Region: 5 No The maximum number of datasets that you can specify in an automated model evaluation job. This includes both custom and built-in prompt datasets.
Number of evaluation jobs Each supported Region: 5,000 No The maximum number of model evaluation jobs that you can create in this account in the current Region.
Number of metrics per dataset Each supported Region: 3 No The maximum number of metrics that you can specify per dataset in an automated model evaluation job. This includes both custom and built-in metrics.
Number of models in a model evaluation job that uses human workers Each supported Region: 2 No The maximum number of models that you can specify in a model evaluation job that uses human workers.
Number of models in automated model evaluation job Each supported Region: 1 No The maximum number of models that you can specify in an automated model evaluation job.
Number of prompts in a custom prompt dataset Each supported Region: 1,000 No The maximum number of prompts a custom prompt dataset can contain.
On-demand ApplyGuardrail Content filter policy text units per second Each supported Region: 25 No The maximum number of text units that can be processed for Content filter policies per second.
On-demand ApplyGuardrail Denied topic policy text units per second Each supported Region: 25 No The maximum number of text units that can be processed for Denied topic policies per second.
On-demand ApplyGuardrail Sensitive information filter policy text units per second Each supported Region: 25 No The maximum number of text units that can be processed for Sensitive information filter policies per second.
On-demand ApplyGuardrail Word filter policy text units per second Each supported Region: 25 No The maximum number of text units that can be processed for Word filter policies per second.
On-demand ApplyGuardrail contextual grounding policy text units per second us-east-1: 106; us-west-2: 106; Each of the other supported Regions: 53 No The maximum number of text units that can be processed for contextual grounding policies per second.
On-demand ApplyGuardrail requests per second Each supported Region: 25 No The maximum number of ApplyGuardrail API calls allowed per second.
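The guardrail quotas above are expressed in text units rather than raw characters, so batching decisions depend on how much text each request carries. The sketch below estimates text-unit consumption and checks a batch against a per-second quota; it assumes a text unit is 1,000 characters (the size used in the guardrail documentation), and the function names are illustrative, not part of any AWS SDK.

```python
import math

# ASSUMPTION: one guardrail text unit covers up to 1,000 characters of input,
# per the Amazon Bedrock guardrails documentation. Adjust if that changes.
TEXT_UNIT_CHARS = 1000

def text_units(text: str) -> int:
    """Return the number of text units a string consumes (0 for empty input)."""
    if not text:
        return 0
    return math.ceil(len(text) / TEXT_UNIT_CHARS)

def fits_per_second_quota(texts: list[str], quota_units_per_second: int = 25) -> bool:
    """Check whether a batch of strings submitted within one second stays
    under a per-second text-unit quota (25 is the listed default for the
    content filter, denied topic, sensitive information, and word filter
    policies)."""
    return sum(text_units(t) for t in texts) <= quota_units_per_second
```

For example, a 1,500-character prompt consumes two text units, so at the 25-unit default you could screen at most twelve such prompts per second for a given policy type.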
On-demand InvokeModel requests per minute for AI21 Labs Jamba 1.5 Large Each supported Region: 100 No The maximum number of times that you can call model inference in one minute for AI21 Labs Jamba 1.5 Large. The quota considers the combined sum of requests for Converse and InvokeModel
On-demand InvokeModel requests per minute for AI21 Labs Jamba 1.5 Mini Each supported Region: 100 No The maximum number of times that you can call model inference in one minute for AI21 Labs Jamba 1.5 Mini. The quota considers the combined sum of requests for Converse and InvokeModel
On-demand InvokeModel requests per minute for AI21 Labs Jamba Instruct Each supported Region: 100 No The maximum number of times that you can call model inference in one minute for AI21 Labs Jamba Instruct. The quota considers the combined sum of requests for Converse and InvokeModel
On-demand InvokeModel requests per minute for AI21 Labs Jurassic-2 Mid Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for AI21 Labs Jurassic-2 Mid
On-demand InvokeModel requests per minute for AI21 Labs Jurassic-2 Ultra Each supported Region: 100 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for AI21 Labs Jurassic-2 Ultra
On-demand InvokeModel requests per minute for Amazon Titan Image Generator G1 Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Image Generator G1.
On-demand InvokeModel requests per minute for Amazon Titan Image Generator G1 V2 Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Image Generator G1 V2.
On-demand InvokeModel requests per minute for Amazon Titan Multimodal Embeddings G1 Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Multimodal Embeddings G1.
On-demand InvokeModel requests per minute for Amazon Titan Text Embeddings Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Text Embeddings
On-demand InvokeModel requests per minute for Amazon Titan Text Embeddings V2 Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Text Embeddings V2
On-demand InvokeModel requests per minute for Amazon Titan Text Express Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Amazon Titan Text Express
On-demand InvokeModel requests per minute for Amazon Titan Text Lite Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Amazon Titan Text Lite
On-demand InvokeModel requests per minute for Amazon Titan Text Premier Each supported Region: 100 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Amazon Titan Text Premier
On-demand InvokeModel requests per minute for Anthropic Claude 3 Haiku us-east-1: 1,000; us-west-2: 1,000; ap-northeast-1: 200; ap-southeast-1: 200; Each of the other supported Regions: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3 Haiku.
On-demand InvokeModel requests per minute for Anthropic Claude 3 Opus Each supported Region: 50 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3 Opus.
On-demand InvokeModel requests per minute for Anthropic Claude 3 Sonnet us-east-1: 500; us-west-2: 500; Each of the other supported Regions: 100 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3 Sonnet.
On-demand InvokeModel requests per minute for Anthropic Claude 3.5 Haiku Each supported Region: 1,000 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3.5 Haiku.
On-demand InvokeModel requests per minute for Anthropic Claude 3.5 Sonnet us-east-1: 50; us-east-2: 50; us-west-2: 250; ap-northeast-2: 50; ap-south-1: 50; ap-southeast-2: 50; Each of the other supported Regions: 20 No The maximum number of times that you can call model inference in one minute for Anthropic Claude 3.5 Sonnet. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream.
On-demand InvokeModel requests per minute for Anthropic Claude 3.5 Sonnet V2 us-west-2: 250; Each of the other supported Regions: 50 No The maximum number of times that you can call model inference in one minute for Anthropic Claude 3.5 Sonnet V2. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream.
On-demand InvokeModel requests per minute for Anthropic Claude Instant us-east-1: 1,000; us-west-2: 1,000; Each of the other supported Regions: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude Instant.
On-demand InvokeModel requests per minute for Anthropic Claude V2 us-east-1: 500; us-west-2: 500; Each of the other supported Regions: 100 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude V2.
On-demand InvokeModel requests per minute for Cohere Command Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command.
On-demand InvokeModel requests per minute for Cohere Command Light Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command Light.
On-demand InvokeModel requests per minute for Cohere Command R Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command R 128k.
On-demand InvokeModel requests per minute for Cohere Command R Plus Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command R Plus 128k.
On-demand InvokeModel requests per minute for Cohere Embed English Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Cohere Embed English.
On-demand InvokeModel requests per minute for Cohere Embed Multilingual Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Cohere Embed Multilingual.
On-demand InvokeModel requests per minute for Meta Llama 2 13B Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 13B.
On-demand InvokeModel requests per minute for Meta Llama 2 70B Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 70B.
On-demand InvokeModel requests per minute for Meta Llama 2 Chat 13B Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 Chat 13B.
On-demand InvokeModel requests per minute for Meta Llama 2 Chat 70B Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 Chat 70B.
On-demand InvokeModel requests per minute for Meta Llama 3 70B Instruct Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 3 70B Instruct.
On-demand InvokeModel requests per minute for Meta Llama 3 8B Instruct Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 3 8B Instruct.
On-demand InvokeModel requests per minute for Mistral 7B Instruct Each supported Region: 800 No The maximum number of times that you can call InvokeModel in one minute for Mistral mistral-7b-instruct-v0
On-demand InvokeModel requests per minute for Mistral AI Mistral Small Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute for Mistral AI Mistral Small
On-demand InvokeModel requests per minute for Mistral Large Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute for Mistral mistral-large-2402-v1
On-demand InvokeModel requests per minute for Mistral Mixtral 8x7b Instruct Each supported Region: 400 No The maximum number of times that you can call InvokeModel in one minute for Mistral mixtral-8x7b-v0
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion 3 Large Each supported Region: 15 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion 3 Large.
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion 3 Medium Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion 3 Medium
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion XL 0.8 Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion XL 0.8
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion XL 1.0 Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion XL 1.0
On-demand InvokeModel requests per minute for Stability.ai Stable Image Core Each supported Region: 90 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Image Core.
On-demand InvokeModel requests per minute for Stability.ai Stable Image Ultra Each supported Region: 10 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Image Ultra.
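Because none of the requests-per-minute quotas above are adjustable, applications typically throttle themselves client-side so that bursts do not trigger server-side throttling errors. The following is a minimal sliding-window limiter sketch under that assumption; the class name and structure are illustrative, not an AWS SDK feature, and it complements rather than replaces retry handling for ThrottlingException responses.

```python
import time
from collections import deque

class RequestRateLimiter:
    """Client-side sliding-window limiter for a requests-per-minute quota,
    such as the 60 RPM listed for Stability.ai Stable Diffusion XL 1.0.
    Illustrative sketch only."""

    def __init__(self, requests_per_minute: int, clock=time.monotonic):
        self.limit = requests_per_minute
        self.window = 60.0          # quota window in seconds
        self.clock = clock          # injectable for testing
        self.calls = deque()        # timestamps of requests in the window

    def try_acquire(self) -> bool:
        """Record one request if the window has capacity; otherwise return
        False so the caller can back off before invoking the model."""
        now = self.clock()
        # Drop timestamps that have aged out of the one-minute window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True
```

A caller would loop on `try_acquire`, sleeping briefly when it returns False, before issuing each InvokeModel call. Injecting the clock keeps the limiter deterministic under test.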
On-demand InvokeModel tokens per minute for AI21 Labs Jamba 1.5 Large Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for AI21 Labs Jamba 1.5 Large. The quota considers the combined sum of tokens for Converse and InvokeModel.
On-demand InvokeModel tokens per minute for AI21 Labs Jamba 1.5 Mini Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for AI21 Labs Jamba 1.5 Mini. The quota considers the combined sum of tokens for Converse and InvokeModel.
On-demand InvokeModel tokens per minute for AI21 Labs Jamba Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for AI21 Labs Jamba Instruct. The quota considers the combined sum of tokens for Converse and InvokeModel
On-demand InvokeModel tokens per minute for AI21 Labs Jurassic-2 Mid Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for AI21 Labs Jurassic-2 Mid.
On-demand InvokeModel tokens per minute for AI21 Labs Jurassic-2 Ultra Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for AI21 Labs Jurassic-2 Ultra.
On-demand InvokeModel tokens per minute for Amazon Titan Image Generator G1 Each supported Region: 2,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Image Generator G1.
On-demand InvokeModel tokens per minute for Amazon Titan Image Generator G1 V2 Each supported Region: 2,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Image Generator G1 V2.
On-demand InvokeModel tokens per minute for Amazon Titan Multimodal Embeddings G1 Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Multimodal Embeddings G1.
On-demand InvokeModel tokens per minute for Amazon Titan Text Embeddings Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Text Embeddings.
On-demand InvokeModel tokens per minute for Amazon Titan Text Embeddings V2 Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Text Embeddings V2.
On-demand InvokeModel tokens per minute for Amazon Titan Text Express Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Amazon Titan Text Express.
On-demand InvokeModel tokens per minute for Amazon Titan Text Lite Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Amazon Titan Text Lite.
On-demand InvokeModel tokens per minute for Amazon Titan Text Premier Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Amazon Titan Text Premier.
On-demand InvokeModel tokens per minute for Anthropic Claude 3 Haiku us-east-1: 2,000,000; us-west-2: 2,000,000; ap-northeast-1: 200,000; ap-southeast-1: 200,000; Each of the other supported Regions: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3 Haiku.
On-demand InvokeModel tokens per minute for Anthropic Claude 3 Opus Each supported Region: 400,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3 Opus.
On-demand InvokeModel tokens per minute for Anthropic Claude 3 Sonnet us-east-1: 1,000,000; us-west-2: 1,000,000; Each of the other supported Regions: 200,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3 Sonnet.
On-demand InvokeModel tokens per minute for Anthropic Claude 3.5 Haiku Each supported Region: 2,000,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3.5 Haiku.
On-demand InvokeModel tokens per minute for Anthropic Claude 3.5 Sonnet us-east-1: 400,000; us-east-2: 400,000; us-west-2: 2,000,000; ap-northeast-2: 400,000; ap-south-1: 400,000; ap-southeast-2: 400,000; Each of the other supported Regions: 200,000 No The maximum number of tokens that you can submit for model inference in one minute for Anthropic Claude 3.5 Sonnet. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream.
On-demand InvokeModel tokens per minute for Anthropic Claude 3.5 Sonnet V2 us-west-2: 2,000,000; Each of the other supported Regions: 400,000 No The maximum number of tokens that you can submit for model inference in one minute for Anthropic Claude 3.5 Sonnet V2. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream.
On-demand InvokeModel tokens per minute for Anthropic Claude Instant us-east-1: 1,000,000; us-west-2: 1,000,000; Each of the other supported Regions: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude Instant.
On-demand InvokeModel tokens per minute for Anthropic Claude V2 us-east-1: 500,000; us-west-2: 500,000; Each of the other supported Regions: 200,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude V2.
On-demand InvokeModel tokens per minute for Cohere Command Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Cohere Command.
On-demand InvokeModel tokens per minute for Cohere Command Light Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Cohere Command Light.
On-demand InvokeModel tokens per minute for Cohere Command R Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Cohere Command R 128k.
On-demand InvokeModel tokens per minute for Cohere Command R Plus Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Cohere Command R Plus 128k.
On-demand InvokeModel tokens per minute for Cohere Embed English Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Cohere Embed English.
On-demand InvokeModel tokens per minute for Cohere Embed Multilingual Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Cohere Embed Multilingual.
On-demand InvokeModel tokens per minute for Meta Llama 2 13B Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 13B.
On-demand InvokeModel tokens per minute for Meta Llama 2 70B Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 70B.
On-demand InvokeModel tokens per minute for Meta Llama 2 Chat 13B Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 Chat 13B.
On-demand InvokeModel tokens per minute for Meta Llama 2 Chat 70B Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 Chat 70B.
On-demand InvokeModel tokens per minute for Meta Llama 3 70B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 3 70B Instruct.
On-demand InvokeModel tokens per minute for Meta Llama 3 8B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 3 8B Instruct.
On-demand InvokeModel tokens per minute for Mistral AI Mistral 7B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral AI Mistral 7B Instruct.
On-demand InvokeModel tokens per minute for Mistral AI Mistral Large Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral AI Mistral Large.
On-demand InvokeModel tokens per minute for Mistral AI Mistral Small Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral AI Mistral Small.
On-demand InvokeModel tokens per minute for Mistral AI Mixtral 8x7B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral mixtral-8x7b-instruct-v0.
On-demand model inference requests per minute for Meta Llama 3.1 405B Instruct Each supported Region: 200 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.1 405B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.1 70B Instruct Each supported Region: 400 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.1 70B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.1 8B Instruct Each supported Region: 800 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.1 8B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.2 11B Instruct Each supported Region: 400 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 11B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.2 1B Instruct Each supported Region: 800 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 1B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.2 3B Instruct Each supported Region: 800 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 3B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.2 90B Instruct Each supported Region: 400 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 90B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Mistral Large 2407 Each supported Region: 400 No The maximum number of times that you can call model inference in one minute for Mistral Large 2407. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream
On-demand model inference tokens per minute for Meta Llama 3.1 8B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.1 8B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.2 11B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 11B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.2 1B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 1B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.2 3B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 3B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.2 90B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 90B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Mistral Large 2407 Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Mistral Large 2407. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream
On-demand model inference tokens per minute for Meta Llama 3.1 405B Instruct Each supported Region: 400,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.1 405B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.1 70B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.1 70B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
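Each on-demand model is governed by two independent per-minute quotas, requests and tokens, and a request is throttled if it would exceed either one. A small tracker that checks both budgets per one-minute window can make this interaction concrete; the class below is an illustrative sketch (not an SDK feature), with defaults taken from the Meta Llama 3.1 8B Instruct rows above (800 requests per minute, 300,000 tokens per minute).

```python
class InferenceBudget:
    """Track both per-minute quotas that apply to on-demand model inference:
    requests per minute (RPM) and tokens per minute (TPM). Illustrative
    sketch; defaults match the listed Meta Llama 3.1 8B Instruct quotas."""

    def __init__(self, rpm: int = 800, tpm: int = 300_000):
        self.rpm = rpm
        self.tpm = tpm
        self.requests = 0   # requests admitted in the current window
        self.tokens = 0     # tokens admitted in the current window

    def admit(self, token_count: int) -> bool:
        """Admit a request into the current one-minute window only if
        BOTH the request budget and the token budget allow it."""
        if self.requests + 1 > self.rpm or self.tokens + token_count > self.tpm:
            return False
        self.requests += 1
        self.tokens += token_count
        return True

    def reset(self) -> None:
        """Call at the start of each new one-minute window."""
        self.requests = 0
        self.tokens = 0
```

With the default budgets, 800 requests averaging more than 375 tokens each would exhaust the token quota before the request quota, so large prompts, not request count, become the binding limit.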
Output nodes per flow Each supported Region: 10 No The maximum number of flow output nodes.
Parameters per function Each supported Region: 5 Yes The maximum number of parameters that you can have in an action group function.
PrepareAgent requests per second Each supported Region: 2 No The maximum number of PrepareAgent API requests per second.
PrepareFlow requests per second Each supported Region: 2 No The maximum number of PrepareFlow requests per second.
Prompt nodes per flow Each supported Region: 10 Yes The maximum number of prompt nodes.
Prompts per account Each supported Region: 50 Yes The maximum number of prompts.
Records per batch inference job Each supported Region: 50,000 Yes The maximum number of records across all input files in a batch inference job.
Records per input file per batch inference job Each supported Region: 50,000 Yes The maximum number of records in an input file in a batch inference job.
Regex entities in Sensitive Information Filter Each supported Region: 10 No The maximum number of guardrail filter regexes that can be included in a sensitive information filter policy.
Regex length in characters Each supported Region: 500 No The maximum length, in characters, of a guardrail filter regex.
Retrieve requests per second Each supported Region: 5 No The maximum number of Retrieve API requests per second.
RetrieveAndGenerate requests per second Each supported Region: 5 No The maximum number of RetrieveAndGenerate API requests per second.
S3 retrieval nodes per flow Each supported Region: 10 No The maximum number of S3 retrieval nodes.
S3 storage nodes per flow Each supported Region: 10 No The maximum number of S3 storage nodes.
Scheduled customization jobs Each supported Region: 2 No The maximum number of scheduled customization jobs.
Size of prompt Each supported Region: 4 No The maximum size (in KB) of an individual prompt in a custom prompt dataset.
StartIngestionJob requests per second Each supported Region: 0.1 No The maximum number of StartIngestionJob API requests per second.
Sum of in-progress and submitted batch inference jobs using a base model eu-south-1: 10; each of the other supported Regions: 20 Yes The maximum number of in-progress and submitted batch inference jobs using a base model.
Sum of in-progress and submitted batch inference jobs using a custom model Each supported Region: 3 Yes The maximum number of in-progress and submitted batch inference jobs using a custom model.
Sum of training and validation records for a Claude 3 Haiku v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Claude 3 Haiku Fine-tuning job.
Sum of training and validation records for a Meta Llama 2 13B v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 2 13B Fine-tuning job.
Sum of training and validation records for a Meta Llama 2 70B v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 2 70B Fine-tuning job.
Sum of training and validation records for a Meta Llama 3.1 70B Instruct v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 3.1 70B Instruct Fine-tuning job.
Sum of training and validation records for a Meta Llama 3.1 8B Instruct v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 3.1 8B Instruct Fine-tuning job.
Sum of training and validation records for a Meta Llama 3.2 1B Instruct v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 3.2 1B Instruct Fine-tuning job.
Sum of training and validation records for a Meta Llama 3.2 3B Instruct v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 3.2 3B Instruct Fine-tuning job.
Sum of training and validation records for a Titan Image Generator G1 V1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Titan Image Generator Fine-tuning job.
Sum of training and validation records for a Titan Image Generator G1 V2 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Titan Image Generator V2 Fine-tuning job.
Sum of training and validation records for a Titan Multimodal Embeddings G1 v1 Fine-tuning job Each supported Region: 50,000 Yes The maximum combined number of training and validation records allowed for a Titan Multimodal Embeddings Fine-tuning job.
Sum of training and validation records for a Titan Text G1 - Express v1 Continued Pre-Training job Each supported Region: 100,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Express Continued Pre-Training job.
Sum of training and validation records for a Titan Text G1 - Express v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Express Fine-tuning job.
Sum of training and validation records for a Titan Text G1 - Lite v1 Continued Pre-Training job Each supported Region: 100,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Lite Continued Pre-Training job.
Sum of training and validation records for a Titan Text G1 - Lite v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Lite Fine-tuning job.
Sum of training and validation records for a Titan Text G1 - Premier v1 Fine-tuning job Each supported Region: 20,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Premier Fine-tuning job.
Task time for workers Each supported Region: 30 No The maximum amount of time, in days, that a worker has to complete tasks.
Topics per guardrail Each supported Region: 30 No The maximum number of topics that can be defined across guardrail topic policies.
Total nodes per flow Each supported Region: 40 No The maximum number of nodes in a flow.
UpdateAgent requests per second Each supported Region: 4 No The maximum number of UpdateAgent API requests per second.
UpdateAgentActionGroup requests per second Each supported Region: 6 No The maximum number of UpdateAgentActionGroup API requests per second.
UpdateAgentAlias requests per second Each supported Region: 2 No The maximum number of UpdateAgentAlias API requests per second.
UpdateAgentKnowledgeBase requests per second Each supported Region: 4 No The maximum number of UpdateAgentKnowledgeBase API requests per second.
UpdateDataSource requests per second Each supported Region: 2 No The maximum number of UpdateDataSource API requests per second.
UpdateFlow requests per second Each supported Region: 2 No The maximum number of UpdateFlow requests per second.
UpdateFlowAlias requests per second Each supported Region: 2 No The maximum number of UpdateFlowAlias requests per second.
UpdateKnowledgeBase requests per second Each supported Region: 2 No The maximum number of UpdateKnowledgeBase API requests per second.
UpdatePrompt requests per second Each supported Region: 2 No The maximum number of UpdatePrompt requests per second.
User query size Each supported Region: 1,000 No The maximum size (in characters) of a user query.
ValidateFlowDefinition requests per second Each supported Region: 2 No The maximum number of ValidateFlowDefinition requests per second.
Versions per guardrail Each supported Region: 20 No The maximum number of versions that a guardrail can have.
Versions per prompt Each supported Region: 10 No The maximum number of versions per prompt.
Word length in characters Each supported Region: 100 No The maximum length of a word, in characters, in a blocked word list.
Words per word policy Each supported Region: 10,000 No The maximum number of words that can be included in a blocked word list.
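Many of the quotas above are expressed as API requests per second, and some are fractional (for example, StartIngestionJob allows 0.1 requests per second, i.e. one request every 10 seconds). One common way to stay under such limits on the client side is a token bucket. The sketch below is illustrative only, assuming a single-process client; the `TokenBucket` class and its parameters are not part of any AWS SDK.

```python
import time


class TokenBucket:
    """Client-side throttle for per-second quotas such as the request-rate
    limits in the table above (e.g. 0.1 rps for StartIngestionJob).

    Tokens refill continuously at `rate_per_second`; each call consumes one
    token. This is a hypothetical helper, not an AWS SDK feature.
    """

    def __init__(self, rate_per_second, clock=time.monotonic):
        self.rate = rate_per_second
        # Allow at least one request to be held in reserve, even for
        # fractional rates like 0.1 rps.
        self.capacity = max(rate_per_second, 1.0)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Return True if a request may be sent now, False otherwise."""
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Example: a bucket matching a 2-requests-per-second quota.
bucket = TokenBucket(2.0)
if bucket.try_acquire():
    pass  # safe to call the API here; otherwise back off and retry
```

When `try_acquire` returns False, the caller should sleep and retry rather than send the request, since exceeding a quota results in throttling errors from the service.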