Amazon Bedrock endpoints and quotas
The following are the service endpoints and service quotas for Amazon Bedrock. To connect programmatically to an AWS service, you use an endpoint. In addition to the standard AWS endpoints, some AWS services offer FIPS endpoints in selected Regions. For more information, see AWS service endpoints. Service quotas, also referred to as limits, are the maximum number of service resources or operations for your AWS account. For more information, see AWS service quotas.
Service endpoints
Amazon Bedrock control plane APIs
The following table provides a list of Region-specific endpoints that Amazon Bedrock supports for managing, training, and deploying models. Use these endpoints for Amazon Bedrock API operations.
Region Name | Region | Endpoint | Protocol |
---|---|---|---|
US East (Ohio) | us-east-2 | bedrock.us-east-2.amazonaws.com | HTTPS |
US East (Ohio) | us-east-2 | bedrock-fips.us-east-2.amazonaws.com | HTTPS |
US East (N. Virginia) | us-east-1 | bedrock.us-east-1.amazonaws.com | HTTPS |
US East (N. Virginia) | us-east-1 | bedrock-fips.us-east-1.amazonaws.com | HTTPS |
US West (Oregon) | us-west-2 | bedrock.us-west-2.amazonaws.com | HTTPS |
US West (Oregon) | us-west-2 | bedrock-fips.us-west-2.amazonaws.com | HTTPS |
Asia Pacific (Mumbai) | ap-south-1 | bedrock.ap-south-1.amazonaws.com | HTTPS |
Asia Pacific (Seoul) | ap-northeast-2 | bedrock.ap-northeast-2.amazonaws.com | HTTPS |
Asia Pacific (Singapore) | ap-southeast-1 | bedrock.ap-southeast-1.amazonaws.com | HTTPS |
Asia Pacific (Sydney) | ap-southeast-2 | bedrock.ap-southeast-2.amazonaws.com | HTTPS |
Asia Pacific (Tokyo) | ap-northeast-1 | bedrock.ap-northeast-1.amazonaws.com | HTTPS |
Canada (Central) | ca-central-1 | bedrock.ca-central-1.amazonaws.com | HTTPS |
Canada (Central) | ca-central-1 | bedrock-fips.ca-central-1.amazonaws.com | HTTPS |
Europe (Frankfurt) | eu-central-1 | bedrock.eu-central-1.amazonaws.com | HTTPS |
Europe (Ireland) | eu-west-1 | bedrock.eu-west-1.amazonaws.com | HTTPS |
Europe (London) | eu-west-2 | bedrock.eu-west-2.amazonaws.com | HTTPS |
Europe (Paris) | eu-west-3 | bedrock.eu-west-3.amazonaws.com | HTTPS |
Europe (Zurich) | eu-central-2 | bedrock.eu-central-2.amazonaws.com | HTTPS |
South America (São Paulo) | sa-east-1 | bedrock.sa-east-1.amazonaws.com | HTTPS |
AWS GovCloud (US-East) | us-gov-east-1 | bedrock.us-gov-east-1.amazonaws.com | HTTPS |
AWS GovCloud (US-East) | us-gov-east-1 | bedrock-fips.us-gov-east-1.amazonaws.com | HTTPS |
AWS GovCloud (US-West) | us-gov-west-1 | bedrock.us-gov-west-1.amazonaws.com | HTTPS |
AWS GovCloud (US-West) | us-gov-west-1 | bedrock-fips.us-gov-west-1.amazonaws.com | HTTPS |
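The pattern in this table is regular enough to derive programmatically: the control-plane hostname is `bedrock.{region}.amazonaws.com`, with a `bedrock-fips` prefix in the Regions that offer a FIPS endpoint. A minimal sketch, assuming you want to compose hostnames yourself rather than rely on an SDK's built-in resolution (the helper name and the FIPS Region set below are taken from this table, not from any AWS SDK):

```python
# Regions that list a FIPS control-plane endpoint in the table above.
# This set is a snapshot of the documentation, not a live API lookup.
CONTROL_PLANE_FIPS_REGIONS = {
    "us-east-1", "us-east-2", "us-west-2", "ca-central-1",
    "us-gov-east-1", "us-gov-west-1",
}

def bedrock_endpoint(region: str, fips: bool = False) -> str:
    """Return the Amazon Bedrock control-plane hostname for a Region."""
    if fips and region not in CONTROL_PLANE_FIPS_REGIONS:
        raise ValueError(f"No FIPS control-plane endpoint listed for {region}")
    prefix = "bedrock-fips" if fips else "bedrock"
    return f"{prefix}.{region}.amazonaws.com"
```

For example, `bedrock_endpoint("ca-central-1", fips=True)` yields `bedrock-fips.ca-central-1.amazonaws.com`, while requesting FIPS for a Region without a FIPS entry raises an error rather than guessing.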
Amazon Bedrock runtime APIs
The following table provides a list of Region-specific endpoints that Amazon Bedrock supports for making inference requests for models hosted in Amazon Bedrock. Use these endpoints for Amazon Bedrock Runtime API operations.
Region Name | Region | Endpoint | Protocol |
---|---|---|---|
US East (Ohio) | us-east-2 | bedrock-runtime.us-east-2.amazonaws.com | HTTPS |
US East (Ohio) | us-east-2 | bedrock-runtime-fips.us-east-2.amazonaws.com | HTTPS |
US East (N. Virginia) | us-east-1 | bedrock-runtime.us-east-1.amazonaws.com | HTTPS |
US East (N. Virginia) | us-east-1 | bedrock-runtime-fips.us-east-1.amazonaws.com | HTTPS |
US West (Oregon) | us-west-2 | bedrock-runtime.us-west-2.amazonaws.com | HTTPS |
US West (Oregon) | us-west-2 | bedrock-runtime-fips.us-west-2.amazonaws.com | HTTPS |
Asia Pacific (Mumbai) | ap-south-1 | bedrock-runtime.ap-south-1.amazonaws.com | HTTPS |
Asia Pacific (Seoul) | ap-northeast-2 | bedrock-runtime.ap-northeast-2.amazonaws.com | HTTPS |
Asia Pacific (Singapore) | ap-southeast-1 | bedrock-runtime.ap-southeast-1.amazonaws.com | HTTPS |
Asia Pacific (Sydney) | ap-southeast-2 | bedrock-runtime.ap-southeast-2.amazonaws.com | HTTPS |
Asia Pacific (Tokyo) | ap-northeast-1 | bedrock-runtime.ap-northeast-1.amazonaws.com | HTTPS |
Canada (Central) | ca-central-1 | bedrock-runtime.ca-central-1.amazonaws.com | HTTPS |
Canada (Central) | ca-central-1 | bedrock-runtime-fips.ca-central-1.amazonaws.com | HTTPS |
Europe (Frankfurt) | eu-central-1 | bedrock-runtime.eu-central-1.amazonaws.com | HTTPS |
Europe (Ireland) | eu-west-1 | bedrock-runtime.eu-west-1.amazonaws.com | HTTPS |
Europe (London) | eu-west-2 | bedrock-runtime.eu-west-2.amazonaws.com | HTTPS |
Europe (Paris) | eu-west-3 | bedrock-runtime.eu-west-3.amazonaws.com | HTTPS |
Europe (Zurich) | eu-central-2 | bedrock-runtime.eu-central-2.amazonaws.com | HTTPS |
South America (São Paulo) | sa-east-1 | bedrock-runtime.sa-east-1.amazonaws.com | HTTPS |
AWS GovCloud (US-East) | us-gov-east-1 | bedrock-runtime.us-gov-east-1.amazonaws.com | HTTPS |
AWS GovCloud (US-East) | us-gov-east-1 | bedrock-runtime-fips.us-gov-east-1.amazonaws.com | HTTPS |
AWS GovCloud (US-West) | us-gov-west-1 | bedrock-runtime.us-gov-west-1.amazonaws.com | HTTPS |
AWS GovCloud (US-West) | us-gov-west-1 | bedrock-runtime-fips.us-gov-west-1.amazonaws.com | HTTPS |
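Only some of the Regions in this table publish a FIPS runtime endpoint. When FIPS is preferred but not strictly required, one option is to fall back to the standard endpoint for Regions without a FIPS entry. A sketch of that policy, with names that are illustrative and a Region set copied from this table rather than from any SDK:

```python
# Regions with a bedrock-runtime FIPS endpoint, per the table above
# (a documentation snapshot, not a live lookup).
RUNTIME_FIPS_REGIONS = {
    "us-east-1", "us-east-2", "us-west-2", "ca-central-1",
    "us-gov-east-1", "us-gov-west-1",
}

def runtime_endpoint(region: str, prefer_fips: bool = False) -> str:
    """Return the bedrock-runtime hostname, using FIPS when available and preferred."""
    if prefer_fips and region in RUNTIME_FIPS_REGIONS:
        return f"bedrock-runtime-fips.{region}.amazonaws.com"
    return f"bedrock-runtime.{region}.amazonaws.com"
```

With this policy, `runtime_endpoint("eu-west-1", prefer_fips=True)` silently falls back to `bedrock-runtime.eu-west-1.amazonaws.com`; if your compliance requirements make FIPS mandatory, raise an error on the fallback path instead.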
Agents for Amazon Bedrock build-time APIs
The following table provides a list of Region-specific endpoints that Agents for Amazon Bedrock supports for creating and managing agents and knowledge bases. Use these endpoints for Agents for Amazon Bedrock API operations.
Region Name | Region | Endpoint | Protocol |
---|---|---|---|
US East (N. Virginia) | us-east-1 | bedrock-agent.us-east-1.amazonaws.com | HTTPS |
US East (N. Virginia) | us-east-1 | bedrock-agent-fips.us-east-1.amazonaws.com | HTTPS |
US West (Oregon) | us-west-2 | bedrock-agent.us-west-2.amazonaws.com | HTTPS |
US West (Oregon) | us-west-2 | bedrock-agent-fips.us-west-2.amazonaws.com | HTTPS |
Asia Pacific (Singapore) | ap-southeast-1 | bedrock-agent.ap-southeast-1.amazonaws.com | HTTPS |
Asia Pacific (Sydney) | ap-southeast-2 | bedrock-agent.ap-southeast-2.amazonaws.com | HTTPS |
Asia Pacific (Tokyo) | ap-northeast-1 | bedrock-agent.ap-northeast-1.amazonaws.com | HTTPS |
Canada (Central) | ca-central-1 | bedrock-agent.ca-central-1.amazonaws.com | HTTPS |
Europe (Frankfurt) | eu-central-1 | bedrock-agent.eu-central-1.amazonaws.com | HTTPS |
Europe (Ireland) | eu-west-1 | bedrock-agent.eu-west-1.amazonaws.com | HTTPS |
Europe (London) | eu-west-2 | bedrock-agent.eu-west-2.amazonaws.com | HTTPS |
Europe (Paris) | eu-west-3 | bedrock-agent.eu-west-3.amazonaws.com | HTTPS |
Asia Pacific (Mumbai) | ap-south-1 | bedrock-agent.ap-south-1.amazonaws.com | HTTPS |
South America (São Paulo) | sa-east-1 | bedrock-agent.sa-east-1.amazonaws.com | HTTPS |
Agents for Amazon Bedrock runtime APIs
The following table provides a list of Region-specific endpoints that Agents for Amazon Bedrock supports for invoking agents and querying knowledge bases. Use these endpoints for Agents for Amazon Bedrock Runtime API operations.
Region Name | Region | Endpoint | Protocol |
---|---|---|---|
US East (N. Virginia) | us-east-1 | bedrock-agent-runtime.us-east-1.amazonaws.com | HTTPS |
US East (N. Virginia) | us-east-1 | bedrock-agent-runtime-fips.us-east-1.amazonaws.com | HTTPS |
US West (Oregon) | us-west-2 | bedrock-agent-runtime.us-west-2.amazonaws.com | HTTPS |
US West (Oregon) | us-west-2 | bedrock-agent-runtime-fips.us-west-2.amazonaws.com | HTTPS |
Asia Pacific (Singapore) | ap-southeast-1 | bedrock-agent-runtime.ap-southeast-1.amazonaws.com | HTTPS |
Asia Pacific (Sydney) | ap-southeast-2 | bedrock-agent-runtime.ap-southeast-2.amazonaws.com | HTTPS |
Asia Pacific (Tokyo) | ap-northeast-1 | bedrock-agent-runtime.ap-northeast-1.amazonaws.com | HTTPS |
Canada (Central) | ca-central-1 | bedrock-agent-runtime.ca-central-1.amazonaws.com | HTTPS |
Europe (Frankfurt) | eu-central-1 | bedrock-agent-runtime.eu-central-1.amazonaws.com | HTTPS |
Europe (Paris) | eu-west-3 | bedrock-agent-runtime.eu-west-3.amazonaws.com | HTTPS |
Europe (Ireland) | eu-west-1 | bedrock-agent-runtime.eu-west-1.amazonaws.com | HTTPS |
Europe (London) | eu-west-2 | bedrock-agent-runtime.eu-west-2.amazonaws.com | HTTPS |
Asia Pacific (Mumbai) | ap-south-1 | bedrock-agent-runtime.ap-south-1.amazonaws.com | HTTPS |
South America (São Paulo) | sa-east-1 | bedrock-agent-runtime.sa-east-1.amazonaws.com | HTTPS |
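Across all four endpoint tables, the only variables are the Region code and the service prefix (`bedrock`, `bedrock-runtime`, `bedrock-agent`, or `bedrock-agent-runtime`), so the endpoint URL for any API family can be composed the same way. A sketch, where the family labels are informal names chosen for this example rather than official SDK identifiers:

```python
# Informal labels for the four Amazon Bedrock API families and their
# endpoint prefixes, as listed in the tables above.
SERVICE_PREFIXES = {
    "control-plane": "bedrock",
    "runtime": "bedrock-runtime",
    "agents-build-time": "bedrock-agent",
    "agents-runtime": "bedrock-agent-runtime",
}

def bedrock_service_endpoint(family: str, region: str) -> str:
    """Compose the HTTPS endpoint URL for an API family in a Region."""
    return f"https://{SERVICE_PREFIXES[family]}.{region}.amazonaws.com"
```

For example, `bedrock_service_endpoint("agents-runtime", "eu-west-3")` produces `https://bedrock-agent-runtime.eu-west-3.amazonaws.com`. Note that not every family is available in every Region; check the tables above before targeting an endpoint.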
Service quotas
For instructions on how to request a quota increase, both for quotas whose Adjustable value is Yes and for those whose value is No, see Request an increase for Amazon Bedrock quotas. The following table lists the quotas for Amazon Bedrock:
Name | Default | Adjustable | Description |
---|---|---|---|
APIs per Agent | Each supported Region: 11 |
Yes |
The maximum number of APIs that you can add to an Agent. |
Action groups per Agent | Each supported Region: 20 |
Yes |
The maximum number of action groups that you can add to an Agent. |
Agent nodes per flow | Each supported Region: 10 | No | The maximum number of agent nodes. |
Agents per account | Each supported Region: 50 |
Yes |
The maximum number of Agents in one account. |
AssociateAgentKnowledgeBase requests per second | Each supported Region: 6 | No | The maximum number of AssociateAgentKnowledgeBase API requests per second. |
Associated aliases per Agent | Each supported Region: 10 | No | The maximum number of aliases that you can associate with an Agent. |
Associated knowledge bases per Agent | Each supported Region: 2 |
Yes |
The maximum number of knowledge bases that you can associate with an Agent. |
Batch inference input file size | Each supported Region: 1,073,741,824 |
Yes |
The maximum size of a single file (in bytes) submitted for batch inference. |
Batch inference job size | Each supported Region: 5,368,709,120 |
Yes |
The maximum cumulative size of all input files (in bytes) included in the batch inference job. |
Characters in Agent instructions |
us-west-2: 4,000 ap-northeast-1: 4,000 ap-southeast-1: 4,000 ap-southeast-2: 4,000 eu-west-2: 4,000 Each of the other supported Regions: 8,000 |
Yes |
The maximum number of characters in the instructions for an Agent. |
Collector nodes per flow | Each supported Region: 1 | No | The maximum number of collector nodes. |
Concurrent ingestion jobs per account | Each supported Region: 5 | No | The maximum number of ingestion jobs that can be running at the same time in an account. |
Concurrent ingestion jobs per data source | Each supported Region: 1 | No | The maximum number of ingestion jobs that can be running at the same time for a data source. |
Concurrent ingestion jobs per knowledge base | Each supported Region: 1 | No | The maximum number of ingestion jobs that can be running at the same time for a knowledge base. |
Concurrent model import jobs | Each supported Region: 1 | No | The maximum number of model import jobs that are concurrently in progress. |
Condition nodes per flow | Each supported Region: 5 | No | The maximum number of condition nodes. |
Conditions per condition node | Each supported Region: 5 | No | The maximum number of conditions per condition node. |
Contextual grounding query length in text units | Each supported Region: 1 | No | The maximum length, in text units, of the query for contextual grounding |
Contextual grounding response length in text units | Each supported Region: 5 | No | The maximum length, in text units, of the response for contextual grounding |
Contextual grounding source length in text units |
us-east-1: 100 us-west-2: 100 Each of the other supported Regions: 50 |
No | The maximum length, in text units, of the grounding source for contextual grounding |
CreateAgent requests per second | Each supported Region: 6 | No | The maximum number of CreateAgent API requests per second. |
CreateAgentActionGroup requests per second | Each supported Region: 12 | No | The maximum number of CreateAgentActionGroup API requests per second. |
CreateAgentAlias requests per second | Each supported Region: 2 | No | The maximum number of CreateAgentAlias API requests per second. |
CreateDataSource requests per second | Each supported Region: 2 | No | The maximum number of CreateDataSource API requests per second. |
CreateFlow requests per second | Each supported Region: 2 | No | The maximum number of CreateFlow requests per second. |
CreateFlowAlias requests per second | Each supported Region: 2 | No | The maximum number of CreateFlowAlias requests per second. |
CreateFlowVersion requests per second | Each supported Region: 2 | No | The maximum number of CreateFlowVersion requests per second. |
CreateKnowledgeBase requests per second | Each supported Region: 2 | No | The maximum number of CreateKnowledgeBase API requests per second. |
CreatePrompt requests per second | Each supported Region: 2 | No | The maximum number of CreatePrompt requests per second. |
CreatePromptVersion requests per second | Each supported Region: 2 | No | The maximum number of CreatePromptVersion requests per second. |
Cross-Region InvokeModel requests per minute for Anthropic Claude 3.5 Haiku | Each supported Region: 2,000 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3.5 Haiku. |
Cross-Region InvokeModel tokens per minute for Anthropic Claude 3.5 Haiku | Each supported Region: 4,000,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3.5 Haiku. |
Custom models per account | Each supported Region: 100 |
Yes |
The maximum number of custom models in an account. |
Data sources per knowledge base | Each supported Region: 5 | No | The maximum number of data sources per knowledge base. |
DeleteAgent requests per second | Each supported Region: 2 | No | The maximum number of DeleteAgent API requests per second. |
DeleteAgentActionGroup requests per second | Each supported Region: 2 | No | The maximum number of DeleteAgentActionGroup API requests per second. |
DeleteAgentAlias requests per second | Each supported Region: 2 | No | The maximum number of DeleteAgentAlias API requests per second. |
DeleteAgentVersion requests per second | Each supported Region: 2 | No | The maximum number of DeleteAgentVersion API requests per second. |
DeleteDataSource requests per second | Each supported Region: 2 | No | The maximum number of DeleteDataSource API requests per second. |
DeleteFlow requests per second | Each supported Region: 2 | No | The maximum number of DeleteFlow requests per second. |
DeleteFlowAlias requests per second | Each supported Region: 2 | No | The maximum number of DeleteFlowAlias requests per second. |
DeleteFlowVersion requests per second | Each supported Region: 2 | No | The maximum number of DeleteFlowVersion requests per second. |
DeleteKnowledgeBase requests per second | Each supported Region: 2 | No | The maximum number of DeleteKnowledgeBase API requests per second. |
DeletePrompt requests per second | Each supported Region: 2 | No | The maximum number of DeletePrompt requests per second. |
DisassociateAgentKnowledgeBase requests per second | Each supported Region: 4 | No | The maximum number of DisassociateAgentKnowledgeBase API requests per second. |
Enabled action groups per agent | Each supported Region: 11 |
Yes |
The maximum number of action groups that you can enable in an Agent. |
Endpoints per inference profile | Each supported Region: 5 | No | The maximum number of endpoints in an inference profile. An endpoint is defined by a model and the region that the invocation requests to the model are sent to. |
Example phrases per Topic | Each supported Region: 5 | No | The maximum number of topic examples that can be included per topic |
Files to add or update per ingestion job | Each supported Region: 5,000,000 | No | The maximum number of new and updated files that can be ingested per ingestion job. |
Files to delete per ingestion job | Each supported Region: 5,000,000 | No | The maximum number of files that can be deleted per ingestion job. |
Flow aliases per flow | Each supported Region: 10 | No | The maximum number of flow aliases. |
Flow versions per flow | Each supported Region: 10 | No | The maximum number of flow versions. |
Flows per account | Each supported Region: 100 |
Yes |
The maximum number of flows per account. |
GetAgent requests per second | Each supported Region: 15 | No | The maximum number of GetAgent API requests per second. |
GetAgentActionGroup requests per second | Each supported Region: 20 | No | The maximum number of GetAgentActionGroup API requests per second. |
GetAgentAlias requests per second | Each supported Region: 10 | No | The maximum number of GetAgentAlias API requests per second. |
GetAgentKnowledgeBase requests per second | Each supported Region: 15 | No | The maximum number of GetAgentKnowledgeBase API requests per second. |
GetAgentVersion requests per second | Each supported Region: 10 | No | The maximum number of GetAgentVersion API requests per second. |
GetDataSource requests per second | Each supported Region: 10 | No | The maximum number of GetDataSource API requests per second. |
GetFlow requests per second | Each supported Region: 10 | No | The maximum number of GetFlow requests per second. |
GetFlowAlias requests per second | Each supported Region: 10 | No | The maximum number of GetFlowAlias requests per second. |
GetFlowVersion requests per second | Each supported Region: 10 | No | The maximum number of GetFlowVersion requests per second. |
GetIngestionJob requests per second | Each supported Region: 10 | No | The maximum number of GetIngestionJob API requests per second. |
GetKnowledgeBase requests per second | Each supported Region: 10 | No | The maximum number of GetKnowledgeBase API requests per second. |
GetPrompt requests per second | Each supported Region: 10 | No | The maximum number of GetPrompt requests per second. |
Guardrails per account | Each supported Region: 100 | No | The maximum number of guardrails in an account |
Imported models per account | Each supported Region: 3 |
Yes |
The maximum number of imported models in an account. |
Inference profiles per account | Each supported Region: 1,000 |
Yes |
The maximum number of inference profiles in an account. |
Ingestion job file size | Each supported Region: 50 | No | The maximum size (in MB) of a file in an ingestion job. |
Ingestion job size | Each supported Region: 100 | No | The maximum size (in GB) of an ingestion job. |
Input nodes per flow | Each supported Region: 1 | No | The maximum number of flow input nodes. |
Iterator nodes per flow | Each supported Region: 1 | No | The maximum number of iterator nodes. |
Knowledge base nodes per flow | Each supported Region: 10 | No | The maximum number of knowledge base nodes. |
Knowledge bases per account | Each supported Region: 100 | No | The maximum number of knowledge bases per account. |
Lambda function nodes per flow | Each supported Region: 10 | No | The maximum number of Lambda function nodes. |
Lex nodes per flow | Each supported Region: 5 | No | The maximum number of Lex nodes. |
ListAgentActionGroups requests per second | Each supported Region: 10 | No | The maximum number of ListAgentActionGroups API requests per second. |
ListAgentAliases requests per second | Each supported Region: 10 | No | The maximum number of ListAgentAliases API requests per second. |
ListAgentKnowledgeBases requests per second | Each supported Region: 10 | No | The maximum number of ListAgentKnowledgeBases API requests per second. |
ListAgentVersions requests per second | Each supported Region: 10 | No | The maximum number of ListAgentVersions API requests per second. |
ListAgents requests per second | Each supported Region: 10 | No | The maximum number of ListAgents API requests per second. |
ListDataSources requests per second | Each supported Region: 10 | No | The maximum number of ListDataSources API requests per second. |
ListFlowAliases requests per second | Each supported Region: 10 | No | The maximum number of ListFlowAliases requests per second. |
ListFlowVersions requests per second | Each supported Region: 10 | No | The maximum number of ListFlowVersions requests per second. |
ListFlows requests per second | Each supported Region: 10 | No | The maximum number of ListFlows requests per second. |
ListIngestionJobs requests per second | Each supported Region: 10 | No | The maximum number of ListIngestionJobs API requests per second. |
ListKnowledgeBases requests per second | Each supported Region: 10 | No | The maximum number of ListKnowledgeBases API requests per second. |
ListPrompts requests per second | Each supported Region: 10 | No | The maximum number of ListPrompts requests per second. |
Model units no-commitment Provisioned Throughputs across base models | Each supported Region: 2 |
Yes |
The maximum number of model units that can be distributed across no-commitment Provisioned Throughputs for base models |
Model units no-commitment Provisioned Throughputs across custom models | Each supported Region: 2 |
Yes |
The maximum number of model units that can be distributed across no-commitment Provisioned Throughputs for custom models |
Model units per provisioned model for AI21 Labs Jurassic-2 Mid | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for AI21 Labs Jurassic-2 Mid. |
Model units per provisioned model for AI21 Labs Jurassic-2 Ultra | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for AI21 Labs Jurassic-2 Ultra. |
Model units per provisioned model for Amazon Titan Embeddings G1 - Text | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Embeddings G1 - Text. |
Model units per provisioned model for Amazon Titan Image Generator G1 | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Image Generator G1. |
Model units per provisioned model for Amazon Titan Image Generator G2 | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Image Generator G2. |
Model units per provisioned model for Amazon Titan Lite V1 4K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text Lite V1 4K. |
Model units per provisioned model for Amazon Titan Multimodal Embeddings G1 | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Multimodal Embeddings G1. |
Model units per provisioned model for Amazon Titan Text Embeddings V2 | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text Embeddings V2. |
Model units per provisioned model for Amazon Titan Text G1 - Express 8K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text G1 - Express 8K. |
Model units per provisioned model for Amazon Titan Text Premier V1 32K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text Premier V1 32K. |
Model units per provisioned model for Anthropic Claude 3 Haiku 200K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Haiku 200K. |
Model units per provisioned model for Anthropic Claude 3 Haiku 48K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Haiku 48K. |
Model units per provisioned model for Anthropic Claude 3 Sonnet 200K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Sonnet 200K. |
Model units per provisioned model for Anthropic Claude 3 Sonnet 28K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Sonnet 28K. |
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 18K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3.5 Sonnet 18K. |
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 200K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3.5 Sonnet 200K. |
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 51K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3.5 Sonnet 51K. |
Model units per provisioned model for Anthropic Claude Instant V1 100K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude Instant V1 100K. |
Model units per provisioned model for Anthropic Claude V2 100K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2 100K. |
Model units per provisioned model for Anthropic Claude V2 18K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2 18K. |
Model units per provisioned model for Anthropic Claude V2.1 18K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2.1 18K. |
Model units per provisioned model for Anthropic Claude V2.1 200K | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2.1 200k. |
Model units per provisioned model for Cohere Command | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Cohere Command. |
Model units per provisioned model for Cohere Command Light | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Cohere Command Light. |
Model units per provisioned model for Cohere Command R | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Cohere Command R 128k. |
Model units per provisioned model for Cohere Command R Plus | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Cohere Command R Plus 128k. |
Model units per provisioned model for Cohere Embed English | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Cohere Embed English. |
Model units per provisioned model for Cohere Embed Multilingual | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Cohere Embed Multilingual. |
Model units per provisioned model for Meta Llama 2 13B | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 13B. |
Model units per provisioned model for Meta Llama 2 70B | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 70B. |
Model units per provisioned model for Meta Llama 2 Chat 13B | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 Chat 13B. |
Model units per provisioned model for Meta Llama 2 Chat 70B | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 Chat 70B. |
Model units per provisioned model for Meta Llama 3 70B Instruct | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3 70B Instruct. |
Model units per provisioned model for Meta Llama 3 8B Instruct | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3 8B Instruct. |
Model units per provisioned model for Meta Llama 3.1 70B Instruct | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.1 70B Instruct. |
Model units per provisioned model for Meta Llama 3.1 8B Instruct | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.1 8B Instruct. |
Model units per provisioned model for Meta Llama 3.2 1B Instruct | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.2 1B Instruct. |
Model units per provisioned model for Meta Llama 3.2 3B Instruct | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.2 3B Instruct. |
Model units per provisioned model for Mistral Large 2407 | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Mistral Large 2407. |
Model units per provisioned model for Mistral Small | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Mistral Small. |
Model units per provisioned model for Stability.ai Stable Diffusion XL 0.8 | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Stability.ai Stable Diffusion XL 0.8 |
Model units per provisioned model for Stability.ai Stable Diffusion XL 1.0 | Each supported Region: 0 |
Yes |
The maximum number of model units that can be allotted to a provisioned model for Stability.ai Stable Diffusion XL 1.0. |
Number of concurrent automatic model evaluation jobs | Each supported Region: 20 | No | The maximum number of automatic model evaluation jobs that you can specify at one time in this account in the current Region. |
Number of concurrent model evaluation jobs that use human workers | Each supported Region: 10 | No | The maximum number of model evaluation jobs that use human workers you can specify at one time in this account in the current Region. |
Number of custom metrics | Each supported Region: 10 | No | The maximum number of custom metrics that you can specify in a model evaluation job that uses human workers. |
Number of custom prompt datasets in a human-based model evaluation job | Each supported Region: 1 | No | The maximum number of custom prompt datasets that you can specify in a human-based model evaluation job in this account in the current Region. |
Number of datasets per job | Each supported Region: 5 | No | The maximum number of datasets that you can specify in an automated model evaluation job. This includes both custom and built-in prompt datasets. |
Number of evaluation jobs | Each supported Region: 5,000 | No | The maximum number of model evaluation jobs that you can create in this account in the current Region. |
Number of metrics per dataset | Each supported Region: 3 | No | The maximum number of metrics that you can specify per dataset in an automated model evaluation job. This includes both custom and built-in metrics. |
Number of models in a model evaluation job that uses human workers | Each supported Region: 2 | No | The maximum number of models that you can specify in a model evaluation job that uses human workers. |
Number of models in automated model evaluation job | Each supported Region: 1 | No | The maximum number of models that you can specify in an automated model evaluation job. |
Number of prompts in a custom prompt dataset | Each supported Region: 1,000 | No | The maximum number of prompts that a custom prompt dataset can contain. |
On-demand ApplyGuardrail Content filter policy text units per second | Each supported Region: 25 | No | The maximum number of text units that can be processed for Content filter policies per second. |
On-demand ApplyGuardrail Denied topic policy text units per second | Each supported Region: 25 | No | The maximum number of text units that can be processed for Denied topic policies per second. |
On-demand ApplyGuardrail Sensitive information filter policy text units per second | Each supported Region: 25 | No | The maximum number of text units that can be processed for Sensitive information filter policies per second. |
On-demand ApplyGuardrail Word filter policy text units per second | Each supported Region: 25 | No | The maximum number of text units that can be processed for Word filter policies per second. |
On-demand ApplyGuardrail contextual grounding policy text units per second | us-east-1: 106 us-west-2: 106 Each of the other supported Regions: 53 | No | The maximum number of text units that can be processed for contextual grounding policies per second. |
On-demand ApplyGuardrail requests per second | Each supported Region: 25 | No | The maximum number of ApplyGuardrail API calls allowed per second. |
On-demand InvokeModel requests per minute for AI21 Labs Jamba 1.5 Large | Each supported Region: 100 | No | The maximum number of times that you can call model inference in one minute for AI21 Labs Jamba 1.5 Large. The quota considers the combined sum of requests for Converse and InvokeModel. |
On-demand InvokeModel requests per minute for AI21 Labs Jamba 1.5 Mini | Each supported Region: 100 | No | The maximum number of times that you can call model inference in one minute for AI21 Labs Jamba 1.5 Mini. The quota considers the combined sum of requests for Converse and InvokeModel. |
On-demand InvokeModel requests per minute for AI21 Labs Jamba Instruct | Each supported Region: 100 | No | The maximum number of times that you can call model inference in one minute for AI21 Labs Jamba Instruct. The quota considers the combined sum of requests for Converse and InvokeModel. |
On-demand InvokeModel requests per minute for AI21 Labs Jurassic-2 Mid | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for AI21 Labs Jurassic-2 Mid |
On-demand InvokeModel requests per minute for AI21 Labs Jurassic-2 Ultra | Each supported Region: 100 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for AI21 Labs Jurassic-2 Ultra |
On-demand InvokeModel requests per minute for Amazon Titan Image Generator G1 | Each supported Region: 60 | No | The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Image Generator G1. |
On-demand InvokeModel requests per minute for Amazon Titan Image Generator G1 V2 | Each supported Region: 60 | No | The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Image Generator G1 V2. |
On-demand InvokeModel requests per minute for Amazon Titan Multimodal Embeddings G1 | Each supported Region: 2,000 | No | The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Multimodal Embeddings G1. |
On-demand InvokeModel requests per minute for Amazon Titan Text Embeddings | Each supported Region: 2,000 | No | The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Text Embeddings. |
On-demand InvokeModel requests per minute for Amazon Titan Text Embeddings V2 | Each supported Region: 2,000 | No | The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Text Embeddings V2. |
On-demand InvokeModel requests per minute for Amazon Titan Text Express | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Amazon Titan Text Express |
On-demand InvokeModel requests per minute for Amazon Titan Text Lite | Each supported Region: 800 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Amazon Titan Text Lite |
On-demand InvokeModel requests per minute for Amazon Titan Text Premier | Each supported Region: 100 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Amazon Titan Text Premier |
On-demand InvokeModel requests per minute for Anthropic Claude 3 Haiku | us-east-1: 1,000 us-west-2: 1,000 ap-northeast-1: 200 ap-southeast-1: 200 Each of the other supported Regions: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3 Haiku. |
On-demand InvokeModel requests per minute for Anthropic Claude 3 Opus | Each supported Region: 50 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3 Opus. |
On-demand InvokeModel requests per minute for Anthropic Claude 3 Sonnet | us-east-1: 500 us-west-2: 500 Each of the other supported Regions: 100 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3 Sonnet. |
On-demand InvokeModel requests per minute for Anthropic Claude 3.5 Haiku | Each supported Region: 1,000 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3.5 Haiku. |
On-demand InvokeModel requests per minute for Anthropic Claude 3.5 Sonnet | us-east-1: 50 us-east-2: 50 us-west-2: 250 ap-northeast-2: 50 ap-south-1: 50 ap-southeast-2: 50 Each of the other supported Regions: 20 | No | The maximum number of times that you can call model inference in one minute for Anthropic Claude 3.5 Sonnet. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream. |
On-demand InvokeModel requests per minute for Anthropic Claude 3.5 Sonnet V2 | us-west-2: 250 Each of the other supported Regions: 50 | No | The maximum number of times that you can call model inference in one minute for Anthropic Claude 3.5 Sonnet V2. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream. |
On-demand InvokeModel requests per minute for Anthropic Claude Instant | us-east-1: 1,000 us-west-2: 1,000 Each of the other supported Regions: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude Instant. |
On-demand InvokeModel requests per minute for Anthropic Claude V2 | us-east-1: 500 us-west-2: 500 Each of the other supported Regions: 100 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude V2. |
On-demand InvokeModel requests per minute for Cohere Command | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command. |
On-demand InvokeModel requests per minute for Cohere Command Light | Each supported Region: 800 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command Light. |
On-demand InvokeModel requests per minute for Cohere Command R | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command R 128k. |
On-demand InvokeModel requests per minute for Cohere Command R Plus | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command R Plus 128k. |
On-demand InvokeModel requests per minute for Cohere Embed English | Each supported Region: 2,000 | No | The maximum number of times that you can call InvokeModel in one minute for Cohere Embed English. |
On-demand InvokeModel requests per minute for Cohere Embed Multilingual | Each supported Region: 2,000 | No | The maximum number of times that you can call InvokeModel in one minute for Cohere Embed Multilingual. |
On-demand InvokeModel requests per minute for Meta Llama 2 13B | Each supported Region: 800 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 13B. |
On-demand InvokeModel requests per minute for Meta Llama 2 70B | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 70B. |
On-demand InvokeModel requests per minute for Meta Llama 2 Chat 13B | Each supported Region: 800 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 Chat 13B. |
On-demand InvokeModel requests per minute for Meta Llama 2 Chat 70B | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 Chat 70B. |
On-demand InvokeModel requests per minute for Meta Llama 3 70B Instruct | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 3 70B Instruct. |
On-demand InvokeModel requests per minute for Meta Llama 3 8B Instruct | Each supported Region: 800 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 3 8B Instruct. |
On-demand InvokeModel requests per minute for Mistral 7B Instruct | Each supported Region: 800 | No | The maximum number of times that you can call InvokeModel in one minute for Mistral mistral-7b-instruct-v0. |
On-demand InvokeModel requests per minute for Mistral AI Mistral Small | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute for Mistral AI Mistral Small. |
On-demand InvokeModel requests per minute for Mistral Large | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute for Mistral mistral-large-2402-v1. |
On-demand InvokeModel requests per minute for Mistral Mixtral 8x7B Instruct | Each supported Region: 400 | No | The maximum number of times that you can call InvokeModel in one minute for Mistral mixtral-8x7b-v0. |
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion 3 Large | Each supported Region: 15 | No | The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion 3 Large. |
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion 3 Medium | Each supported Region: 60 | No | The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion 3 Medium. |
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion XL 0.8 | Each supported Region: 60 | No | The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion XL 0.8. |
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion XL 1.0 | Each supported Region: 60 | No | The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion XL 1.0. |
On-demand InvokeModel requests per minute for Stability.ai Stable Image Core | Each supported Region: 90 | No | The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Image Core. |
On-demand InvokeModel requests per minute for Stability.ai Stable Image Ultra | Each supported Region: 10 | No | The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Image Ultra. |
On-demand InvokeModel tokens per minute for AI21 Labs Jamba 1.5 Large | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for AI21 Labs Jamba 1.5 Large. The quota considers the combined sum of tokens for Converse and InvokeModel. |
On-demand InvokeModel tokens per minute for AI21 Labs Jamba 1.5 Mini | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for AI21 Labs Jamba 1.5 Mini. The quota considers the combined sum of tokens for Converse and InvokeModel. |
On-demand InvokeModel tokens per minute for AI21 Labs Jamba Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for AI21 Labs Jamba Instruct. The quota considers the combined sum of tokens for Converse and InvokeModel. |
On-demand InvokeModel tokens per minute for AI21 Labs Jurassic-2 Mid | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for AI21 Labs Jurassic-2 Mid. |
On-demand InvokeModel tokens per minute for AI21 Labs Jurassic-2 Ultra | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for AI21 Labs Jurassic-2 Ultra. |
On-demand InvokeModel tokens per minute for Amazon Titan Image Generator G1 | Each supported Region: 2,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Image Generator G1. |
On-demand InvokeModel tokens per minute for Amazon Titan Image Generator G1 V2 | Each supported Region: 2,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Image Generator G1 V2. |
On-demand InvokeModel tokens per minute for Amazon Titan Multimodal Embeddings G1 | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Multimodal Embeddings G1. |
On-demand InvokeModel tokens per minute for Amazon Titan Text Embeddings | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Text Embeddings. |
On-demand InvokeModel tokens per minute for Amazon Titan Text Embeddings V2 | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Text Embeddings V2. |
On-demand InvokeModel tokens per minute for Amazon Titan Text Express | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Amazon Titan Text Express. |
On-demand InvokeModel tokens per minute for Amazon Titan Text Lite | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Amazon Titan Text Lite. |
On-demand InvokeModel tokens per minute for Amazon Titan Text Premier | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Amazon Titan Text Premier. |
On-demand InvokeModel tokens per minute for Anthropic Claude 3 Haiku | us-east-1: 2,000,000 us-west-2: 2,000,000 ap-northeast-1: 200,000 ap-southeast-1: 200,000 Each of the other supported Regions: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3 Haiku. |
On-demand InvokeModel tokens per minute for Anthropic Claude 3 Opus | Each supported Region: 400,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3 Opus. |
On-demand InvokeModel tokens per minute for Anthropic Claude 3 Sonnet | us-east-1: 1,000,000 us-west-2: 1,000,000 Each of the other supported Regions: 200,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3 Sonnet. |
On-demand InvokeModel tokens per minute for Anthropic Claude 3.5 Haiku | Each supported Region: 2,000,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3.5 Haiku. |
On-demand InvokeModel tokens per minute for Anthropic Claude 3.5 Sonnet | us-east-1: 400,000 us-east-2: 400,000 us-west-2: 2,000,000 ap-northeast-2: 400,000 ap-south-1: 400,000 ap-southeast-2: 400,000 Each of the other supported Regions: 200,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Anthropic Claude 3.5 Sonnet. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream. |
On-demand InvokeModel tokens per minute for Anthropic Claude 3.5 Sonnet V2 | us-west-2: 2,000,000 Each of the other supported Regions: 400,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Anthropic Claude 3.5 Sonnet V2. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream. |
On-demand InvokeModel tokens per minute for Anthropic Claude Instant | us-east-1: 1,000,000 us-west-2: 1,000,000 Each of the other supported Regions: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude Instant. |
On-demand InvokeModel tokens per minute for Anthropic Claude V2 | us-east-1: 500,000 us-west-2: 500,000 Each of the other supported Regions: 200,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude V2. |
On-demand InvokeModel tokens per minute for Cohere Command | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Cohere Command. |
On-demand InvokeModel tokens per minute for Cohere Command Light | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for Cohere Command Light. |
On-demand InvokeModel tokens per minute for Cohere Command R | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Cohere Command R 128k. |
On-demand InvokeModel tokens per minute for Cohere Command R Plus | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Cohere Command R Plus 128k. |
On-demand InvokeModel tokens per minute for Cohere Embed English | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for Cohere Embed English. |
On-demand InvokeModel tokens per minute for Cohere Embed Multilingual | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel in one minute for Cohere Embed Multilingual. |
On-demand InvokeModel tokens per minute for Meta Llama 2 13B | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 13B. |
On-demand InvokeModel tokens per minute for Meta Llama 2 70B | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 70B. |
On-demand InvokeModel tokens per minute for Meta Llama 2 Chat 13B | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 Chat 13B. |
On-demand InvokeModel tokens per minute for Meta Llama 2 Chat 70B | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 Chat 70B. |
On-demand InvokeModel tokens per minute for Meta Llama 3 70B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 3 70B Instruct. |
On-demand InvokeModel tokens per minute for Meta Llama 3 8B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 3 8B Instruct. |
On-demand InvokeModel tokens per minute for Mistral AI Mistral 7B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral AI Mistral 7B Instruct. |
On-demand InvokeModel tokens per minute for Mistral AI Mistral Large | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral AI Mistral Large. |
On-demand InvokeModel tokens per minute for Mistral AI Mistral Small | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral AI Mistral Small. |
On-demand InvokeModel tokens per minute for Mistral AI Mixtral 8x7B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral mixtral-8x7b-instruct-v0. |
On-demand model inference requests per minute for Meta Llama 3.1 405B Instruct | Each supported Region: 200 | No | The maximum number of times that you can call model inference in one minute for Meta Llama 3.1 405B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference requests per minute for Meta Llama 3.1 70B Instruct | Each supported Region: 400 | No | The maximum number of times that you can call model inference in one minute for Meta Llama 3.1 70B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference requests per minute for Meta Llama 3.1 8B Instruct | Each supported Region: 800 | No | The maximum number of times that you can call model inference in one minute for Meta Llama 3.1 8B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference requests per minute for Meta Llama 3.2 11B Instruct | Each supported Region: 400 | No | The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 11B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference requests per minute for Meta Llama 3.2 1B Instruct | Each supported Region: 800 | No | The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 1B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference requests per minute for Meta Llama 3.2 3B Instruct | Each supported Region: 800 | No | The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 3B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference requests per minute for Meta Llama 3.2 90B Instruct | Each supported Region: 400 | No | The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 90B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference requests per minute for Mistral Large 2407 | Each supported Region: 400 | No | The maximum number of times that you can call model inference in one minute for Mistral Large 2407. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference tokens per minute for Meta Llama 3.1 8B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.1 8B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference tokens per minute for Meta Llama 3.2 11B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 11B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference tokens per minute for Meta Llama 3.2 1B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 1B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference tokens per minute for Meta Llama 3.2 3B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 3B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference tokens per minute for Meta Llama 3.2 90B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 90B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference tokens per minute for Mistral Large 2407 | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Mistral Large 2407. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference tokens per minute for Meta Llama 3.1 405B Instruct | Each supported Region: 400,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.1 405B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
On-demand model inference tokens per minute for Meta Llama 3.1 70B Instruct | Each supported Region: 300,000 | No | The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.1 70B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. |
Output nodes per flow | Each supported Region: 10 | No | The maximum number of flow output nodes. |
Parameters per function | Each supported Region: 5 | Yes | The maximum number of parameters that you can have in an action group function. |
PrepareAgent requests per second | Each supported Region: 2 | No | The maximum number of PrepareAgent API requests per second. |
PrepareFlow requests per second | Each supported Region: 2 | No | The maximum number of PrepareFlow requests per second. |
Prompt nodes per flow | Each supported Region: 10 | Yes | The maximum number of prompt nodes. |
Prompts per account | Each supported Region: 50 | Yes | The maximum number of prompts. |
Records per batch inference job | Each supported Region: 50,000 | Yes | The maximum number of records across all input files in a batch inference job. |
Records per input file per batch inference job | Each supported Region: 50,000 | Yes | The maximum number of records in an input file in a batch inference job. |
Regex entities in Sensitive Information Filter | Each supported Region: 10 | No | The maximum number of guardrail filter regexes that can be included in a sensitive information policy. |
Regex length in characters | Each supported Region: 500 | No | The maximum length, in characters, of a guardrail filter regex. |
Retrieve requests per second | Each supported Region: 5 | No | The maximum number of Retrieve API requests per second. |
RetrieveAndGenerate requests per second | Each supported Region: 5 | No | The maximum number of RetrieveAndGenerate API requests per second. |
S3 retrieval nodes per flow | Each supported Region: 10 | No | The maximum number of S3 retrieval nodes. |
S3 storage nodes per flow | Each supported Region: 10 | No | The maximum number of S3 storage nodes. |
Scheduled customization jobs | Each supported Region: 2 | No | The maximum number of scheduled customization jobs. |
Size of prompt | Each supported Region: 4 | No | The maximum size (in KB) of an individual prompt in a custom prompt dataset. |
StartIngestionJob requests per second | Each supported Region: 0.1 | No | The maximum number of StartIngestionJob API requests per second. |
Sum of in-progress and submitted batch inference jobs using a base model | eu-south-1: 10; each of the other supported Regions: 20 | Yes | The maximum number of in-progress and submitted batch inference jobs using a base model. |
Sum of in-progress and submitted batch inference jobs using a custom model | Each supported Region: 3 | Yes | The maximum number of in-progress and submitted batch inference jobs using a custom model. |
Sum of training and validation records for a Claude 3 Haiku v1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Claude 3 Haiku Fine-tuning job. |
Sum of training and validation records for a Meta Llama 2 13B v1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Meta Llama 2 13B Fine-tuning job. |
Sum of training and validation records for a Meta Llama 2 70B v1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Meta Llama 2 70B Fine-tuning job. |
Sum of training and validation records for a Meta Llama 3.1 70B Instruct v1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Meta Llama 3.1 70B Instruct Fine-tuning job. |
Sum of training and validation records for a Meta Llama 3.1 8B Instruct v1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Meta Llama 3.1 8B Instruct Fine-tuning job. |
Sum of training and validation records for a Meta Llama 3.2 1B Instruct v1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Meta Llama 3.2 1B Instruct Fine-tuning job. |
Sum of training and validation records for a Meta Llama 3.2 3B Instruct v1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Meta Llama 3.2 3B Instruct Fine-tuning job. |
Sum of training and validation records for a Titan Image Generator G1 V1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Titan Image Generator Fine-tuning job. |
Sum of training and validation records for a Titan Image Generator G1 V2 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Titan Image Generator V2 Fine-tuning job. |
Sum of training and validation records for a Titan Multimodal Embeddings G1 v1 Fine-tuning job | Each supported Region: 50,000 | Yes | The maximum combined number of training and validation records allowed for a Titan Multimodal Embeddings Fine-tuning job. |
Sum of training and validation records for a Titan Text G1 - Express v1 Continued Pre-Training job | Each supported Region: 100,000 | Yes | The maximum combined number of training and validation records allowed for a Titan Text Express Continued Pre-Training job. |
Sum of training and validation records for a Titan Text G1 - Express v1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Titan Text Express Fine-tuning job. |
Sum of training and validation records for a Titan Text G1 - Lite v1 Continued Pre-Training job | Each supported Region: 100,000 | Yes | The maximum combined number of training and validation records allowed for a Titan Text Lite Continued Pre-Training job. |
Sum of training and validation records for a Titan Text G1 - Lite v1 Fine-tuning job | Each supported Region: 10,000 | Yes | The maximum combined number of training and validation records allowed for a Titan Text Lite Fine-tuning job. |
Sum of training and validation records for a Titan Text G1 - Premier v1 Fine-tuning job | Each supported Region: 20,000 | Yes | The maximum combined number of training and validation records allowed for a Titan Text Premier Fine-tuning job. |
Task time for workers | Each supported Region: 30 | No | The maximum amount of time, in days, that a worker has to complete tasks. |
Topics per guardrail | Each supported Region: 30 | No | The maximum number of topics that can be defined across guardrail topic policies. |
Total nodes per flow | Each supported Region: 40 | No | The maximum number of nodes in a flow. |
UpdateAgent requests per second | Each supported Region: 4 | No | The maximum number of UpdateAgent API requests per second. |
UpdateAgentActionGroup requests per second | Each supported Region: 6 | No | The maximum number of UpdateAgentActionGroup API requests per second. |
UpdateAgentAlias requests per second | Each supported Region: 2 | No | The maximum number of UpdateAgentAlias API requests per second. |
UpdateAgentKnowledgeBase requests per second | Each supported Region: 4 | No | The maximum number of UpdateAgentKnowledgeBase API requests per second. |
UpdateDataSource requests per second | Each supported Region: 2 | No | The maximum number of UpdateDataSource API requests per second. |
UpdateFlow requests per second | Each supported Region: 2 | No | The maximum number of UpdateFlow requests per second. |
UpdateFlowAlias requests per second | Each supported Region: 2 | No | The maximum number of UpdateFlowAlias requests per second. |
UpdateKnowledgeBase requests per second | Each supported Region: 2 | No | The maximum number of UpdateKnowledgeBase API requests per second. |
UpdatePrompt requests per second | Each supported Region: 2 | No | The maximum number of UpdatePrompt requests per second. |
User query size | Each supported Region: 1,000 | No | The maximum size (in characters) of a user query. |
ValidateFlowDefinition requests per second | Each supported Region: 2 | No | The maximum number of ValidateFlowDefinition requests per second. |
Versions per guardrail | Each supported Region: 20 | No | The maximum number of versions that a guardrail can have. |
Versions per prompt | Each supported Region: 10 | No | The maximum number of versions per prompt. |
Word length in characters | Each supported Region: 100 | No | The maximum length of a word, in characters, in a blocked word list. |
Words per word policy | Each supported Region: 10,000 | No | The maximum number of words that can be included in a blocked word list. |
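Because the on-demand token quotas above are enforced per minute and are not adjustable for many models, callers often add a client-side guard so that requests are held back before the service returns a throttling error. The following sketch shows one way to do this with a sliding 60-second window; `TokenRateLimiter` is a hypothetical helper written for illustration, not part of any AWS SDK, and the 300,000 value is taken from the tokens-per-minute rows above.

```python
import collections
import time


class TokenRateLimiter:
    """Client-side sliding-window guard for a tokens-per-minute quota.

    Hypothetical helper (not an AWS SDK class). The quota value would
    come from the table above, e.g. 300,000 tokens per minute for
    Meta Llama 3.1 8B Instruct.
    """

    WINDOW_SECONDS = 60  # quotas in the table are per minute

    def __init__(self, tokens_per_minute, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock                   # injectable for testing
        self.events = collections.deque()    # (timestamp, token_count)
        self.used = 0                        # tokens in current window

    def _evict(self, now):
        # Drop token counts recorded more than one window ago.
        while self.events and now - self.events[0][0] >= self.WINDOW_SECONDS:
            _, count = self.events.popleft()
            self.used -= count

    def try_acquire(self, token_count):
        """Return True and record the tokens if they fit in the window,
        otherwise return False so the caller can wait and retry."""
        now = self.clock()
        self._evict(now)
        if self.used + token_count > self.limit:
            return False
        self.events.append((now, token_count))
        self.used += token_count
        return True
```

A caller would estimate the token count of each request (input plus expected output, since the quota counts the combined sum across InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream), call `try_acquire` before submitting, and sleep briefly when it returns `False`. This only reduces throttling; the service-side quota is still authoritative.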