Amazon Bedrock endpoints and quotas - AWS General Reference

Amazon Bedrock endpoints and quotas

The following are the service endpoints and service quotas for this service. To connect programmatically to an AWS service, you use an endpoint. In addition to the standard AWS endpoints, some AWS services offer FIPS endpoints in selected Regions. For more information, see AWS service endpoints. Service quotas, also referred to as limits, are the maximum number of service resources or operations for your AWS account. For more information, see AWS service quotas.

Service endpoints

Amazon Bedrock control plane APIs

The following table provides a list of Region-specific endpoints that Amazon Bedrock supports for managing, training, and deploying models. Use these endpoints for Amazon Bedrock API operations.

Region Name Region Endpoint Protocol
US East (Ohio) us-east-2 bedrock.us-east-2.amazonaws.com HTTPS
bedrock-fips.us-east-2.amazonaws.com HTTPS
US East (N. Virginia) us-east-1 bedrock.us-east-1.amazonaws.com HTTPS
bedrock-fips.us-east-1.amazonaws.com HTTPS
US West (Oregon) us-west-2 bedrock.us-west-2.amazonaws.com HTTPS
bedrock-fips.us-west-2.amazonaws.com HTTPS
Asia Pacific (Mumbai) ap-south-1 bedrock.ap-south-1.amazonaws.com HTTPS
Asia Pacific (Seoul) ap-northeast-2 bedrock.ap-northeast-2.amazonaws.com HTTPS
Asia Pacific (Singapore) ap-southeast-1 bedrock.ap-southeast-1.amazonaws.com HTTPS
Asia Pacific (Sydney) ap-southeast-2 bedrock.ap-southeast-2.amazonaws.com HTTPS
Asia Pacific (Tokyo) ap-northeast-1 bedrock.ap-northeast-1.amazonaws.com HTTPS
Canada (Central) ca-central-1 bedrock.ca-central-1.amazonaws.com HTTPS
bedrock-fips.ca-central-1.amazonaws.com HTTPS
Europe (Frankfurt) eu-central-1 bedrock.eu-central-1.amazonaws.com HTTPS
Europe (Ireland) eu-west-1 bedrock.eu-west-1.amazonaws.com HTTPS
Europe (London) eu-west-2 bedrock.eu-west-2.amazonaws.com HTTPS
Europe (Paris) eu-west-3 bedrock.eu-west-3.amazonaws.com HTTPS
Europe (Zurich) eu-central-2 bedrock.eu-central-2.amazonaws.com HTTPS
South America (São Paulo) sa-east-1 bedrock.sa-east-1.amazonaws.com HTTPS
AWS GovCloud (US-East) us-gov-east-1 bedrock.us-gov-east-1.amazonaws.com HTTPS
bedrock-fips.us-gov-east-1.amazonaws.com HTTPS
AWS GovCloud (US-West) us-gov-west-1 bedrock.us-gov-west-1.amazonaws.com HTTPS
bedrock-fips.us-gov-west-1.amazonaws.com HTTPS
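Every endpoint in the table above follows the standard AWS naming pattern: a service prefix (`bedrock`, or `bedrock-fips` in Regions that offer a FIPS endpoint), the Region code, and `amazonaws.com`. The helper below is a minimal sketch of that pattern; the function name is illustrative and not part of any AWS SDK.

```python
def bedrock_endpoint(region: str, fips: bool = False) -> str:
    """Build the Amazon Bedrock control-plane endpoint hostname for a Region.

    Pattern from the table above: bedrock.<region>.amazonaws.com, or
    bedrock-fips.<region>.amazonaws.com where a FIPS endpoint is offered.
    """
    prefix = "bedrock-fips" if fips else "bedrock"
    return f"{prefix}.{region}.amazonaws.com"
```

For example, `bedrock_endpoint("us-east-1")` yields `bedrock.us-east-1.amazonaws.com`, matching the US East (N. Virginia) row. Note that FIPS endpoints exist only in the Regions where the table lists one; the helper does not validate that.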

Amazon Bedrock runtime APIs

The following table provides a list of Region-specific endpoints that Amazon Bedrock supports for making inference requests for models hosted in Amazon Bedrock. Use these endpoints for Amazon Bedrock Runtime API operations.

Region Name Region Endpoint Protocol
US East (Ohio) us-east-2 bedrock-runtime.us-east-2.amazonaws.com HTTPS
bedrock-runtime-fips.us-east-2.amazonaws.com HTTPS
US East (N. Virginia) us-east-1 bedrock-runtime.us-east-1.amazonaws.com HTTPS
bedrock-runtime-fips.us-east-1.amazonaws.com HTTPS
US West (Oregon) us-west-2 bedrock-runtime.us-west-2.amazonaws.com HTTPS
bedrock-runtime-fips.us-west-2.amazonaws.com HTTPS
Asia Pacific (Mumbai) ap-south-1 bedrock-runtime.ap-south-1.amazonaws.com HTTPS
Asia Pacific (Seoul) ap-northeast-2 bedrock-runtime.ap-northeast-2.amazonaws.com HTTPS
Asia Pacific (Singapore) ap-southeast-1 bedrock-runtime.ap-southeast-1.amazonaws.com HTTPS
Asia Pacific (Sydney) ap-southeast-2 bedrock-runtime.ap-southeast-2.amazonaws.com HTTPS
Asia Pacific (Tokyo) ap-northeast-1 bedrock-runtime.ap-northeast-1.amazonaws.com HTTPS
Canada (Central) ca-central-1 bedrock-runtime.ca-central-1.amazonaws.com HTTPS
bedrock-runtime-fips.ca-central-1.amazonaws.com HTTPS
Europe (Frankfurt) eu-central-1 bedrock-runtime.eu-central-1.amazonaws.com HTTPS
Europe (Ireland) eu-west-1 bedrock-runtime.eu-west-1.amazonaws.com HTTPS
Europe (London) eu-west-2 bedrock-runtime.eu-west-2.amazonaws.com HTTPS
Europe (Paris) eu-west-3 bedrock-runtime.eu-west-3.amazonaws.com HTTPS
Europe (Zurich) eu-central-2 bedrock-runtime.eu-central-2.amazonaws.com HTTPS
South America (São Paulo) sa-east-1 bedrock-runtime.sa-east-1.amazonaws.com HTTPS
AWS GovCloud (US-East) us-gov-east-1 bedrock-runtime.us-gov-east-1.amazonaws.com HTTPS
bedrock-runtime-fips.us-gov-east-1.amazonaws.com HTTPS
AWS GovCloud (US-West) us-gov-west-1 bedrock-runtime.us-gov-west-1.amazonaws.com HTTPS
bedrock-runtime-fips.us-gov-west-1.amazonaws.com HTTPS
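Only a subset of the Regions above offer a FIPS runtime endpoint. A client that prefers FIPS but must still work everywhere can fall back to the standard endpoint, as in this sketch (the set literal and function name are illustrative, derived from the table above):

```python
# Regions where Amazon Bedrock Runtime lists a FIPS endpoint in the table above.
RUNTIME_FIPS_REGIONS = {
    "us-east-1", "us-east-2", "us-west-2", "ca-central-1",
    "us-gov-east-1", "us-gov-west-1",
}

def runtime_endpoint(region: str, prefer_fips: bool = False) -> str:
    """Return the Bedrock Runtime endpoint hostname for a Region, using the
    FIPS variant when requested and available, else the standard endpoint."""
    if prefer_fips and region in RUNTIME_FIPS_REGIONS:
        return f"bedrock-runtime-fips.{region}.amazonaws.com"
    return f"bedrock-runtime.{region}.amazonaws.com"
```

The returned hostname (prefixed with `https://`) can be passed to an SDK client override such as boto3's `endpoint_url` parameter when the default endpoint resolution needs to be pinned.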

Agents for Amazon Bedrock build-time APIs

The following table provides a list of Region-specific endpoints that Agents for Amazon Bedrock supports for creating and managing agents and knowledge bases. Use these endpoints for Agents for Amazon Bedrock API operations.

Region Name Region Endpoint Protocol
US East (N. Virginia) us-east-1 bedrock-agent.us-east-1.amazonaws.com HTTPS
bedrock-agent-fips.us-east-1.amazonaws.com HTTPS
US West (Oregon) us-west-2 bedrock-agent.us-west-2.amazonaws.com HTTPS
bedrock-agent-fips.us-west-2.amazonaws.com HTTPS
Asia Pacific (Singapore) ap-southeast-1 bedrock-agent.ap-southeast-1.amazonaws.com HTTPS
Asia Pacific (Sydney) ap-southeast-2 bedrock-agent.ap-southeast-2.amazonaws.com HTTPS
Asia Pacific (Tokyo) ap-northeast-1 bedrock-agent.ap-northeast-1.amazonaws.com HTTPS
Canada (Central) ca-central-1 bedrock-agent.ca-central-1.amazonaws.com HTTPS
Europe (Frankfurt) eu-central-1 bedrock-agent.eu-central-1.amazonaws.com HTTPS
Europe (Ireland) eu-west-1 bedrock-agent.eu-west-1.amazonaws.com HTTPS
Europe (London) eu-west-2 bedrock-agent.eu-west-2.amazonaws.com HTTPS
Europe (Paris) eu-west-3 bedrock-agent.eu-west-3.amazonaws.com HTTPS
Asia Pacific (Mumbai) ap-south-1 bedrock-agent.ap-south-1.amazonaws.com HTTPS
South America (São Paulo) sa-east-1 bedrock-agent.sa-east-1.amazonaws.com HTTPS

Agents for Amazon Bedrock runtime APIs

The following table provides a list of Region-specific endpoints that Agents for Amazon Bedrock supports for invoking agents and querying knowledge bases. Use these endpoints for Agents for Amazon Bedrock Runtime API operations.

Region Name Region Endpoint Protocol
US East (N. Virginia) us-east-1 bedrock-agent-runtime.us-east-1.amazonaws.com HTTPS
bedrock-agent-runtime-fips.us-east-1.amazonaws.com HTTPS
US West (Oregon) us-west-2 bedrock-agent-runtime.us-west-2.amazonaws.com HTTPS
bedrock-agent-runtime-fips.us-west-2.amazonaws.com HTTPS
Asia Pacific (Singapore) ap-southeast-1 bedrock-agent-runtime.ap-southeast-1.amazonaws.com HTTPS
Asia Pacific (Sydney) ap-southeast-2 bedrock-agent-runtime.ap-southeast-2.amazonaws.com HTTPS
Asia Pacific (Tokyo) ap-northeast-1 bedrock-agent-runtime.ap-northeast-1.amazonaws.com HTTPS
Canada (Central) ca-central-1 bedrock-agent-runtime.ca-central-1.amazonaws.com HTTPS
Europe (Frankfurt) eu-central-1 bedrock-agent-runtime.eu-central-1.amazonaws.com HTTPS
Europe (Paris) eu-west-3 bedrock-agent-runtime.eu-west-3.amazonaws.com HTTPS
Europe (Ireland) eu-west-1 bedrock-agent-runtime.eu-west-1.amazonaws.com HTTPS
Europe (London) eu-west-2 bedrock-agent-runtime.eu-west-2.amazonaws.com HTTPS
Asia Pacific (Mumbai) ap-south-1 bedrock-agent-runtime.ap-south-1.amazonaws.com HTTPS
South America (São Paulo) sa-east-1 bedrock-agent-runtime.sa-east-1.amazonaws.com HTTPS
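Across the four tables on this page, each API family has its own endpoint prefix, and those prefixes match the service names used by the AWS CLI and boto3 (`bedrock`, `bedrock-runtime`, `bedrock-agent`, `bedrock-agent-runtime`). The mapping and URL builder below are an illustrative sketch, not part of any SDK:

```python
# Endpoint prefix for each Amazon Bedrock API family documented on this page.
API_FAMILY_PREFIX = {
    "control_plane": "bedrock",             # model management, training, deployment
    "runtime": "bedrock-runtime",           # model inference
    "agents_build_time": "bedrock-agent",   # creating/managing agents and knowledge bases
    "agents_runtime": "bedrock-agent-runtime",  # invoking agents, querying knowledge bases
}

def endpoint_url(family: str, region: str) -> str:
    """Full HTTPS URL for the standard (non-FIPS) endpoint of an API family."""
    return f"https://{API_FAMILY_PREFIX[family]}.{region}.amazonaws.com"
```

Picking the wrong family is a common source of confusion: for instance, `InvokeAgent` goes to the agents runtime endpoint, not the build-time one.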

Service quotas

For instructions on how to request a quota increase, both for quotas whose Adjustable value is marked as Yes and those marked as No, see Request an increase for Amazon Bedrock quotas. The following table shows a list of quotas for Amazon Bedrock:

Name Default Adjustable Description
APIs per Agent Each supported Region: 11 Yes The maximum number of APIs that you can add to an Agent.
Action groups per Agent Each supported Region: 20 Yes The maximum number of action groups that you can add to an Agent.
Agent nodes per flow Each supported Region: 10 No The maximum number of agent nodes.
Agents per account Each supported Region: 50 Yes The maximum number of Agents in one account.
AssociateAgentKnowledgeBase requests per second Each supported Region: 6 No The maximum number of AssociateAgentKnowledgeBase API requests per second.
Associated aliases per Agent Each supported Region: 10 No The maximum number of aliases that you can associate with an Agent.
Associated knowledge bases per Agent Each supported Region: 2 Yes The maximum number of knowledge bases that you can associate with an Agent.
Batch inference input file size Each supported Region: 1,073,741,824 Yes The maximum size of a single file (in bytes) submitted for batch inference.
Batch inference job size Each supported Region: 5,368,709,120 Yes The maximum cumulative size of all input files (in bytes) included in the batch inference job.
Characters in Agent instructions us-west-2: 4,000; ap-northeast-1: 4,000; ap-southeast-1: 4,000; ap-southeast-2: 4,000; eu-west-2: 4,000; Each of the other supported Regions: 8,000 Yes The maximum number of characters in the instructions for an Agent.
Collector nodes per flow Each supported Region: 1 No The maximum number of collector nodes.
Concurrent ingestion jobs per account Each supported Region: 5 No The maximum number of ingestion jobs that can be running at the same time in an account.
Concurrent ingestion jobs per data source Each supported Region: 1 No The maximum number of ingestion jobs that can be running at the same time for a data source.
Concurrent ingestion jobs per knowledge base Each supported Region: 1 No The maximum number of ingestion jobs that can be running at the same time for a knowledge base.
Concurrent model import jobs Each supported Region: 1 No The maximum number of model import jobs that are concurrently in progress.
Condition nodes per flow Each supported Region: 5 No The maximum number of condition nodes.
Conditions per condition node Each supported Region: 5 No The maximum number of conditions per condition node.
Contextual grounding query length in text units Each supported Region: 1 No The maximum length, in text units, of the query for contextual grounding.
Contextual grounding response length in text units Each supported Region: 5 No The maximum length, in text units, of the response for contextual grounding.
Contextual grounding source length in text units us-east-1: 100; us-west-2: 100; Each of the other supported Regions: 50 No The maximum length, in text units, of the grounding source for contextual grounding.
CreateAgent requests per second Each supported Region: 6 No The maximum number of CreateAgent API requests per second.
CreateAgentActionGroup requests per second Each supported Region: 12 No The maximum number of CreateAgentActionGroup API requests per second.
CreateAgentAlias requests per second Each supported Region: 2 No The maximum number of CreateAgentAlias API requests per second.
CreateDataSource requests per second Each supported Region: 2 No The maximum number of CreateDataSource API requests per second.
CreateFlow requests per second Each supported Region: 2 No The maximum number of CreateFlow requests per second.
CreateFlowAlias requests per second Each supported Region: 2 No The maximum number of CreateFlowAlias requests per second.
CreateFlowVersion requests per second Each supported Region: 2 No The maximum number of CreateFlowVersion requests per second.
CreateKnowledgeBase requests per second Each supported Region: 2 No The maximum number of CreateKnowledgeBase API requests per second.
CreatePrompt requests per second Each supported Region: 2 No The maximum number of CreatePrompt requests per second.
CreatePromptVersion requests per second Each supported Region: 2 No The maximum number of CreatePromptVersion requests per second.
Cross-Region InvokeModel requests per minute for Anthropic Claude 3.5 Haiku Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3.5 Haiku.
Cross-Region InvokeModel tokens per minute for Anthropic Claude 3.5 Haiku Each supported Region: 4,000,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3.5 Haiku.
Custom models per account Each supported Region: 100 Yes The maximum number of custom models in an account.
Data sources per knowledge base Each supported Region: 5 No The maximum number of data sources per knowledge base.
DeleteAgent requests per second Each supported Region: 2 No The maximum number of DeleteAgent API requests per second.
DeleteAgentActionGroup requests per second Each supported Region: 2 No The maximum number of DeleteAgentActionGroup API requests per second.
DeleteAgentAlias requests per second Each supported Region: 2 No The maximum number of DeleteAgentAlias API requests per second.
DeleteAgentVersion requests per second Each supported Region: 2 No The maximum number of DeleteAgentVersion API requests per second.
DeleteDataSource requests per second Each supported Region: 2 No The maximum number of DeleteDataSource API requests per second.
DeleteFlow requests per second Each supported Region: 2 No The maximum number of DeleteFlow requests per second.
DeleteFlowAlias requests per second Each supported Region: 2 No The maximum number of DeleteFlowAlias requests per second.
DeleteFlowVersion requests per second Each supported Region: 2 No The maximum number of DeleteFlowVersion requests per second.
DeleteKnowledgeBase requests per second Each supported Region: 2 No The maximum number of DeleteKnowledgeBase API requests per second.
DeletePrompt requests per second Each supported Region: 2 No The maximum number of DeletePrompt requests per second.
DisassociateAgentKnowledgeBase requests per second Each supported Region: 4 No The maximum number of DisassociateAgentKnowledgeBase API requests per second.
Enabled action groups per agent Each supported Region: 11 Yes The maximum number of action groups that you can enable in an Agent.
Endpoints per inference profile Each supported Region: 5 No The maximum number of endpoints in an inference profile. An endpoint is defined by a model and the region that the invocation requests to the model are sent to.
Example phrases per Topic Each supported Region: 5 No The maximum number of topic examples that can be included per topic.
Files to add or update per ingestion job Each supported Region: 5,000,000 No The maximum number of new and updated files that can be ingested per ingestion job.
Files to delete per ingestion job Each supported Region: 5,000,000 No The maximum number of files that can be deleted per ingestion job.
Flow aliases per flow Each supported Region: 10 No The maximum number of flow aliases.
Flow versions per flow Each supported Region: 10 No The maximum number of flow versions.
Flows per account Each supported Region: 100 Yes The maximum number of flows per account.
GetAgent requests per second Each supported Region: 15 No The maximum number of GetAgent API requests per second.
GetAgentActionGroup requests per second Each supported Region: 20 No The maximum number of GetAgentActionGroup API requests per second.
GetAgentAlias requests per second Each supported Region: 10 No The maximum number of GetAgentAlias API requests per second.
GetAgentKnowledgeBase requests per second Each supported Region: 15 No The maximum number of GetAgentKnowledgeBase API requests per second.
GetAgentVersion requests per second Each supported Region: 10 No The maximum number of GetAgentVersion API requests per second.
GetDataSource requests per second Each supported Region: 10 No The maximum number of GetDataSource API requests per second.
GetFlow requests per second Each supported Region: 10 No The maximum number of GetFlow requests per second.
GetFlowAlias requests per second Each supported Region: 10 No The maximum number of GetFlowAlias requests per second.
GetFlowVersion requests per second Each supported Region: 10 No The maximum number of GetFlowVersion requests per second.
GetIngestionJob requests per second Each supported Region: 10 No The maximum number of GetIngestionJob API requests per second.
GetKnowledgeBase requests per second Each supported Region: 10 No The maximum number of GetKnowledgeBase API requests per second.
GetPrompt requests per second Each supported Region: 10 No The maximum number of GetPrompt requests per second.
Guardrails per account Each supported Region: 100 No The maximum number of guardrails in an account.
Imported models per account Each supported Region: 3 Yes The maximum number of imported models in an account.
Inference profiles per account Each supported Region: 1,000 Yes The maximum number of inference profiles in an account.
Ingestion job file size Each supported Region: 50 No The maximum size (in MB) of a file in an ingestion job.
Ingestion job size Each supported Region: 100 No The maximum size (in GB) of an ingestion job.
Input nodes per flow Each supported Region: 1 No The maximum number of flow input nodes.
Iterator nodes per flow Each supported Region: 1 No The maximum number of iterator nodes.
Knowledge base nodes per flow Each supported Region: 10 No The maximum number of knowledge base nodes.
Knowledge bases per account Each supported Region: 100 No The maximum number of knowledge bases per account.
Lambda function nodes per flow Each supported Region: 10 No The maximum number of Lambda function nodes.
Lex nodes per flow Each supported Region: 5 No The maximum number of Lex nodes.
ListAgentActionGroups requests per second Each supported Region: 10 No The maximum number of ListAgentActionGroups API requests per second.
ListAgentAliases requests per second Each supported Region: 10 No The maximum number of ListAgentAliases API requests per second.
ListAgentKnowledgeBases requests per second Each supported Region: 10 No The maximum number of ListAgentKnowledgeBases API requests per second.
ListAgentVersions requests per second Each supported Region: 10 No The maximum number of ListAgentVersions API requests per second.
ListAgents requests per second Each supported Region: 10 No The maximum number of ListAgents API requests per second.
ListDataSources requests per second Each supported Region: 10 No The maximum number of ListDataSources API requests per second.
ListFlowAliases requests per second Each supported Region: 10 No The maximum number of ListFlowAliases requests per second.
ListFlowVersions requests per second Each supported Region: 10 No The maximum number of ListFlowVersions requests per second.
ListFlows requests per second Each supported Region: 10 No The maximum number of ListFlows requests per second.
ListIngestionJobs requests per second Each supported Region: 10 No The maximum number of ListIngestionJobs API requests per second.
ListKnowledgeBases requests per second Each supported Region: 10 No The maximum number of ListKnowledgeBases API requests per second.
ListPrompts requests per second Each supported Region: 10 No The maximum number of ListPrompts requests per second.
Model units no-commitment Provisioned Throughputs across base models Each supported Region: 2 Yes The maximum number of model units that can be distributed across no-commitment Provisioned Throughputs for base models.
Model units no-commitment Provisioned Throughputs across custom models Each supported Region: 2 Yes The maximum number of model units that can be distributed across no-commitment Provisioned Throughputs for custom models.
Model units per provisioned model for AI21 Labs Jurassic-2 Mid Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for AI21 Labs Jurassic-2 Mid.
Model units per provisioned model for AI21 Labs Jurassic-2 Ultra Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for AI21 Labs Jurassic-2 Ultra.
Model units per provisioned model for Amazon Titan Embeddings G1 - Text Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Embeddings G1 - Text.
Model units per provisioned model for Amazon Titan Image Generator G1 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Image Generator G1.
Model units per provisioned model for Amazon Titan Image Generator G2 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Image Generator G2.
Model units per provisioned model for Amazon Titan Lite V1 4K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text Lite V1 4K.
Model units per provisioned model for Amazon Titan Multimodal Embeddings G1 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Multimodal Embeddings G1.
Model units per provisioned model for Amazon Titan Text Embeddings V2 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text Embeddings V2.
Model units per provisioned model for Amazon Titan Text G1 - Express 8K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text G1 - Express 8K.
Model units per provisioned model for Amazon Titan Text Premier V1 32K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Amazon Titan Text Premier V1 32K.
Model units per provisioned model for Anthropic Claude 3 Haiku 200K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Haiku 200K.
Model units per provisioned model for Anthropic Claude 3 Haiku 48K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Haiku 48K.
Model units per provisioned model for Anthropic Claude 3 Sonnet 200K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Sonnet 200K.
Model units per provisioned model for Anthropic Claude 3 Sonnet 28K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3 Sonnet 28K.
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 18K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3.5 Sonnet 18K.
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 200K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3.5 Sonnet 200K.
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 51K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude 3.5 Sonnet 51K.
Model units per provisioned model for Anthropic Claude Instant V1 100K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude Instant V1 100K.
Model units per provisioned model for Anthropic Claude V2 100K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2 100K.
Model units per provisioned model for Anthropic Claude V2 18K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2 18K.
Model units per provisioned model for Anthropic Claude V2.1 18K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2.1 18K.
Model units per provisioned model for Anthropic Claude V2.1 200K Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Anthropic Claude V2.1 200K.
Model units per provisioned model for Cohere Command Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Command.
Model units per provisioned model for Cohere Command Light Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Command Light.
Model units per provisioned model for Cohere Command R Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Command R 128k.
Model units per provisioned model for Cohere Command R Plus Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Command R Plus 128k.
Model units per provisioned model for Cohere Embed English Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Embed English.
Model units per provisioned model for Cohere Embed Multilingual Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Cohere Embed Multilingual.
Model units per provisioned model for Meta Llama 2 13B Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 13B.
Model units per provisioned model for Meta Llama 2 70B Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 70B.
Model units per provisioned model for Meta Llama 2 Chat 13B Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 Chat 13B.
Model units per provisioned model for Meta Llama 2 Chat 70B Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 2 Chat 70B.
Model units per provisioned model for Meta Llama 3 70B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3 70B Instruct.
Model units per provisioned model for Meta Llama 3 8B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3 8B Instruct.
Model units per provisioned model for Meta Llama 3.1 70B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.1 70B Instruct.
Model units per provisioned model for Meta Llama 3.1 8B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.1 8B Instruct.
Model units per provisioned model for Meta Llama 3.2 1B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.2 1B Instruct.
Model units per provisioned model for Meta Llama 3.2 3B Instruct Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Meta Llama 3.2 3B Instruct.
Model units per provisioned model for Mistral Large 2407 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Mistral Large 2407.
Model units per provisioned model for Mistral Small Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Mistral Small.
Model units per provisioned model for Stability.ai Stable Diffusion XL 0.8 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Stability.ai Stable Diffusion XL 0.8.
Model units per provisioned model for Stability.ai Stable Diffusion XL 1.0 Each supported Region: 0 Yes The maximum number of model units that can be allotted to a provisioned model for Stability.ai Stable Diffusion XL 1.0.
Number of concurrent automatic model evaluation jobs Each supported Region: 20 No The maximum number of automatic model evaluation jobs that you can specify at one time in this account in the current Region.
Number of concurrent model evaluation jobs that use human workers Each supported Region: 10 No The maximum number of model evaluation jobs that use human workers you can specify at one time in this account in the current Region.
Number of custom metrics Each supported Region: 10 No The maximum number of custom metrics that you can specify in a model evaluation job that uses human workers.
Number of custom prompt datasets in a human-based model evaluation job Each supported Region: 1 No The maximum number of custom prompt datasets that you can specify in a human-based model evaluation job in this account in the current Region.
Number of datasets per job Each supported Region: 5 No The maximum number of datasets that you can specify in an automated model evaluation job. This includes both custom and built-in prompt datasets.
Number of evaluation jobs Each supported Region: 5,000 No The maximum number of model evaluation jobs that you can create in this account in the current Region.
Number of metrics per dataset Each supported Region: 3 No The maximum number of metrics that you can specify per dataset in an automated model evaluation job. This includes both custom and built-in metrics.
Number of models in a model evaluation job that uses human workers Each supported Region: 2 No The maximum number of models that you can specify in a model evaluation job that uses human workers.
Number of models in automated model evaluation job Each supported Region: 1 No The maximum number of models that you can specify in an automated model evaluation job.
Number of prompts in a custom prompt dataset Each supported Region: 1,000 No The maximum number of prompts a custom prompt dataset can contain.
On-demand ApplyGuardrail Content filter policy text units per second Each supported Region: 25 No The maximum number of text units that can be processed for Content filter policies per second.
On-demand ApplyGuardrail Denied topic policy text units per second Each supported Region: 25 No The maximum number of text units that can be processed for Denied topic policies per second.
On-demand ApplyGuardrail Sensitive information filter policy text units per second Each supported Region: 25 No The maximum number of text units that can be processed for Sensitive information filter policies per second.
On-demand ApplyGuardrail Word filter policy text units per second Each supported Region: 25 No The maximum number of text units that can be processed for Word filter policies per second.
On-demand ApplyGuardrail contextual grounding policy text units per second us-east-1: 106; us-west-2: 106; Each of the other supported Regions: 53 No The maximum number of text units that can be processed for contextual grounding policies per second.
On-demand ApplyGuardrail requests per second Each supported Region: 25 No The maximum number of ApplyGuardrail API calls allowed per second.
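The guardrail quotas above are expressed in text units rather than raw characters, so batching decisions depend on how much text each request carries. The sketch below estimates text-unit consumption and checks a batch against a per-second quota; it assumes a text unit is 1,000 characters (the size used in the guardrail documentation), and the function names are illustrative, not part of any AWS SDK.

```python
import math

# ASSUMPTION: one guardrail text unit covers up to 1,000 characters of input,
# per the Amazon Bedrock guardrails documentation. Adjust if that changes.
TEXT_UNIT_CHARS = 1000

def text_units(text: str) -> int:
    """Return the number of text units a string consumes (0 for empty input)."""
    if not text:
        return 0
    return math.ceil(len(text) / TEXT_UNIT_CHARS)

def fits_per_second_quota(texts: list[str], quota_units_per_second: int = 25) -> bool:
    """Check whether a batch of strings submitted within one second stays
    under a per-second text-unit quota (25 is the listed default for the
    content filter, denied topic, sensitive information, and word filter
    policies)."""
    return sum(text_units(t) for t in texts) <= quota_units_per_second
```

For example, a 1,500-character prompt consumes two text units, so at the 25-unit default you could screen at most twelve such prompts per second for a given policy type.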
On-demand InvokeModel requests per minute for AI21 Labs Jamba 1.5 Large Each supported Region: 100 No The maximum number of times that you can call model inference in one minute for AI21 Labs Jamba 1.5 Large. The quota considers the combined sum of requests for Converse and InvokeModel
On-demand InvokeModel requests per minute for AI21 Labs Jamba 1.5 Mini Each supported Region: 100 No The maximum number of times that you can call model inference in one minute for AI21 Labs Jamba 1.5 Mini. The quota considers the combined sum of requests for Converse and InvokeModel
On-demand InvokeModel requests per minute for AI21 Labs Jamba Instruct Each supported Region: 100 No The maximum number of times that you can call model inference in one minute for AI21 Labs Jamba Instruct. The quota considers the combined sum of requests for Converse and InvokeModel
On-demand InvokeModel requests per minute for AI21 Labs Jurassic-2 Mid Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for AI21 Labs Jurassic-2 Mid
On-demand InvokeModel requests per minute for AI21 Labs Jurassic-2 Ultra Each supported Region: 100 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for AI21 Labs Jurassic-2 Ultra
On-demand InvokeModel requests per minute for Amazon Titan Image Generator G1 Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Image Generator G1.
On-demand InvokeModel requests per minute for Amazon Titan Image Generator G1 V2 Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Image Generator G1 V2.
On-demand InvokeModel requests per minute for Amazon Titan Multimodal Embeddings G1 Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Multimodal Embeddings G1.
On-demand InvokeModel requests per minute for Amazon Titan Text Embeddings Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Text Embeddings
On-demand InvokeModel requests per minute for Amazon Titan Text Embeddings V2 Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Amazon Titan Text Embeddings V2
On-demand InvokeModel requests per minute for Amazon Titan Text Express Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Amazon Titan Text Express
On-demand InvokeModel requests per minute for Amazon Titan Text Lite Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Amazon Titan Text Lite
On-demand InvokeModel requests per minute for Amazon Titan Text Premier Each supported Region: 100 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Amazon Titan Text Premier
On-demand InvokeModel requests per minute for Anthropic Claude 3 Haiku us-east-1: 1,000; us-west-2: 1,000; ap-northeast-1: 200; ap-southeast-1: 200; Each of the other supported Regions: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3 Haiku.
On-demand InvokeModel requests per minute for Anthropic Claude 3 Opus Each supported Region: 50 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3 Opus.
On-demand InvokeModel requests per minute for Anthropic Claude 3 Sonnet us-east-1: 500; us-west-2: 500; Each of the other supported Regions: 100 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3 Sonnet.
On-demand InvokeModel requests per minute for Anthropic Claude 3.5 Haiku Each supported Region: 1,000 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude 3.5 Haiku.
On-demand InvokeModel requests per minute for Anthropic Claude 3.5 Sonnet us-east-1: 50; us-east-2: 50; us-west-2: 250; ap-northeast-2: 50; ap-south-1: 50; ap-southeast-2: 50; Each of the other supported Regions: 20 No The maximum number of times that you can call model inference in one minute for Anthropic Claude 3.5 Sonnet. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream.
On-demand InvokeModel requests per minute for Anthropic Claude 3.5 Sonnet V2 us-west-2: 250; Each of the other supported Regions: 50 No The maximum number of times that you can call model inference in one minute for Anthropic Claude 3.5 Sonnet V2. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream.
On-demand InvokeModel requests per minute for Anthropic Claude Instant us-east-1: 1,000; us-west-2: 1,000; Each of the other supported Regions: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude Instant.
On-demand InvokeModel requests per minute for Anthropic Claude V2 us-east-1: 500; us-west-2: 500; Each of the other supported Regions: 100 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Anthropic Claude V2.
On-demand InvokeModel requests per minute for Cohere Command Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command.
On-demand InvokeModel requests per minute for Cohere Command Light Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command Light.
On-demand InvokeModel requests per minute for Cohere Command R Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command R 128k.
On-demand InvokeModel requests per minute for Cohere Command R Plus Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Cohere Command R Plus 128k.
On-demand InvokeModel requests per minute for Cohere Embed English Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Cohere Embed English.
On-demand InvokeModel requests per minute for Cohere Embed Multilingual Each supported Region: 2,000 No The maximum number of times that you can call InvokeModel in one minute for Cohere Embed Multilingual.
On-demand InvokeModel requests per minute for Meta Llama 2 13B Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 13B.
On-demand InvokeModel requests per minute for Meta Llama 2 70B Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 70B.
On-demand InvokeModel requests per minute for Meta Llama 2 Chat 13B Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 Chat 13B.
On-demand InvokeModel requests per minute for Meta Llama 2 Chat 70B Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 2 Chat 70B.
On-demand InvokeModel requests per minute for Meta Llama 3 70B Instruct Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 3 70B Instruct.
On-demand InvokeModel requests per minute for Meta Llama 3 8B Instruct Each supported Region: 800 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream requests for Meta Llama 3 8B Instruct.
On-demand InvokeModel requests per minute for Mistral 7B Instruct Each supported Region: 800 No The maximum number of times that you can call InvokeModel in one minute for Mistral mistral-7b-instruct-v0
On-demand InvokeModel requests per minute for Mistral AI Mistral Small Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute for Mistral AI Mistral Small
On-demand InvokeModel requests per minute for Mistral Large Each supported Region: 400 No The maximum number of times that you can call InvokeModel and InvokeModelWithResponseStream in one minute for Mistral mistral-large-2402-v1
On-demand InvokeModel requests per minute for Mistral Mixtral 8x7b Instruct Each supported Region: 400 No The maximum number of times that you can call InvokeModel in one minute for Mistral mixtral-8x7b-v0
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion 3 Large Each supported Region: 15 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion 3 Large.
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion 3 Medium Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion 3 Medium
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion XL 0.8 Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion XL 0.8
On-demand InvokeModel requests per minute for Stability.ai Stable Diffusion XL 1.0 Each supported Region: 60 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Diffusion XL 1.0
On-demand InvokeModel requests per minute for Stability.ai Stable Image Core Each supported Region: 90 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Image Core.
On-demand InvokeModel requests per minute for Stability.ai Stable Image Ultra Each supported Region: 10 No The maximum number of times that you can call InvokeModel in one minute for Stability.ai Stable Image Ultra.
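Because none of the requests-per-minute quotas above are adjustable, applications typically throttle themselves client-side so that bursts do not trigger server-side throttling errors. The following is a minimal sliding-window limiter sketch under that assumption; the class name and structure are illustrative, not an AWS SDK feature, and it complements rather than replaces retry handling for ThrottlingException responses.

```python
import time
from collections import deque

class RequestRateLimiter:
    """Client-side sliding-window limiter for a requests-per-minute quota,
    such as the 60 RPM listed for Stability.ai Stable Diffusion XL 1.0.
    Illustrative sketch only."""

    def __init__(self, requests_per_minute: int, clock=time.monotonic):
        self.limit = requests_per_minute
        self.window = 60.0          # quota window in seconds
        self.clock = clock          # injectable for testing
        self.calls = deque()        # timestamps of requests in the window

    def try_acquire(self) -> bool:
        """Record one request if the window has capacity; otherwise return
        False so the caller can back off before invoking the model."""
        now = self.clock()
        # Drop timestamps that have aged out of the one-minute window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True
```

A caller would loop on `try_acquire`, sleeping briefly when it returns False, before issuing each InvokeModel call. Injecting the clock keeps the limiter deterministic under test.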
On-demand InvokeModel tokens per minute for AI21 Labs Jamba 1.5 Large Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for AI21 Labs Jamba 1.5 Large. The quota considers the combined sum of tokens for Converse and InvokeModel.
On-demand InvokeModel tokens per minute for AI21 Labs Jamba 1.5 Mini Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for AI21 Labs Jamba 1.5 Mini. The quota considers the combined sum of tokens for Converse and InvokeModel.
On-demand InvokeModel tokens per minute for AI21 Labs Jamba Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for AI21 Labs Jamba Instruct. The quota considers the combined sum of tokens for Converse and InvokeModel
On-demand InvokeModel tokens per minute for AI21 Labs Jurassic-2 Mid Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for AI21 Labs Jurassic-2 Mid.
On-demand InvokeModel tokens per minute for AI21 Labs Jurassic-2 Ultra Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for AI21 Labs Jurassic-2 Ultra.
On-demand InvokeModel tokens per minute for Amazon Titan Image Generator G1 Each supported Region: 2,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Image Generator G1.
On-demand InvokeModel tokens per minute for Amazon Titan Image Generator G1 V2 Each supported Region: 2,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Image Generator G1 V2.
On-demand InvokeModel tokens per minute for Amazon Titan Multimodal Embeddings G1 Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Multimodal Embeddings G1.
On-demand InvokeModel tokens per minute for Amazon Titan Text Embeddings Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Text Embeddings.
On-demand InvokeModel tokens per minute for Amazon Titan Text Embeddings V2 Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Amazon Titan Text Embeddings V2.
On-demand InvokeModel tokens per minute for Amazon Titan Text Express Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Amazon Titan Text Express.
On-demand InvokeModel tokens per minute for Amazon Titan Text Lite Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Amazon Titan Text Lite.
On-demand InvokeModel tokens per minute for Amazon Titan Text Premier Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Amazon Titan Text Premier.
On-demand InvokeModel tokens per minute for Anthropic Claude 3 Haiku us-east-1: 2,000,000; us-west-2: 2,000,000; ap-northeast-1: 200,000; ap-southeast-1: 200,000; Each of the other supported Regions: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3 Haiku.
On-demand InvokeModel tokens per minute for Anthropic Claude 3 Opus Each supported Region: 400,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3 Opus.
On-demand InvokeModel tokens per minute for Anthropic Claude 3 Sonnet us-east-1: 1,000,000; us-west-2: 1,000,000; Each of the other supported Regions: 200,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3 Sonnet.
On-demand InvokeModel tokens per minute for Anthropic Claude 3.5 Haiku Each supported Region: 2,000,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude 3.5 Haiku.
On-demand InvokeModel tokens per minute for Anthropic Claude 3.5 Sonnet us-east-1: 400,000; us-east-2: 400,000; us-west-2: 2,000,000; ap-northeast-2: 400,000; ap-south-1: 400,000; ap-southeast-2: 400,000; Each of the other supported Regions: 200,000 No The maximum number of tokens that you can submit for model inference in one minute for Anthropic Claude 3.5 Sonnet. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream.
On-demand InvokeModel tokens per minute for Anthropic Claude 3.5 Sonnet V2 us-west-2: 2,000,000; Each of the other supported Regions: 400,000 No The maximum number of tokens that you can submit for model inference in one minute for Anthropic Claude 3.5 Sonnet V2. The quota considers the combined sum of Converse, ConverseStream, InvokeModel and InvokeModelWithResponseStream.
On-demand InvokeModel tokens per minute for Anthropic Claude Instant us-east-1: 1,000,000; us-west-2: 1,000,000; Each of the other supported Regions: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude Instant.
On-demand InvokeModel tokens per minute for Anthropic Claude V2 us-east-1: 500,000; us-west-2: 500,000; Each of the other supported Regions: 200,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Anthropic Claude V2.
On-demand InvokeModel tokens per minute for Cohere Command Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Cohere Command.
On-demand InvokeModel tokens per minute for Cohere Command Light Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Cohere Command Light.
On-demand InvokeModel tokens per minute for Cohere Command R Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Cohere Command R 128k.
On-demand InvokeModel tokens per minute for Cohere Command R Plus Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Cohere Command R Plus 128k.
On-demand InvokeModel tokens per minute for Cohere Embed English Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Cohere Embed English.
On-demand InvokeModel tokens per minute for Cohere Embed Multilingual Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel in one minute for Cohere Embed Multilingual.
On-demand InvokeModel tokens per minute for Meta Llama 2 13B Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 13B.
On-demand InvokeModel tokens per minute for Meta Llama 2 70B Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 70B.
On-demand InvokeModel tokens per minute for Meta Llama 2 Chat 13B Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 Chat 13B.
On-demand InvokeModel tokens per minute for Meta Llama 2 Chat 70B Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 2 Chat 70B.
On-demand InvokeModel tokens per minute for Meta Llama 3 70B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 3 70B Instruct.
On-demand InvokeModel tokens per minute for Meta Llama 3 8B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Meta Llama 3 8B Instruct.
On-demand InvokeModel tokens per minute for Mistral AI Mistral 7B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral AI Mistral 7B Instruct.
On-demand InvokeModel tokens per minute for Mistral AI Mistral Large Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral AI Mistral Large.
On-demand InvokeModel tokens per minute for Mistral AI Mistral Small Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral AI Mistral Small.
On-demand InvokeModel tokens per minute for Mistral AI Mixtral 8x7B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can provide through InvokeModel and InvokeModelWithResponseStream in one minute. The quota considers the combined sum of InvokeModel and InvokeModelWithResponseStream tokens for Mistral mixtral-8x7b-instruct-v0.
On-demand model inference requests per minute for Meta Llama 3.1 405B Instruct Each supported Region: 200 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.1 405B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.1 70B Instruct Each supported Region: 400 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.1 70B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.1 8B Instruct Each supported Region: 800 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.1 8B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.2 11B Instruct Each supported Region: 400 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 11B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.2 1B Instruct Each supported Region: 800 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 1B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.2 3B Instruct Each supported Region: 800 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 3B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Meta Llama 3.2 90B Instruct Each supported Region: 400 No The maximum number of times that you can call model inference in one minute for Meta Llama 3.2 90B Instruct. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference requests per minute for Mistral Large 2407 Each supported Region: 400 No The maximum number of times that you can call model inference in one minute for Mistral Large 2407. The quota considers the combined sum of requests for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream
On-demand model inference tokens per minute for Meta Llama 3.1 8B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.1 8B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.2 11B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 11B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.2 1B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 1B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.2 3B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 3B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.2 90B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.2 90B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Mistral Large 2407 Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Mistral Large 2407. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream
On-demand model inference tokens per minute for Meta Llama 3.1 405B Instruct Each supported Region: 400,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.1 405B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
On-demand model inference tokens per minute for Meta Llama 3.1 70B Instruct Each supported Region: 300,000 No The maximum number of tokens that you can submit for model inference in one minute for Meta Llama 3.1 70B Instruct. The quota considers the combined sum of tokens for InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream.
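Each on-demand model is governed by two independent per-minute quotas, requests and tokens, and a request is throttled if it would exceed either one. A small tracker that checks both budgets per one-minute window can make this interaction concrete; the class below is an illustrative sketch (not an SDK feature), with defaults taken from the Meta Llama 3.1 8B Instruct rows above (800 requests per minute, 300,000 tokens per minute).

```python
class InferenceBudget:
    """Track both per-minute quotas that apply to on-demand model inference:
    requests per minute (RPM) and tokens per minute (TPM). Illustrative
    sketch; defaults match the listed Meta Llama 3.1 8B Instruct quotas."""

    def __init__(self, rpm: int = 800, tpm: int = 300_000):
        self.rpm = rpm
        self.tpm = tpm
        self.requests = 0   # requests admitted in the current window
        self.tokens = 0     # tokens admitted in the current window

    def admit(self, token_count: int) -> bool:
        """Admit a request into the current one-minute window only if
        BOTH the request budget and the token budget allow it."""
        if self.requests + 1 > self.rpm or self.tokens + token_count > self.tpm:
            return False
        self.requests += 1
        self.tokens += token_count
        return True

    def reset(self) -> None:
        """Call at the start of each new one-minute window."""
        self.requests = 0
        self.tokens = 0
```

With the default budgets, 800 requests averaging more than 375 tokens each would exhaust the token quota before the request quota, so large prompts, not request count, become the binding limit.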
Output nodes per flow Each supported Region: 10 No The maximum number of flow output nodes.
Parameters per function Each supported Region: 5 Yes The maximum number of parameters that you can have in an action group function.
PrepareAgent requests per second Each supported Region: 2 No The maximum number of PrepareAgent API requests per second.
PrepareFlow requests per second Each supported Region: 2 No The maximum number of PrepareFlow requests per second.
Prompt nodes per flow Each supported Region: 10 Yes The maximum number of prompt nodes.
Prompts per account Each supported Region: 50 Yes The maximum number of prompts.
Records per batch inference job Each supported Region: 50,000 Yes The maximum number of records across all input files in a batch inference job.
Records per input file per batch inference job Each supported Region: 50,000 Yes The maximum number of records in an input file in a batch inference job.
Regex entities in Sensitive Information Filter Each supported Region: 10 No The maximum number of guardrail filter regexes that can be included in a sensitive information filter policy.
Regex length in characters Each supported Region: 500 No The maximum length, in characters, of a guardrail filter regex.
Retrieve requests per second Each supported Region: 5 No The maximum number of Retrieve API requests per second.
RetrieveAndGenerate requests per second Each supported Region: 5 No The maximum number of RetrieveAndGenerate API requests per second.
S3 retrieval nodes per flow Each supported Region: 10 No The maximum number of S3 retrieval nodes.
S3 storage nodes per flow Each supported Region: 10 No The maximum number of S3 storage nodes.
Scheduled customization jobs Each supported Region: 2 No The maximum number of scheduled customization jobs.
Size of prompt Each supported Region: 4 No The maximum size (in KB) of an individual prompt in a custom prompt dataset.
StartIngestionJob requests per second Each supported Region: 0.1 No The maximum number of StartIngestionJob API requests per second.
Sum of in-progress and submitted batch inference jobs using a base model eu-south-1: 10; each of the other supported Regions: 20 Yes The maximum number of in-progress and submitted batch inference jobs using a base model.
Sum of in-progress and submitted batch inference jobs using a custom model Each supported Region: 3 Yes The maximum number of in-progress and submitted batch inference jobs using a custom model.
Sum of training and validation records for a Claude 3 Haiku v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Claude 3 Haiku Fine-tuning job.
Sum of training and validation records for a Meta Llama 2 13B v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 2 13B Fine-tuning job.
Sum of training and validation records for a Meta Llama 2 70B v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 2 70B Fine-tuning job.
Sum of training and validation records for a Meta Llama 3.1 70B Instruct v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 3.1 70B Instruct Fine-tuning job.
Sum of training and validation records for a Meta Llama 3.1 8B Instruct v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 3.1 8B Instruct Fine-tuning job.
Sum of training and validation records for a Meta Llama 3.2 1B Instruct v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 3.2 1B Instruct Fine-tuning job.
Sum of training and validation records for a Meta Llama 3.2 3B Instruct v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Meta Llama 3.2 3B Instruct Fine-tuning job.
Sum of training and validation records for a Titan Image Generator G1 V1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Titan Image Generator Fine-tuning job.
Sum of training and validation records for a Titan Image Generator G1 V2 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Titan Image Generator V2 Fine-tuning job.
Sum of training and validation records for a Titan Multimodal Embeddings G1 v1 Fine-tuning job Each supported Region: 50,000 Yes The maximum combined number of training and validation records allowed for a Titan Multimodal Embeddings Fine-tuning job.
Sum of training and validation records for a Titan Text G1 - Express v1 Continued Pre-Training job Each supported Region: 100,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Express Continued Pre-Training job.
Sum of training and validation records for a Titan Text G1 - Express v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Express Fine-tuning job.
Sum of training and validation records for a Titan Text G1 - Lite v1 Continued Pre-Training job Each supported Region: 100,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Lite Continued Pre-Training job.
Sum of training and validation records for a Titan Text G1 - Lite v1 Fine-tuning job Each supported Region: 10,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Lite Fine-tuning job.
Sum of training and validation records for a Titan Text G1 - Premier v1 Fine-tuning job Each supported Region: 20,000 Yes The maximum combined number of training and validation records allowed for a Titan Text Premier Fine-tuning job.
Task time for workers Each supported Region: 30 No The maximum amount of time, in days, that a worker has to complete tasks.
Topics per guardrail Each supported Region: 30 No The maximum number of topics that can be defined across guardrail topic policies.
Total nodes per flow Each supported Region: 40 No The maximum number of nodes in a flow.
UpdateAgent requests per second Each supported Region: 4 No The maximum number of UpdateAgent API requests per second.
UpdateAgentActionGroup requests per second Each supported Region: 6 No The maximum number of UpdateAgentActionGroup API requests per second.
UpdateAgentAlias requests per second Each supported Region: 2 No The maximum number of UpdateAgentAlias API requests per second.
UpdateAgentKnowledgeBase requests per second Each supported Region: 4 No The maximum number of UpdateAgentKnowledgeBase API requests per second.
UpdateDataSource requests per second Each supported Region: 2 No The maximum number of UpdateDataSource API requests per second.
UpdateFlow requests per second Each supported Region: 2 No The maximum number of UpdateFlow requests per second.
UpdateFlowAlias requests per second Each supported Region: 2 No The maximum number of UpdateFlowAlias requests per second.
UpdateKnowledgeBase requests per second Each supported Region: 2 No The maximum number of UpdateKnowledgeBase API requests per second.
UpdatePrompt requests per second Each supported Region: 2 No The maximum number of UpdatePrompt requests per second.
User query size Each supported Region: 1,000 No The maximum size (in characters) of a user query.
ValidateFlowDefinition requests per second Each supported Region: 2 No The maximum number of ValidateFlowDefinition requests per second.
Versions per guardrail Each supported Region: 20 No The maximum number of versions that a guardrail can have.
Versions per prompt Each supported Region: 10 No The maximum number of versions per prompt.
Word length in characters Each supported Region: 100 No The maximum length of a word, in characters, in a blocked word list.
Words per word policy Each supported Region: 10,000 No The maximum number of words that can be included in a blocked word list.
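Many of the quotas above are expressed as API requests per second, and some are fractional (for example, StartIngestionJob allows 0.1 requests per second, i.e. one request every 10 seconds). One common way to stay under such limits on the client side is a token bucket. The sketch below is illustrative only, assuming a single-process client; the `TokenBucket` class and its parameters are not part of any AWS SDK.

```python
import time


class TokenBucket:
    """Client-side throttle for per-second quotas such as the request-rate
    limits in the table above (e.g. 0.1 rps for StartIngestionJob).

    Tokens refill continuously at `rate_per_second`; each call consumes one
    token. This is a hypothetical helper, not an AWS SDK feature.
    """

    def __init__(self, rate_per_second, clock=time.monotonic):
        self.rate = rate_per_second
        # Allow at least one request to be held in reserve, even for
        # fractional rates like 0.1 rps.
        self.capacity = max(rate_per_second, 1.0)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Return True if a request may be sent now, False otherwise."""
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Example: a bucket matching a 2-requests-per-second quota.
bucket = TokenBucket(2.0)
if bucket.try_acquire():
    pass  # safe to call the API here; otherwise back off and retry
```

When `try_acquire` returns False, the caller should sleep and retry rather than send the request, since exceeding a quota results in throttling errors from the service.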