Guidelines and quotas
Unless otherwise specified, the Amazon Comprehend quotas are per region. You can request an increase to adjustable quotas if needed for your applications. For information about quotas and to request a quota increase, see AWS Service Quotas.
Supported Regions
Amazon Comprehend is available in the following AWS Regions:
- US East (Ohio)
- US East (N. Virginia)
- US West (Oregon)
- Asia Pacific (Mumbai)
- Asia Pacific (Seoul)
- Asia Pacific (Singapore)
- Asia Pacific (Sydney)
- Asia Pacific (Tokyo)
- Canada (Central)
- Europe (Frankfurt)
- Europe (Ireland)
- Europe (London)
- AWS GovCloud (US-West)
By default, Amazon Comprehend provides all API operations in each of the supported regions. For exceptions, see Document processing.
For information about API endpoints, see Amazon Comprehend Regions and Endpoints in the Amazon Web Services General Reference.
To review current quotas in a region, or to request quota increases for adjustable quotas, open the Service Quotas console.
Quotas for built-in models
Amazon Comprehend provides built-in models for you to analyze UTF-8 text documents. Amazon Comprehend provides synchronous and asynchronous operations that use the built-in models.
Real-time (synchronous) analysis
This section describes quotas related to real-time analysis using the built-in models.
Single document operations
The Amazon Comprehend API provides operations that take a single document as input. The following quotas apply to these operations.
General quotas for single document operations
The following quotas apply to real-time analysis for detecting entities, key phrases, or dominant language. For entity detection, these quotas apply to detection with the built-in models. For custom entity detection, see the quotas in Custom entity recognition.
Description | Quota/Guideline |
---|---|
Maximum document size | 100 KB |
Operation-specific quotas for single document operations
The following quotas apply to real-time analysis for detecting sentiment, targeted sentiment, and syntax.
Description | Quota/Guideline |
---|---|
Maximum document size | 5 KB |
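The two tables above can be turned into a pre-flight size check that runs before a request is sent. The following is a minimal sketch, not an official client: it assumes the limits are measured on the UTF-8 encoded document and that 1 KB means 1,000 bytes (the more conservative reading). The `SYNC_MAX_BYTES` table and the function names are hypothetical.

```python
# Assumption: 1 KB is treated as 1,000 bytes, the more conservative reading.
KB = 1000

# Hypothetical lookup table built from the single document quotas above.
SYNC_MAX_BYTES = {
    "entities": 100 * KB,
    "key_phrases": 100 * KB,
    "dominant_language": 100 * KB,
    "sentiment": 5 * KB,
    "targeted_sentiment": 5 * KB,
    "syntax": 5 * KB,
}


def utf8_size(text: str) -> int:
    """Return the size of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_sync_quota(text: str, analysis: str) -> bool:
    """Return True if the document is within the size quota for the analysis type."""
    return utf8_size(text) <= SYNC_MAX_BYTES[analysis]
```

Measuring bytes rather than characters matters for non-ASCII text: a document of 3,000 two-byte characters is 6,000 bytes, which passes the 100 KB check but not the 5 KB check.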
Multiple document operations
The Amazon Comprehend API provides batch operations that process multiple documents with a single API request. The following quotas apply to the batch operations.
Description | Quota/Guideline |
---|---|
Maximum document size | 5 KB |
Maximum documents per request | 25 |
For more information about using the batch document operations, see Multiple document synchronous processing.
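As an illustration of the batch quotas above, the following sketch splits a list of documents into groups of at most 25 and rejects any document over 5 KB. It assumes 1 KB means 1,000 bytes and that size is measured in UTF-8 bytes; `make_batches` is a hypothetical helper, not part of the Amazon Comprehend API.

```python
from typing import Iterator, List

MAX_DOCS_PER_REQUEST = 25
MAX_DOC_BYTES = 5 * 1000  # Assumption: 5 KB read as 5,000 bytes.


def make_batches(documents: List[str]) -> Iterator[List[str]]:
    """Yield lists of at most 25 documents, each within the size quota."""
    # Validate every document before yielding the first batch.
    for index, doc in enumerate(documents):
        size = len(doc.encode("utf-8"))
        if size > MAX_DOC_BYTES:
            raise ValueError(
                f"document {index} is {size} bytes, over the {MAX_DOC_BYTES} byte quota"
            )
    for start in range(0, len(documents), MAX_DOCS_PER_REQUEST):
        yield documents[start:start + MAX_DOCS_PER_REQUEST]
```

Each yielded list could then be passed as the document list of one batch request.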
Request throttling for real-time (synchronous) requests
Amazon Comprehend applies dynamic throttling to synchronous requests. If system processing bandwidth is available, Amazon Comprehend gradually increases the number of your requests that it processes. To control your application's usage of the synchronous API operations, we recommend that you turn on billing alerts or implement rate-limiting in your application.
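One common way to implement the application-side rate limiting recommended above is a token bucket. The sketch below is one possible approach under stated assumptions, not a prescribed design: the class name, the rate, and the capacity are hypothetical, and a suitable rate depends on your own workload and budget.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = float(rate_per_second)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # Start with a full bucket.
        self.clock = clock             # Injectable so the limiter can be tested.
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False if the caller should wait."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

An application would call `try_acquire()` before each synchronous request and delay or queue the request when it returns `False`.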
Asynchronous analysis
This section describes quotas related to asynchronous analysis using the built-in models.
Asynchronous API operations each support a maximum of 10 active jobs. To view the quotas for each API operation, see the Service Quotas table in Amazon Comprehend endpoints and quotas in the Amazon Web Services General Reference.
For adjustable quotas, you can request a quota increase using the Service Quotas console.
General quotas for asynchronous operations
You can run asynchronous analysis jobs using the console or any of the API Start* operations.
For information about when to use asynchronous operations, see Asynchronous batch processing.
The following quotas apply to most of the API Start* operations for built-in models. For the exceptions, see Operation-specific quotas for asynchronous jobs.
Description | Quota/Guideline |
---|---|
Maximum size of each document in jobs that detect entities, key phrases, PII, and languages | 1 MB |
Maximum total size of all files in a request | 5 GB |
Minimum total size of all files in a request | 500 bytes |
Maximum number of files, one document per file | 1,000,000 |
Maximum total number of lines, one document per line | 1,000,000 |
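The general quotas in the table above can be checked locally before a job is started. The following sketch assumes decimal units (1 MB = 1,000,000 bytes) and takes a list of document sizes in bytes; `check_async_job` is a hypothetical helper. The 1 MB per-document limit applies to jobs that detect entities, key phrases, PII, and languages; other operations have their own limits.

```python
from typing import List

MB = 1000 * 1000  # Assumption: decimal units.
GB = 1000 * MB

MAX_DOC_BYTES = 1 * MB
MIN_TOTAL_BYTES = 500
MAX_TOTAL_BYTES = 5 * GB
MAX_DOCUMENTS = 1_000_000


def check_async_job(doc_sizes: List[int]) -> List[str]:
    """Return a list of quota problems; an empty list means no problem was found."""
    problems = []
    total = sum(doc_sizes)
    if len(doc_sizes) > MAX_DOCUMENTS:
        problems.append(f"{len(doc_sizes)} documents exceeds {MAX_DOCUMENTS}")
    if total < MIN_TOTAL_BYTES:
        problems.append(f"total size {total} bytes is under {MIN_TOTAL_BYTES}")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes is over {MAX_TOTAL_BYTES}")
    oversized = sum(1 for size in doc_sizes if size > MAX_DOC_BYTES)
    if oversized:
        problems.append(f"{oversized} document(s) exceed {MAX_DOC_BYTES} bytes")
    return problems
```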
Operation-specific quotas for asynchronous jobs
This section describes quotas for specific asynchronous operations. If a quota isn't specified in the following tables, the general quota value applies.
Sentiment
Asynchronous sentiment jobs, which you create with the StartSentimentDetectionJob operation, have the following quotas.
Description | Quota/Guideline |
---|---|
Maximum size of each input document | 5 KB |
Targeted sentiment
Asynchronous targeted sentiment jobs, which you create with the StartTargetedSentimentDetectionJob operation, have the following quotas.
Description | Quota/Guideline |
---|---|
Supported document formats | UTF-8 |
Maximum size of each document in a job | 10 KB |
Maximum size of all documents in a job | 300 MB |
Maximum number of files, one document per file | 30,000 |
Maximum total number of lines, one document per line (for all files in a request) | 30,000 |
Events
Asynchronous events detection jobs, which you create with the StartEventsDetectionJob operation, have the following quotas.
Description | Quota/Guideline |
---|---|
Character encoding | UTF-8 |
Total size of all files in a job | 50 MB |
Maximum size of each document in a job | 10 KB |
Maximum number of files, one document per file | 5,000 |
Maximum total number of lines, one document per line (for all files in request) | 5,000 |
Topic modeling
Asynchronous topic modeling jobs, which you create with the StartTopicsDetectionJob operation, have the following quotas.
Description | Quota/Guideline |
---|---|
Character encoding | UTF-8 |
Maximum number of topics to return | 100 |
Maximum file size for one file, one document per file | 100 MB |
For more information, see Topic modeling.
Request throttling for asynchronous requests
Each asynchronous API operation supports a maximum number of requests per second (per region, per account), and also a maximum of 10 active jobs. To view the quotas for each API operation, see the Service Quotas table in Amazon Comprehend endpoints and quotas in the Amazon Web Services General Reference.
For adjustable quotas, you can request a quota increase using the Service Quotas console.
Quotas for custom models
You can use Amazon Comprehend to build your own custom models for custom classification and custom entity recognition. This section provides the guidelines and quotas related to training and using custom models. For more information about custom models, see Amazon Comprehend Custom.
General quotas
Amazon Comprehend sets general size quotas for each type of input document that you can analyze with custom models. For real-time analysis quotas, see Maximum document sizes for real-time analysis. For asynchronous analysis quotas, see Inputs for asynchronous custom analysis.
Each asynchronous API operation supports a maximum number of requests per second (per region, per account), and also a maximum of 10 active jobs. To view the quotas for each API operation, see the Service Quotas table in Amazon Comprehend endpoints and quotas in the Amazon Web Services General Reference.
For adjustable quotas, you can request a quota increase using the Service Quotas console.
Quotas for endpoints
You create an endpoint to run real-time analysis with a custom model. For information about endpoints, see Managing Amazon Comprehend endpoints.
The following quotas apply to the endpoints. For information about how to request a quota increase, see AWS Service Quotas.
Description | Quota/Guideline |
---|---|
Maximum number of active endpoints per Region for each account | 20 |
Maximum number of inference units per Region for each account | 200 |
Maximum number of inference units per endpoint per region | 50 |
Maximum throughput per inference unit (characters) | 100/second |
Maximum throughput per inference unit (documents) | 2/second |
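The throughput figures above can be used to estimate how many inference units an endpoint needs. The sketch below assumes that both per-unit limits (100 characters per second and 2 documents per second) must hold at the same time, so the larger of the two estimates wins; that reading, and the function names, are assumptions rather than documented behavior.

```python
import math

CHARS_PER_SECOND_PER_UNIT = 100
DOCS_PER_SECOND_PER_UNIT = 2
MAX_UNITS_PER_ENDPOINT = 50


def inference_units_needed(chars_per_second: float, docs_per_second: float) -> int:
    """Estimate the inference units required to sustain the given load."""
    by_chars = math.ceil(chars_per_second / CHARS_PER_SECOND_PER_UNIT)
    by_docs = math.ceil(docs_per_second / DOCS_PER_SECOND_PER_UNIT)
    # An endpoint always has at least one inference unit.
    return max(1, by_chars, by_docs)


def fits_single_endpoint(units: int) -> bool:
    """Return True if the estimate is within the per-endpoint quota."""
    return units <= MAX_UNITS_PER_ENDPOINT
```

For example, a load of 250 characters per second and 1 document per second is bounded by the character limit and works out to 3 units under these assumptions.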
Document classification
This section describes the guidelines and quotas for the following document classification operations:
- Classifier training jobs that you start with the CreateDocumentClassifier operation.
- Asynchronous document classification jobs that you start with the StartDocumentClassificationJob operation.
- Synchronous document classification requests that use the ClassifyDocument operation.
General quotas for document classification
The following table describes general quotas related to training custom classifiers.
Description | Quota/Guideline |
---|---|
Maximum length of class name | 5,000 characters |
Number of classes (multi-class mode) | 2–1,000 |
Number of classes (multi-label mode) | 2–100 |
Annotations format | |
Minimum number of annotations per class (multi-class mode) | 10 |
Minimum number of annotations per class (multi-label mode) | 10 |
Minimum number of annotations (multi-label mode) | 50 |
CSV file format | |
Minimum number of training documents per class (multi-class mode) | 50 |
Minimum number of training documents per class (multi-label mode) | 10 |
Minimum number of training documents (multi-label mode) | 50 |
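To make the CSV file format rows above concrete, the following sketch checks per-class document counts against the table before training. The `LIMITS` table and `check_csv_training_set` are hypothetical names; the total document count is passed separately because in multi-label mode one document can carry several classes.

```python
from typing import Dict, List

MIN_CLASSES = 2
MAX_CLASS_NAME_CHARS = 5000

# Values copied from the CSV file format rows of the table above.
LIMITS = {
    "multi-class": {"max_classes": 1000, "min_docs_per_class": 50, "min_total_docs": 0},
    "multi-label": {"max_classes": 100, "min_docs_per_class": 10, "min_total_docs": 50},
}


def check_csv_training_set(
    docs_per_class: Dict[str, int], total_documents: int, mode: str
) -> List[str]:
    """Return a list of quota problems for a CSV training set; empty means none found."""
    limits = LIMITS[mode]
    problems = []
    class_count = len(docs_per_class)
    if not MIN_CLASSES <= class_count <= limits["max_classes"]:
        problems.append(f"{class_count} classes is outside 2-{limits['max_classes']}")
    for name, count in docs_per_class.items():
        if len(name) > MAX_CLASS_NAME_CHARS:
            problems.append(f"class name of {len(name)} characters is too long")
        if count < limits["min_docs_per_class"]:
            problems.append(f"class {name!r} has only {count} training documents")
    if total_documents < limits["min_total_docs"]:
        problems.append(f"only {total_documents} training documents in total")
    return problems
```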
Classification for plain text documents
You create and train a plain-text model using plain-text input documents. Amazon Comprehend provides real-time and asynchronous operations to classify plain text documents using a plain-text model.
Training
The following table describes quotas related to training a custom classifier with plain text documents.
Description | Quota/Guideline |
---|---|
Total size of all files in training job | 5 GB |
Maximum number of augmented manifest files for training a custom classifier | 5 |
Maximum number of attribute names for each augmented manifest file | 5 |
Maximum length of attribute name | 63 characters |
Real-time (synchronous) analysis
The following table describes quotas related to real-time classification of plain text documents.
Description | Quota/Guideline |
---|---|
Maximum number of documents per synchronous request | 1 |
Maximum text document size (UTF-8 encoded) | 10 KB |
Asynchronous analysis
The following table describes quotas related to asynchronous classification of plain text documents.
Description | Quota/Guideline |
---|---|
Total size of all files in asynchronous job | 5 GB |
Maximum file size for one file, one document per file | 10 MB |
Maximum number of files, one document per file | 1,000,000 |
Maximum total number of lines, one document per line (for all files in request) | 1,000,000 |
Classification for semi-structured documents
This section describes the guidelines and quotas for document classification of semi-structured documents. To classify semi-structured documents, use a native document model that you trained with native input documents.
Training a native document model with semi-structured documents
The following table describes quotas related to training a custom classifier with semi-structured documents, such as PDF documents, Word documents, and image files.
Description | Quota/Guideline |
---|---|
Maximum number of pages across all documents | 10,000 |
Maximum annotations file size (all CSV file sizes combined) | 5 MB |
Document corpus size (training and test documents) | 10 GB |
File sizes for training and testing files | |
Image file size (JPG, PNG, or TIFF) | 1 byte–10 MB. TIFF files: one page maximum. |
Page size for PDF documents | 1 byte–10 MB |
Page size for Word documents | 1 byte–10 MB |
Amazon Textract API output JSON size | 1 byte–1 MB |
Real-time (synchronous) analysis
This section describes quotas related to real-time classification of semi-structured documents.
The following table shows the maximum file sizes for input documents. For all input document types, the input file maximum is one page, with no more than 10,000 characters.
File type | Maximum size (API) | Maximum size (console) |
---|---|---|
UTF-8 text documents | 10 KB | 10 KB |
PDF documents | 10 MB | 5 MB |
Word documents | 10 MB | 5 MB |
Image files | 10 MB | 5 MB |
Amazon Textract API output size | 1 MB | n/a |
Asynchronous analysis
The following table describes quotas related to asynchronous classification of semi-structured documents.
Description | Quota/Guideline |
---|---|
Maximum number of pages across all input documents for a job | 25,000 |
Document corpus size | 25 GB |
Image file size (JPG, PNG, or TIFF) | 1 byte–10 MB. TIFF files: one page maximum. |
Page size for PDF documents | 1 byte–10 MB |
Page size for Word documents | 1 byte–10 MB |
Amazon Textract API output JSON size | 1 byte–1 MB |
Custom entity recognition
This section describes the guidelines and quotas for the following operations for custom entity recognition:
- Entity recognizer training jobs started with the CreateEntityRecognizer operation.
- Asynchronous entity recognition jobs started with the StartEntitiesDetectionJob operation.
- Synchronous entity recognition requests using the DetectEntities operation.
Custom entity recognition for plain text documents
Amazon Comprehend provides asynchronous and synchronous operations to analyze plain text documents with a custom entity recognizer.
Training
This section describes quotas related to training a custom entity recognizer to analyze plain text documents. To train the model, you can provide an entity list or a set of annotated text documents.
The following table describes quotas related to training the model with an entity list.
Description | Quota/Guideline |
---|---|
Number of entities per model | 1–25 |
Document size (UTF-8) | 1–5,000 bytes |
Number of items in entity list | 1–1 million |
Length of individual entry (post-strip) in entry list | 1–5,000 |
Entity list corpus size (all docs in plaintext combined) | 5 KB–200 MB |
The following table describes quotas related to training the model with annotated text documents.
Description | Quota/Guideline |
---|---|
Number of entities per model/custom entity recognizer | 1–25 |
Document size (UTF-8) | 1–5,000 bytes |
Number of documents (see Plain-text annotations) | 3–200,000 |
Document corpus size (all docs in plaintext combined) | 5 KB–200 MB |
Minimum number of annotations per entity | 25 |
Real-time (synchronous) analysis
The following table describes quotas related to real-time analysis of plain text documents.
Description | Quota/Guideline |
---|---|
Maximum number of documents per synchronous request | 1 |
Maximum text document size (UTF-8 encoded) | 5 KB |
Asynchronous analysis
The following table describes quotas related to asynchronous entity recognition of plain text documents.
Description | Quota/Guideline |
---|---|
Document size (UTF-8) | 1 byte–1 MB |
Maximum number of files, one document per file | 1,000,000 |
Maximum total number of lines, one document per line (for all files in request) | 1,000,000 |
Document corpus size (all docs in plaintext combined) | 1 byte–5 GB |
Custom entity recognition for semi-structured documents
Amazon Comprehend provides asynchronous and synchronous operations to analyze semi-structured documents with a custom entity recognizer. You must train the model using annotated PDF documents.
Training
The following table describes quotas related to training a custom entity recognizer (CreateEntityRecognizer) to analyze semi-structured documents.
Description | Quota/Guideline |
---|---|
Number of entities per model/custom entity recognizer | 1–25 |
Maximum annotation file size (UTF-8 JSON) | 5 MB |
Number of documents | 250–10,000 |
Document corpus size (all docs in plaintext combined) | 5 KB–1 GB |
Minimum number of annotations per entity | 100 |
Maximum number of augmented manifest files for training a custom entity recognizer | 5 |
Maximum number of attribute names for each augmented manifest file | 5 |
Maximum length of attribute name | 63 characters |
Real-time (synchronous) analysis
This section describes quotas related to real-time analysis of semi-structured documents.
The following table shows the maximum file sizes for input documents. For all input document types, the input file maximum is one page, with no more than 10,000 characters.
File type | Maximum size (API) | Maximum size (console) |
---|---|---|
UTF-8 text documents | 10 KB | 10 KB |
PDF documents | 10 MB | 5 MB |
Word documents | 10 MB | 5 MB |
Image files | 10 MB | 5 MB |
Textract output files | 1 MB | n/a |
Asynchronous analysis
This section describes quotas for asynchronous analysis of semi-structured documents.
Description | Quota/Guideline |
---|---|
Image size (JPG or PNG) | 1 byte–10 MB |
Image size (TIFF) | 1 byte–10 MB. Maximum one page. |
Document size (PDF) | 1 byte–50 MB |
Document size (Docx) | 1 byte–5 MB |
Document size (UTF-8) | 1 byte–1 MB |
Maximum number of files, one document per file (one document per line not allowed for image files or PDF/Word documents) | 500 |
Maximum number of pages for a PDF or Docx file | 100 |
Document corpus size after text extraction (plaintext, all files combined) | 1 byte–5 GB |
For more information about limits for images, see Hard Limits in Amazon Textract.
Quotas for flywheels
Use flywheels to manage training and tracking of custom model versions for custom classification and custom entity recognition. For more information about flywheels, see Flywheels.
General quotas for flywheels
The following quotas apply to flywheels and flywheel iterations.
Description | Quota/Guideline |
---|---|
Maximum number of flywheels | 50 |
Maximum number of flywheels in CREATING state | 10 |
Maximum number of training datasets per flywheel | 50 |
Maximum number of test datasets per flywheel | 50 |
Maximum number of datasets with INGESTING status | 10 |
Maximum number of in-progress flywheel iterations per account | 10 |
Dataset quotas for custom classification models
When you ingest a dataset for a flywheel associated with a custom classification model, the following quotas apply.
Description | Quota/Guideline |
---|---|
Minimum number of training documents per class (multi-label mode) | 50 |
Maximum number of training documents | 1,000,000 |
Minimum dataset size | 500 bytes |
Maximum dataset size | 5 GB |
Maximum file size for one file, one document per file | 10 MB |
Dataset quotas for custom entity recognition models
When you ingest a dataset for a flywheel associated with a custom entity recognition model, the following quotas apply.
Description | Quota/Guideline |
---|---|
Maximum document size | 5 KB |
Minimum number of training documents | 3 |
Maximum number of training documents | 200,000 |
Minimum number of annotations per entity | 25 |
Maximum dataset size | 200 MB |