An Amazon Textract operation can fail if you exceed the maximum number of transactions
per second (TPS), which causes the service to throttle your application, or if your
connection drops. For example, if you make too many calls to Amazon Textract operations
in a short period of time, the service throttles your calls and returns a
ProvisionedThroughputExceededException error in the operation response. For information
about Amazon Textract TPS quotas, see Amazon Textract Quotas. To change a limit, use the
Amazon Textract option in the Service Quotas console.
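As an illustration of how a throttled call can be recognized in Python, here is a minimal sketch. It assumes boto3-style exceptions, which expose the service error code under response['Error']['Code']. The helper name is_throttling_error is hypothetical, and treating ThrottlingException as a second throttling code is an assumption of this sketch rather than something stated above.

```python
# Error codes this sketch treats as throttling. The first is named in the text
# above; including ThrottlingException is an assumption of this sketch.
THROTTLING_CODES = {
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
}


def is_throttling_error(error):
    """Return True if a boto3-style exception carries a throttling error code.

    Hypothetical helper: it only inspects error.response['Error']['Code'],
    so it works with any exception object that has that shape.
    """
    response = getattr(error, "response", None) or {}
    code = response.get("Error", {}).get("Code", "")
    return code in THROTTLING_CODES
```

A caller could use a check like this to decide whether a failed request is worth retrying or should be reported immediately.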
You can manage throttling and dropped connections by automatically retrying the operation.
You specify the number of retries by including the Config parameter when you create the
Amazon Textract client. We recommend a retry count of 5. The AWS SDK retries an operation
up to the specified number of times; if the operation still fails, the SDK throws an
exception. For more information, see Error Retries and Exponential Backoff in AWS.
Note
Automatic retries work for both synchronous and asynchronous operations. Before specifying automatic retries, make sure you have the most recent version of the AWS SDK. For more information, see Step 2: Set Up the AWS CLI and AWS SDKs.
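To make the idea of retrying with exponential backoff concrete, the following is a minimal sketch using only the Python standard library. It is not the AWS SDK's implementation: the function name call_with_retries, the delay values, and the "full jitter" strategy are all assumptions chosen for illustration. In practice you would rely on the SDK's built-in retries configured on the client.

```python
import random
import time


def call_with_retries(operation, max_retries=5, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep):
    """Call operation(), retrying failures with exponential backoff and jitter.

    Illustrative only: the names and delay values are assumptions, and this
    sketch retries on any exception rather than only on retryable errors.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # The delay ceiling doubles on each attempt, up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # "Full jitter": wait a random time between 0 and the ceiling so
            # that many clients do not retry at the same moment.
            sleep(random.uniform(0, cap))
```

The sleep parameter exists so the waiting behavior can be replaced, for example with a no-op during testing.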
The following example shows how to automatically retry Amazon Textract operations when you're processing multiple documents.
Prerequisites
If you haven't already:
- Give a user the AmazonTextractFullAccess and AmazonS3ReadOnlyAccess permissions. For more information, see Step 1: Set Up an AWS Account and Create a User.
- Install and configure the AWS CLI and the AWS SDKs. For more information, see Step 2: Set Up the AWS CLI and AWS SDKs.
To automatically retry operations
- To run the Synchronous example, upload multiple document images to your S3 bucket. To run the Asynchronous example, upload a multi-page document to your S3 bucket and run StartDocumentTextDetection on it. For instructions, see Uploading Objects into Amazon S3 in the Amazon Simple Storage Service User Guide.
- The following examples demonstrate how to use the Config parameter to automatically retry an operation. The Synchronous example calls the DetectDocumentText operation, while the Asynchronous example calls the GetDocumentTextDetection operation.

  Use the following example to call the DetectDocumentText operation on the documents in your Amazon S3 bucket. In main, change the value of bucket to your S3 bucket. Change the value of documents to the names of the document images that you uploaded in the previous step.

      import boto3
      from botocore.client import Config

      # Documents

      def process_multiple_documents(bucket, documents):

          # Retry each failed call up to 5 times before raising an exception
          config = Config(retries=dict(max_attempts=5))

          # Amazon Textract client
          textract = boto3.client('textract', config=config)

          for documentName in documents:

              print("\nProcessing: {}\n==========================================".format(documentName))

              # Call Amazon Textract
              response = textract.detect_document_text(
                  Document={
                      'S3Object': {
                          'Bucket': bucket,
                          'Name': documentName
                      }
                  })

              # Print detected text (ANSI escape codes color the output blue)
              for item in response["Blocks"]:
                  if item["BlockType"] == "LINE":
                      print('\033[94m' + item["Text"] + '\033[0m')

      def main():
          bucket = ""
          documents = ["document-image-1.png",
                       "document-image-2.png", "document-image-3.png",
                       "document-image-4.png", "document-image-5.png"]
          process_multiple_documents(bucket, documents)

      if __name__ == "__main__":
          main()
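The asynchronous case mentioned above, which calls GetDocumentTextDetection, can be sketched along the same lines. This is a minimal sketch under stated assumptions: a job was already started with StartDocumentTextDetection, it has finished, and you have its JobId. The helper names make_textract_client and get_detected_lines are hypothetical, and polling for job completion is left out.

```python
def make_textract_client(max_retries=5):
    """Create a Textract client that retries failed calls automatically.

    Hypothetical helper. The imports live inside the function so that the
    paging helper below can be exercised without the AWS SDK installed.
    """
    import boto3
    from botocore.config import Config

    config = Config(retries={"max_attempts": max_retries})
    return boto3.client("textract", config=config)


def get_detected_lines(textract, job_id):
    """Collect the text of every LINE block from all result pages of a job.

    Assumes the job has already finished; raises if it has not succeeded.
    """
    lines = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        # Each call is retried by the SDK according to the client's Config
        response = textract.get_document_text_detection(**kwargs)
        status = response.get("JobStatus")
        if status != "SUCCEEDED":
            raise RuntimeError("Job {} has status {}".format(job_id, status))
        for block in response.get("Blocks", []):
            if block["BlockType"] == "LINE":
                lines.append(block["Text"])
        # Results are paginated; stop when no further page is offered
        next_token = response.get("NextToken")
        if not next_token:
            return lines
```

With a real job, usage might look like get_detected_lines(make_textract_client(), job_id), where job_id is the value returned by StartDocumentTextDetection.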