To detect text in, or analyze, multipage lending documents by using the Analyze Lending workflow, do the following:
-
Create the Amazon SNS topic and the Amazon SQS queue.
-
Subscribe the queue to the topic.
-
Give the topic permission to send messages to the queue.
-
Start processing the document by calling the StartLendingAnalysis operation.
-
Get the completion status from the Amazon SQS queue. The example code tracks the job identifier (JobId) that's returned by the StartLendingAnalysis operation. The example code only gets the results for matching job identifiers that are read from the completion status. This is important if other applications are using the same queue and topic. For simplicity, the example code deletes messages for jobs that don't match. Consider sending those messages to an Amazon SQS dead-letter queue for further investigation.
The results of the StartLendingAnalysis operation can be sent to an Amazon S3 bucket of your choice by using the OutputConfig feature. If you use this feature, you might need to configure additional permissions for your user and service role. For information on how to let Amazon Textract send encrypted documents to your Amazon S3 bucket, see Permissions for Output Configuration.
-
Get and display the processing results by calling the GetLendingAnalysis operation or the GetLendingAnalysisSummary operation.
-
Once you are finished processing documents, be sure to delete the Amazon SNS topic and the Amazon SQS queue. If you need to process additional documents, you can leave the Amazon SNS topic and Amazon SQS queue as they are and reuse them for the other documents.
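The setup and filtering steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the official example code: the function names (`build_queue_policy`, `create_topic_and_queue`, `job_status_from_message`) are hypothetical, the clients are passed in so the logic can be exercised without AWS access, and the message layout assumed here is an SNS envelope whose `Message` field is a JSON string containing `JobId` and `Status`.

```python
import json


def build_queue_policy(queue_arn, topic_arn):
    """Return an SQS access policy (JSON string) that lets one SNS topic send messages."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only the topic created for this workflow may write to the queue.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })


def create_topic_and_queue(sns, sqs, topic_name, queue_name):
    """Create the topic and queue, subscribe the queue, and grant send permission."""
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    queue_url = sqs.create_queue(QueueName=queue_name)["QueueUrl"]
    attrs = sqs.get_queue_attributes(QueueUrl=queue_url, AttributeNames=["QueueArn"])
    queue_arn = attrs["Attributes"]["QueueArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"Policy": build_queue_policy(queue_arn, topic_arn)},
    )
    return topic_arn, queue_url


def job_status_from_message(body, expected_job_id):
    """Return the job status if the SQS message body is for our job, else None."""
    notification = json.loads(body)                # SNS envelope delivered to SQS
    message = json.loads(notification["Message"])  # completion message (assumed layout)
    if message.get("JobId") != expected_job_id:
        return None                                # belongs to another job or application
    return message.get("Status")


# Usage with real clients (requires boto3 and AWS credentials):
#   import boto3
#   topic_arn, queue_url = create_topic_and_queue(
#       boto3.client("sns"), boto3.client("sqs"), "my-topic", "my-queue")
```

The `body` argument would typically come from the `Body` field of a message returned by the Amazon SQS `receive_message` call. Returning `None` for non-matching jobs leaves the caller free to delete the message or route it to a dead-letter queue.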
Performing Asynchronous Lending Analysis
The example code for this procedure is provided for Python and the AWS CLI. Before you begin, install the appropriate AWS SDK. For more information, see Step 2: Set Up the AWS CLI and AWS SDKs.
-
Configure user access to Amazon Textract, and configure Amazon Textract access to Amazon SNS. For more information, see Configuring Amazon Textract for Asynchronous Operations. To complete this procedure, you need a multipage document file in PDF format. You can skip steps 3 – 6 in the configuration instructions, because the example code creates and configures the Amazon SNS topic and Amazon SQS queue. If you're completing the CLI example, you don't need to set up an SQS queue.
-
Upload a multipage document file in PDF or TIFF format to your Amazon S3 bucket (you can also process single-page documents in JPEG, PNG, TIFF, or PDF formats). For instructions, see Uploading Objects into Amazon S3 in the Amazon Simple Storage Service User Guide.
-
Use the following AWS SDK for Python (Boto3) or AWS CLI code to analyze text in a multipage lending document. In the main function:
-
Replace the value of roleArn with the IAM role ARN that you saved in Giving Amazon Textract Access to Your Amazon SNS Topic.
-
Replace the values of bucket and document with the bucket and document file name that you specified in the preceding step 2.
-
Replace the value of the type input parameter of the ProcessDocument function with the type of processing that you want to use. For example, use ProcessType.DETECTION to detect text, or use ProcessType.ANALYSIS to analyze text.
-
For the Python example, replace the value of
region_name
with the region your client is operating in.
For the upcoming AWS CLI example code, do the following:
-
When calling the StartLendingAnalysis operation, replace the value of bucket-name with the name of your S3 bucket, and replace FileName with the name of the file you specified in step 2. Specify the region of your bucket by replacing region-name with the name of your region. Note that the CLI example doesn't use Amazon SQS.
-
When calling the GetLendingAnalysis operation or the GetLendingAnalysisSummary operation, replace jobId with the jobId returned by StartLendingAnalysis. Specify the region of your bucket by replacing region-name with the name of your region.
-
Run the code for your chosen SDK or the AWS CLI.
The operation might take a while to finish. After it's finished, the following examples display a list of blocks for the detected or analyzed text:
To start the lending document analysis, use the following CLI command. If you want to see the split documents, include the output-config argument; otherwise, you can remove it:
aws textract start-lending-analysis \
    --document-location '{"S3Object":{"Bucket":"S3Bucket","Name":"FileName"}}' \
    --output-config '{"S3Bucket": "S3Bucket", "S3Prefix": "S3Prefix"}' \
    --kms-key-id '1234abcd-12ab-34cd-56ef-1234567890ab' \
    --region 'region-name'
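A rough Python (Boto3) counterpart to the CLI command above might look like the following sketch. The helper names `build_start_request` and `start_lending_job` are hypothetical, and the Textract client is passed in as an argument; only the request fields (`DocumentLocation`, `NotificationChannel`, `OutputConfig`, `KMSKeyId`) and the `start_lending_analysis` client method come from the service API.

```python
def build_start_request(bucket, document, role_arn=None, topic_arn=None,
                        output_bucket=None, output_prefix=None, kms_key_id=None):
    """Assemble keyword arguments for the StartLendingAnalysis operation."""
    request = {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": document}},
    }
    # The notification channel is only needed when completion is tracked via SNS/SQS.
    if role_arn and topic_arn:
        request["NotificationChannel"] = {"RoleArn": role_arn, "SNSTopicArn": topic_arn}
    # OutputConfig is optional; it sends results to a bucket of your choice.
    if output_bucket:
        output_config = {"S3Bucket": output_bucket}
        if output_prefix:
            output_config["S3Prefix"] = output_prefix
        request["OutputConfig"] = output_config
    if kms_key_id:
        request["KMSKeyId"] = kms_key_id
    return request


def start_lending_job(textract, **kwargs):
    """Start the asynchronous job and return its JobId."""
    response = textract.start_lending_analysis(**build_start_request(**kwargs))
    return response["JobId"]


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   textract = boto3.client("textract", region_name="region-name")
#   job_id = start_lending_job(textract, bucket="S3Bucket", document="FileName")
```

Keeping the request construction in a separate pure function makes it easy to inspect exactly what is sent before any call reaches the service.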
To get the results of the lending document analysis, use the following CLI command. The max-results argument is optional. If you don't want to limit the number of results returned, you can remove it:
aws textract get-lending-analysis \
    --job-id 'jobId' \
    --region 'us-west-2' \
    --max-results 30
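Because results can span several responses, a Python caller usually follows the `NextToken` value until it is absent. The sketch below illustrates that loop under a few assumptions: `iter_lending_results` and `page_types` are hypothetical helper names, the client is passed in, and each result is assumed to carry its classification under `PageClassification` and `PageType`.

```python
def iter_lending_results(textract, job_id, max_results=30):
    """Yield every per-page result of a finished job, following NextToken."""
    next_token = None
    while True:
        kwargs = {"JobId": job_id, "MaxResults": max_results}
        if next_token:
            kwargs["NextToken"] = next_token
        response = textract.get_lending_analysis(**kwargs)
        status = response.get("JobStatus")
        if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            raise RuntimeError("Job %s is not complete (status: %s)" % (job_id, status))
        for page_result in response.get("Results", []):
            yield page_result
        next_token = response.get("NextToken")
        if not next_token:
            break  # no further pages of results


def page_types(page_result):
    """Return the predicted page type values for one result (assumed layout)."""
    predictions = page_result.get("PageClassification", {}).get("PageType", [])
    return [prediction["Value"] for prediction in predictions]


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   textract = boto3.client("textract", region_name="us-west-2")
#   for result in iter_lending_results(textract, "jobId"):
#       print(result.get("Page"), page_types(result))
```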
To retrieve a summary of the results:
aws textract get-lending-analysis-summary \
    --job-id 'jobId' \
    --region 'us-west-2'
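The summary response groups pages by detected document type. As a small illustration of reading it from Python, the sketch below flattens those groups into a dictionary. The function name `pages_by_document_type` is hypothetical, the client is passed in, and the response layout (`Summary`, `DocumentGroups`, `SplitDocuments`, `Pages`) is assumed to match the GetLendingAnalysisSummary API reference.

```python
def pages_by_document_type(textract, job_id):
    """Map each detected document type to the sorted list of pages assigned to it."""
    response = textract.get_lending_analysis_summary(JobId=job_id)
    groups = response.get("Summary", {}).get("DocumentGroups", [])
    pages_for_type = {}
    for group in groups:
        pages = []
        # A group can hold several split documents, each with its own page list.
        for split_document in group.get("SplitDocuments", []):
            pages.extend(split_document.get("Pages", []))
        pages_for_type[group["Type"]] = sorted(pages)
    return pages_for_type


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   textract = boto3.client("textract", region_name="us-west-2")
#   print(pages_by_document_type(textract, "jobId"))
```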