与 AWS SDK或GetDocumentAnalysis一起使用 CLI - AWS SDK代码示例

AWS 文档 AWS SDK示例 GitHub 存储库中还有更多SDK示例

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

与 AWS SDK或GetDocumentAnalysis一起使用 CLI

以下代码示例演示如何使用 GetDocumentAnalysis

操作示例是大型程序的代码摘录,必须在上下文中运行。在以下代码示例中,您可以查看此操作的上下文:

CLI
AWS CLI

获取对多页文档进行异步文本分析的结果

以下 get-document-analysis 示例演示了如何获取对多页文档进行异步文本分析的结果。

aws textract get-document-analysis \ --job-id df7cf32ebbd2a5de113535fcf4d921926a701b09b4e7d089f3aebadb41e0712b \ --max-results 1000

输出:

{ "Blocks": [ { "Geometry": { "BoundingBox": { "Width": 1.0, "Top": 0.0, "Left": 0.0, "Height": 1.0 }, "Polygon": [ { "Y": 0.0, "X": 0.0 }, { "Y": 0.0, "X": 1.0 }, { "Y": 1.0, "X": 1.0 }, { "Y": 1.0, "X": 0.0 } ] }, "Relationships": [ { "Type": "CHILD", "Ids": [ "75966e64-81c2-4540-9649-d66ec341cd8f", "bb099c24-8282-464c-a179-8a9fa0a057f0", "5ebf522d-f9e4-4dc7-bfae-a288dc094595" ] } ], "BlockType": "PAGE", "Id": "247c28ee-b63d-4aeb-9af0-5f7ea8ba109e", "Page": 1 } ], "NextToken": "cY1W3eTFvoB0cH7YrKVudI4Gb0H8J0xAYLo8xI/JunCIPWCthaKQ+07n/ElyutsSy0+1VOImoTRmP1zw4P0RFtaeV9Bzhnfedpx1YqwB4xaGDA==", "DocumentMetadata": { "Pages": 1 }, "JobStatus": "SUCCEEDED" }

有关更多信息,请参阅《Amazon Textract 开发人员指南》中的“检测和分析多页文档中的文本”

Python
SDK适用于 Python (Boto3)
注意

还有更多相关信息 GitHub。查找完整示例,学习如何在 AWS 代码示例存储库中进行设置和运行。

class TextractWrapper: """Encapsulates Textract functions.""" def __init__(self, textract_client, s3_resource, sqs_resource): """ :param textract_client: A Boto3 Textract client. :param s3_resource: A Boto3 Amazon S3 resource. :param sqs_resource: A Boto3 Amazon SQS resource. """ self.textract_client = textract_client self.s3_resource = s3_resource self.sqs_resource = sqs_resource def get_analysis_job(self, job_id): """ Gets data for a previously started detection job that includes additional elements. :param job_id: The ID of the job to retrieve. :return: The job data, including a list of blocks that describe elements detected in the image. """ try: response = self.textract_client.get_document_analysis(JobId=job_id) job_status = response["JobStatus"] logger.info("Job %s status is %s.", job_id, job_status) except ClientError: logger.exception("Couldn't get data for job %s.", job_id) raise else: return response
  • 有关API详细信息,请参阅GetDocumentAnalysis中的 AWS SDKPython (Boto3) API 参考。

SAP ABAP
SDK对于 SAP ABAP
注意

还有更多相关信息 GitHub。查找完整示例,学习如何在 AWS 代码示例存储库中进行设置和运行。

"Gets the results for an Amazon Textract" "asynchronous operation that analyzes text in a document." TRY. oo_result = lo_tex->getdocumentanalysis( iv_jobid = iv_jobid ). "oo_result is returned for testing purposes." WHILE oo_result->get_jobstatus( ) <> 'SUCCEEDED'. IF sy-index = 10. EXIT. "Maximum 300 seconds. ENDIF. WAIT UP TO 30 SECONDS. oo_result = lo_tex->getdocumentanalysis( iv_jobid = iv_jobid ). ENDWHILE. DATA(lt_blocks) = oo_result->get_blocks( ). LOOP AT lt_blocks INTO DATA(lo_block). IF lo_block->get_text( ) = 'INGREDIENTS: POWDERED SUGAR* (CANE SUGAR,'. MESSAGE 'Found text in the doc: ' && lo_block->get_text( ) TYPE 'I'. ENDIF. ENDLOOP. MESSAGE 'Document analysis retrieved.' TYPE 'I'. CATCH /aws1/cx_texaccessdeniedex. MESSAGE 'You do not have permission to perform this action.' TYPE 'E'. CATCH /aws1/cx_texinternalservererr. MESSAGE 'Internal server error.' TYPE 'E'. CATCH /aws1/cx_texinvalidjobidex. MESSAGE 'Job ID is not valid.' TYPE 'E'. CATCH /aws1/cx_texinvalidkmskeyex. MESSAGE 'AWS KMS key is not valid.' TYPE 'E'. CATCH /aws1/cx_texinvalidparameterex. MESSAGE 'Request has non-valid parameters.' TYPE 'E'. CATCH /aws1/cx_texinvalids3objectex. MESSAGE 'Amazon S3 object is not valid.' TYPE 'E'. CATCH /aws1/cx_texprovthruputexcdex. MESSAGE 'Provisioned throughput exceeded limit.' TYPE 'E'. CATCH /aws1/cx_texthrottlingex. MESSAGE 'The request processing exceeded the limit.' TYPE 'E'. ENDTRY.