Use StartDocumentClassificationJob with an AWS SDK or CLI - Amazon Comprehend


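Use StartDocumentClassificationJob with an AWS SDK or CLI

The following code examples show how to use StartDocumentClassificationJob.

Action examples are code excerpts from larger programs and must be run in context. You can see this action in context in the following code example: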

CLI
AWS CLI

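To start a document classification job

The following start-document-classification-job example starts a document classification job with a custom model on all of the files at the address specified by the --input-data-config option. In this example, the input S3 bucket contains SampleSMStext1.txt, SampleSMStext2.txt, and SampleSMStext3.txt. The model was previously trained to classify SMS messages as spam or non-spam ("ham"). When the job is complete, output.tar.gz is placed at the location specified by the --output-data-config option. output.tar.gz contains predictions.jsonl, which lists the classification of each document. The JSON output is printed on one line per file, but is formatted here for readability.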

aws comprehend start-document-classification-job \
    --job-name exampleclassificationjob \
    --input-data-config "S3Uri=s3://DOC-EXAMPLE-BUCKET-INPUT/jobdata/" \
    --output-data-config "S3Uri=s3://DOC-EXAMPLE-DESTINATION-BUCKET/testfolder/" \
    --data-access-role-arn arn:aws:iam::111122223333:role/service-role/AmazonComprehendServiceRole-example-role \
    --document-classifier-arn arn:aws:comprehend:us-west-2:111122223333:document-classifier/mymodel/version/12

Contents of SampleSMStext1.txt:

"CONGRATULATIONS! TXT 2155550100 to win $5000"

Contents of SampleSMStext2.txt:

"Hi, when do you want me to pick you up from practice?"

Contents of SampleSMStext3.txt:

"Plz send bank account # to 2155550100 to claim prize!!"

Output:

{
    "JobId": "e758dd56b824aa717ceab551fEXAMPLE",
    "JobArn": "arn:aws:comprehend:us-west-2:111122223333:document-classification-job/e758dd56b824aa717ceab551fEXAMPLE",
    "JobStatus": "SUBMITTED"
}

Contents of predictions.jsonl:

{
    "File": "SampleSMStext1.txt",
    "Line": "0",
    "Classes": [
        {"Name": "spam", "Score": 0.9999},
        {"Name": "ham", "Score": 0.0001}
    ]
}
{
    "File": "SampleSMStext2.txt",
    "Line": "0",
    "Classes": [
        {"Name": "ham", "Score": 0.9994},
        {"Name": "spam", "Score": 0.0006}
    ]
}
{
    "File": "SampleSMStext3.txt",
    "Line": "0",
    "Classes": [
        {"Name": "spam", "Score": 0.9999},
        {"Name": "ham", "Score": 0.0001}
    ]
}
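Once the job has finished, the predictions can be read straight from the archive. The following is a minimal sketch, assuming output.tar.gz has already been downloaded to a local path and that each line of predictions.jsonl is a JSON object with the File and Classes fields shown in the sample. The function name top_classes and its parameters are hypothetical and are not part of the AWS example.

```python
import json
import tarfile


def top_classes(archive_path, member_name="predictions.jsonl"):
    """Return {file name: (best class name, score)} from a job output archive.

    Assumes each non-empty line of the member is a JSON object with "File"
    and "Classes" keys, as in the sample predictions.jsonl.
    """
    results = {}
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # Match on the base name because the member may sit under a folder prefix.
        member = next(
            (
                m
                for m in archive.getmembers()
                if m.isfile() and m.name.split("/")[-1] == member_name
            ),
            None,
        )
        if member is None:
            raise FileNotFoundError(f"{member_name} not found in {archive_path}")
        with archive.extractfile(member) as handle:
            for raw_line in handle:
                line = raw_line.decode("utf-8").strip()
                if not line:
                    continue
                record = json.loads(line)
                # Keep only the highest-scoring class for each document.
                best = max(record["Classes"], key=lambda c: c["Score"])
                results[record["File"]] = (best["Name"], best["Score"])
    return results
```

Calling top_classes("output.tar.gz") on a downloaded archive would return a dictionary keyed by input file name, which is convenient for filtering the documents labeled spam.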

For more information, see Custom classification in the Amazon Comprehend Developer Guide.

Python
SDK for Python (Boto3)
Note

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


class ComprehendClassifier:
    """Encapsulates an Amazon Comprehend custom classifier."""

    def __init__(self, comprehend_client):
        """
        :param comprehend_client: A Boto3 Comprehend client.
        """
        self.comprehend_client = comprehend_client
        # Must be set to the ARN of a trained classifier before start_job is called.
        self.classifier_arn = None

    def start_job(
        self,
        job_name,
        input_bucket,
        input_key,
        input_format,
        output_bucket,
        output_key,
        data_access_role_arn,
    ):
        """
        Starts a classification job. The classifier must be trained or the job
        will fail. Input is read from the specified Amazon S3 input bucket and
        written to the specified output bucket. Output data is stored in a tar
        archive compressed in gzip format. The job runs asynchronously, so you can
        call `describe_document_classification_job` to get job status until it
        returns a status of COMPLETED.

        :param job_name: The name of the job.
        :param input_bucket: The Amazon S3 bucket that contains input data.
        :param input_key: The prefix used to find input data in the input
                          bucket. If multiple objects have the same prefix, all
                          of them are used.
        :param input_format: The format of the input data, either one document per
                             file or one document per line. This is an enum member
                             whose `value` is passed to the API.
        :param output_bucket: The Amazon S3 bucket where output data is written.
        :param output_key: The prefix prepended to the output data.
        :param data_access_role_arn: The Amazon Resource Name (ARN) of a role that
                                     grants Comprehend permission to read from the
                                     input bucket and write to the output bucket.
        :return: Information about the job, including the job ID.
        """
        try:
            response = self.comprehend_client.start_document_classification_job(
                DocumentClassifierArn=self.classifier_arn,
                JobName=job_name,
                InputDataConfig={
                    "S3Uri": f"s3://{input_bucket}/{input_key}",
                    "InputFormat": input_format.value,
                },
                OutputDataConfig={"S3Uri": f"s3://{output_bucket}/{output_key}"},
                DataAccessRoleArn=data_access_role_arn,
            )
            logger.info(
                "Document classification job %s is %s.", job_name, response["JobStatus"]
            )
        except ClientError:
            logger.exception("Couldn't start classification job %s.", job_name)
            raise
        else:
            return response
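Because the job runs asynchronously, a caller usually polls `describe_document_classification_job` until the job stops changing state. The following is a minimal sketch of such a loop, assuming the terminal statuses are COMPLETED, FAILED, and STOPPED. The helper name wait_for_job, its parameters, and the injectable `sleep` argument are hypothetical and are not part of the AWS example.

```python
import time

# Statuses after which the job no longer changes (an assumption of this sketch).
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_job(
    comprehend_client, job_id, delay_seconds=30, max_attempts=60, sleep=time.sleep
):
    """Poll a document classification job until it reaches a terminal state.

    :param comprehend_client: A Boto3 Comprehend client (or a compatible stub).
    :param job_id: The JobId returned when the job was started.
    :param delay_seconds: Seconds to wait between polls.
    :param max_attempts: Maximum number of describe calls before giving up.
    :param sleep: Function used to wait; injectable so tests can avoid real delays.
    :return: The job properties dictionary from the last describe call.
    """
    for _ in range(max_attempts):
        response = comprehend_client.describe_document_classification_job(
            JobId=job_id
        )
        properties = response["DocumentClassificationJobProperties"]
        if properties["JobStatus"] in TERMINAL_STATES:
            return properties
        sleep(delay_seconds)
    raise TimeoutError(
        f"Job {job_id} did not finish after {max_attempts} status checks."
    )
```

A caller would pass the JobId from the start_job response and then check whether the returned JobStatus is COMPLETED before downloading the output archive.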

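For a complete list of AWS SDK developer guides and code examples, see Using Amazon Comprehend with an AWS SDK. This topic also includes information about getting started and details about previous SDK versions.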