Use StartDocumentClassificationJob with an AWS SDK or CLI
The following code examples show how to use StartDocumentClassificationJob.
Action examples are code excerpts from larger programs and must be run in context.
- CLI
- AWS CLI
To start a document classification job
The following start-document-classification-job example starts a document classification job that runs a custom model on all of the files at the address specified by the --input-data-config option. In this example, the input S3 bucket contains SampleSMStext1.txt, SampleSMStext2.txt, and SampleSMStext3.txt. The model was previously trained to classify SMS messages as spam or non-spam ("ham"). When the job is complete, output.tar.gz is placed at the location specified by the --output-data-config option. output.tar.gz contains predictions.jsonl, which lists the classification of each document. The JSON output is printed on one line per file, but is formatted here for readability.

    aws comprehend start-document-classification-job \
        --job-name exampleclassificationjob \
        --input-data-config "S3Uri=s3://amzn-s3-demo-bucket-INPUT/jobdata/" \
        --output-data-config "S3Uri=s3://amzn-s3-demo-destination-bucket/testfolder/" \
        --data-access-role-arn arn:aws:iam::111122223333:role/service-role/AmazonComprehendServiceRole-example-role \
        --document-classifier-arn arn:aws:comprehend:us-west-2:111122223333:document-classifier/mymodel/version/12

Contents of SampleSMStext1.txt:

    "CONGRATULATIONS! TXT 2155550100 to win $5000"

Contents of SampleSMStext2.txt:

    "Hi, when do you want me to pick you up from practice?"

Contents of SampleSMStext3.txt:

    "Plz send bank account # to 2155550100 to claim prize!!"

Output:

    {
        "JobId": "e758dd56b824aa717ceab551fEXAMPLE",
        "JobArn": "arn:aws:comprehend:us-west-2:111122223333:document-classification-job/e758dd56b824aa717ceab551fEXAMPLE",
        "JobStatus": "SUBMITTED"
    }

Contents of predictions.jsonl:

    {
        "File": "SampleSMStext1.txt",
        "Line": "0",
        "Classes": [
            {"Name": "spam", "Score": 0.9999},
            {"Name": "ham", "Score": 0.0001}
        ]
    }
    {
        "File": "SampleSMStext2.txt",
        "Line": "0",
        "Classes": [
            {"Name": "ham", "Score": 0.9994},
            {"Name": "spam", "Score": 0.0006}
        ]
    }
    {
        "File": "SampleSMStext3.txt",
        "Line": "0",
        "Classes": [
            {"Name": "spam", "Score": 0.9999},
            {"Name": "ham", "Score": 0.0001}
        ]
    }

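Once output.tar.gz has been downloaded, the per-document results can be read programmatically. The following is a minimal sketch, assuming the archive bytes are already in memory and that predictions.jsonl holds one JSON object per line with the File and Classes fields shown above. The helper names load_predictions and top_class are hypothetical and are not part of any AWS SDK.

```python
import io
import json
import tarfile


def load_predictions(archive_bytes):
    """Return the records stored in predictions.jsonl inside an output.tar.gz payload.

    Hypothetical helper: assumes the archive contains a member whose name
    ends with "predictions.jsonl".
    """
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        members = [
            m for m in archive.getmembers() if m.name.endswith("predictions.jsonl")
        ]
        if not members:
            raise FileNotFoundError("predictions.jsonl not found in archive")
        text = archive.extractfile(members[0]).read().decode("utf-8")
    # JSON Lines: one JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def top_class(record):
    """Return (file name, label, score) for the highest-scoring class of one record."""
    best = max(record["Classes"], key=lambda c: c["Score"])
    return record["File"], best["Name"], best["Score"]
```

Calling top_class on each record returned by load_predictions yields one label per input file, which is usually what a downstream filter needs.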
For more information, see Custom classification in the Amazon Comprehend Developer Guide.
For API details, see StartDocumentClassificationJob in AWS CLI Command Reference.
- Python
- SDK for Python (Boto3)
Note
There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

    import logging

    from botocore.exceptions import ClientError

    logger = logging.getLogger(__name__)


    class ComprehendClassifier:
        """Encapsulates an Amazon Comprehend custom classifier."""

        def __init__(self, comprehend_client):
            """
            :param comprehend_client: A Boto3 Comprehend client.
            """
            self.comprehend_client = comprehend_client
            # Must hold the ARN of a trained classifier before start_job is called.
            self.classifier_arn = None

        def start_job(
            self,
            job_name,
            input_bucket,
            input_key,
            input_format,
            output_bucket,
            output_key,
            data_access_role_arn,
        ):
            """
            Starts a classification job. The classifier must be trained or the job
            will fail. Input is read from the specified Amazon S3 input bucket and
            written to the specified output bucket. Output data is stored in a tar
            archive compressed in gzip format. The job runs asynchronously, so you can
            call `describe_document_classification_job` to get job status until it
            returns a status of COMPLETED.

            :param job_name: The name of the job.
            :param input_bucket: The Amazon S3 bucket that contains input data.
            :param input_key: The prefix used to find input data in the input
                              bucket. If multiple objects have the same prefix, all
                              of them are used.
            :param input_format: The format of the input data, either one document per
                                 file or one document per line. Expected to be an enum
                                 member whose `value` is the API string
                                 (ONE_DOC_PER_FILE or ONE_DOC_PER_LINE).
            :param output_bucket: The Amazon S3 bucket where output data is written.
            :param output_key: The prefix prepended to the output data.
            :param data_access_role_arn: The Amazon Resource Name (ARN) of a role that
                                         grants Comprehend permission to read from the
                                         input bucket and write to the output bucket.
            :return: Information about the job, including the job ID.
            """
            try:
                response = self.comprehend_client.start_document_classification_job(
                    DocumentClassifierArn=self.classifier_arn,
                    JobName=job_name,
                    InputDataConfig={
                        "S3Uri": f"s3://{input_bucket}/{input_key}",
                        "InputFormat": input_format.value,
                    },
                    OutputDataConfig={"S3Uri": f"s3://{output_bucket}/{output_key}"},
                    DataAccessRoleArn=data_access_role_arn,
                )
                logger.info(
                    "Document classification job %s is %s.", job_name, response["JobStatus"]
                )
            except ClientError:
                logger.exception("Couldn't start classification job %s.", job_name)
                raise
            else:
                return response

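Because the job runs asynchronously, a caller typically polls describe_document_classification_job until the job reaches a terminal state. The following is a minimal sketch of such a loop, not part of the AWS example. The function name wait_for_job, the polling interval, the poll limit, and the injectable sleep parameter are all assumptions made for illustration. The API reports a finished job with the status COMPLETED.

```python
import time

# Job statuses after which no further progress is expected.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_job(comprehend_client, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll a document classification job until it reaches a terminal status.

    Hypothetical helper. `comprehend_client` is assumed to be a Boto3 Comprehend
    client (or any object with a compatible describe_document_classification_job
    method). `sleep` is injectable so the loop can be exercised without waiting.

    :return: The DocumentClassificationJobProperties dict of the finished job.
    :raises TimeoutError: If the job is still running after `max_polls` checks.
    """
    for _ in range(max_polls):
        response = comprehend_client.describe_document_classification_job(JobId=job_id)
        properties = response["DocumentClassificationJobProperties"]
        if properties["JobStatus"] in TERMINAL_STATUSES:
            return properties
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} status checks.")
```

A caller would pass the JobId from the start_job response and then check whether the returned JobStatus is COMPLETED before fetching the output archive. A fixed interval keeps the sketch short; a backoff schedule may be preferable for long-running jobs.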
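For API details, see StartDocumentClassificationJob in AWS SDK for Python (Boto3) API Reference.
For a complete list of AWS SDK developer guides and code examples, see Using Amazon Comprehend with an AWS SDK. This topic also includes information about getting started and details about previous SDK versions.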