本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。
异步分析任务的输出
分析任务完成后,它将结果存储到您在请求中指定的 S3 存储桶中。
文本输入的输出
无论哪种格式的文本输入文档(多类或多标签),任务输出都由一个名为 output.tar.gz
的文件组成。它是一个压缩的存档文件,其中包含一个带有输出的文本文件。
多类输出
当您使用在多类模式下训练的分类器时,结果会显示 classes
。这些类中的每一个 classes
都是在训练分类器时用来创建类别集的类。
有关这些输出字段的更多详细信息,请参阅ClassifyDocument《Amazon Comprehend API 参考》。
以下示例使用以下互斥类。
DOCUMENTARY
SCIENCE_FICTION
ROMANTIC_COMEDY
SERIOUS_DRAMA
OTHER
如果您的输入数据格式为每行一个文档,则输出文件中的每行包含输入中的一行。每行都包括文件名、输入行的从零开始的行号以及文档中找到的一个或多个类。以 Amazon Comprehend 对单个实例正确分类的置信度作为结束。
例如:
{"File": "file1.txt", "Line": "0", "Classes": [{"Name": "Documentary", "Score": 0.8642}, {"Name": "Other", "Score": 0.0381}, {"Name": "Serious_Drama", "Score": 0.0372}]} {"File": "file1.txt", "Line": "1", "Classes": [{"Name": "Science_Fiction", "Score": 0.5}, {"Name": "Science_Fiction", "Score": 0.0381}, {"Name": "Science_Fiction", "Score": 0.0372}]} {"File": "file2.txt", "Line": "2", "Classes": [{"Name": "Documentary", "Score": 0.1}, {"Name": "Documentary", "Score": 0.0381}, {"Name": "Documentary", "Score": 0.0372}]} {"File": "file2.txt", "Line": "3", "Classes": [{"Name": "Serious_Drama", "Score": 0.3141}, {"Name": "Other", "Score": 0.0381}, {"Name": "Other", "Score": 0.0372}]}
如果您的输入数据格式为每个文件一个文档,则输出文件包含每个文档一行。每行都有文件名和文档中找到的一个或多个类。以 Amazon Comprehend 对单个实例正确分类的置信度作为结束。
例如:
{"File": "file0.txt", "Classes": [{"Name": "Documentary", "Score": 0.8642}, {"Name": "Other", "Score": 0.0381}, {"Name": "Serious_Drama", "Score": 0.0372}]} {"File": "file1.txt", "Classes": [{"Name": "Science_Fiction", "Score": 0.5}, {"Name": "Science_Fiction", "Score": 0.0381}, {"Name": "Science_Fiction", "Score": 0.0372}]} {"File": "file2.txt", "Classes": [{"Name": "Documentary", "Score": 0.1}, {"Name": "Documentary", "Score": 0.0381}, {"Name": "Domentary", "Score": 0.0372}]} {"File": "file3.txt", "Classes": [{"Name": "Serious_Drama", "Score": 0.3141}, {"Name": "Other", "Score": 0.0381}, {"Name": "Other", "Score": 0.0372}]}
多标签输出
当您使用在多标签模式下训练的分类器时,结果会显示 labels
。这些标签中的每一个 labels
都是在训练分类器时用来创建类别集的标签。
以下示例使用这些独特的标签。
SCIENCE_FICTION
ACTION
DRAMA
COMEDY
ROMANCE
如果您的输入数据格式为每行一个文档,则输出文件中的每行包含输入中的一行。每行都包括文件名、输入行的从零开始的行号以及文档中找到的一个或多个类。以 Amazon Comprehend 对单个实例正确分类的置信度作为结束。
例如:
{"File": "file1.txt", "Line": "0", "Labels": [{"Name": "Action", "Score": 0.8642}, {"Name": "Drama", "Score": 0.650}, {"Name": "Science Fiction", "Score": 0.0372}]} {"File": "file1.txt", "Line": "1", "Labels": [{"Name": "Comedy", "Score": 0.5}, {"Name": "Action", "Score": 0.0381}, {"Name": "Drama", "Score": 0.0372}]} {"File": "file1.txt", "Line": "2", "Labels": [{"Name": "Action", "Score": 0.9934}, {"Name": "Drama", "Score": 0.0381}, {"Name": "Action", "Score": 0.0372}]} {"File": "file1.txt", "Line": "3", "Labels": [{"Name": "Romance", "Score": 0.9845}, {"Name": "Comedy", "Score": 0.8756}, {"Name": "Drama", "Score": 0.7723}, {"Name": "Science_Fiction", "Score": 0.6157}]}
如果您的输入数据格式为每个文件一个文档,则输出文件包含每个文档一行。每行都有文件名和文档中找到的一个或多个类。以 Amazon Comprehend 对单个实例正确分类的置信度作为结束。
例如:
{"File": "file0.txt", "Labels": [{"Name": "Action", "Score": 0.8642}, {"Name": "Drama", "Score": 0.650}, {"Name": "Science Fiction", "Score": 0.0372}]} {"File": "file1.txt", "Labels": [{"Name": "Comedy", "Score": 0.5}, {"Name": "Action", "Score": 0.0381}, {"Name": "Drama", "Score": 0.0372}]} {"File": "file2.txt", "Labels": [{"Name": "Action", "Score": 0.9934}, {"Name": "Drama", "Score": 0.0381}, {"Name": "Action", "Score": 0.0372}]} {"File": "file3.txt”, "Labels": [{"Name": "Romance", "Score": 0.9845}, {"Name": "Comedy", "Score": 0.8756}, {"Name": "Drama", "Score": 0.7723}, {"Name": "Science_Fiction", "Score": 0.6157}]}
半结构化输入文档的输出
对于半结构化输入文档,输出可以包括以下附加字段:
DocumentMetadata — 提取有关文档的信息。元数据包括文档中的页面列表,以及从每页中提取的字符数。如果请求包含
Byte
参数,则响应中会显示此字段。DocumentType -输入文档中每页的文档类型。如果请求包含
Byte
参数,则响应中会显示此字段。错误:系统在处理输入文档时检测到的页面级错误。如果系统未遇到任何错误,则该字段为空。
有关这些输出字段的更多详细信息,请参阅ClassifyDocument《Amazon Comprehend API 参考》。
以下示例显示了两页扫描PDF文件的输出。
[{ #First page output "Classes": [ { "Name": "__label__2 ", "Score": 0.9993996620178223 }, { "Name": "__label__3 ", "Score": 0.0004330444789957255 } ], "DocumentMetadata": { "PageNumber": 1, "Pages": 2 }, "DocumentType": "ScannedPDF", "File": "file.pdf", "Version": "VERSION_NUMBER" }, #Second page output { "Classes": [ { "Name": "__label__2 ", "Score": 0.9993996620178223 }, { "Name": "__label__3 ", "Score": 0.0004330444789957255 } ], "DocumentMetadata": { "PageNumber": 2, "Pages": 2 }, "DocumentType": "ScannedPDF", "File": "file.pdf", "Version": "VERSION_NUMBER" }]