本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。
StartTopicsDetectionJob
搭配使用 AWS SDK或 CLI
以下代码示例演示如何使用 StartTopicsDetectionJob
。
操作示例是大型程序的代码摘录,必须在上下文中运行。在以下代码示例中,您可以查看此操作的上下文:
- .NET
-
- AWS SDK for .NET
-
注意
还有更多相关信息 GitHub。查找完整的示例,学习如何设置和运行 AWS 代码示例存储库
。 using System; using System.Threading.Tasks; using Amazon.Comprehend; using Amazon.Comprehend.Model; /// <summary> /// This example scans the documents in an Amazon Simple Storage Service /// (Amazon S3) bucket and analyzes it for topics. The results are stored /// in another bucket and then the resulting job properties are displayed /// on the screen. This example was created using the AWS SDK for .NEt /// version 3.7 and .NET Core version 5.0. /// </summary> public static class TopicModeling { /// <summary> /// This methos calls a topic detection job by calling the Amazon /// Comprehend StartTopicsDetectionJobRequest. /// </summary> public static async Task Main() { var comprehendClient = new AmazonComprehendClient(); string inputS3Uri = "s3://input bucket/input path"; InputFormat inputDocFormat = InputFormat.ONE_DOC_PER_FILE; string outputS3Uri = "s3://output bucket/output path"; string dataAccessRoleArn = "arn:aws:iam::account ID:role/data access role"; int numberOfTopics = 10; var startTopicsDetectionJobRequest = new StartTopicsDetectionJobRequest() { InputDataConfig = new InputDataConfig() { S3Uri = inputS3Uri, InputFormat = inputDocFormat, }, OutputDataConfig = new OutputDataConfig() { S3Uri = outputS3Uri, }, DataAccessRoleArn = dataAccessRoleArn, NumberOfTopics = numberOfTopics, }; var startTopicsDetectionJobResponse = await comprehendClient.StartTopicsDetectionJobAsync(startTopicsDetectionJobRequest); var jobId = startTopicsDetectionJobResponse.JobId; Console.WriteLine("JobId: " + jobId); var describeTopicsDetectionJobRequest = new DescribeTopicsDetectionJobRequest() { JobId = jobId, }; var describeTopicsDetectionJobResponse = await comprehendClient.DescribeTopicsDetectionJobAsync(describeTopicsDetectionJobRequest); PrintJobProperties(describeTopicsDetectionJobResponse.TopicsDetectionJobProperties); var listTopicsDetectionJobsResponse = await comprehendClient.ListTopicsDetectionJobsAsync(new ListTopicsDetectionJobsRequest()); foreach (var props in listTopicsDetectionJobsResponse.TopicsDetectionJobPropertiesList) { PrintJobProperties(props); } } /// <summary> /// This method is a helper method that displays the job properties /// from the call to StartTopicsDetectionJobRequest. /// </summary> /// <param name="props">A list of properties from the call to /// StartTopicsDetectionJobRequest.</param> private static void PrintJobProperties(TopicsDetectionJobProperties props) { Console.WriteLine($"JobId: {props.JobId}, JobName: {props.JobName}, JobStatus: {props.JobStatus}"); Console.WriteLine($"NumberOfTopics: {props.NumberOfTopics}\nInputS3Uri: {props.InputDataConfig.S3Uri}"); Console.WriteLine($"InputFormat: {props.InputDataConfig.InputFormat}, OutputS3Uri: {props.OutputDataConfig.S3Uri}"); } }
-
有关API详细信息,请参阅StartTopicsDetectionJob中的 AWS SDK for .NET API参考。
-
- CLI
-
- AWS CLI
-
启动主题检测分析作业
以下
start-topics-detection-job
示例为位于--input-data-config
标签指定地址的所有文件启动异步主题检测作业。作业完成后,文件夹output
将放置在--ouput-data-config
标签指定的位置。output
包含 topic-terms.csv 和 doc-topics.csv。第一个输出文件 topic-terms.csv 是集合中的主题列表。对于每个主题,默认情况下,该列表按权重排列主题列出根据其的热门术语。第二个文件doc-topics.csv
列出了与主题相关的文档以及与该主题相关的文档比例。aws comprehend start-topics-detection-job \ --job-name
example_topics_detection_job
\ --language-codeen
\ --input-data-config"S3Uri=s3://DOC-EXAMPLE-BUCKET/"
\ --output-data-config"S3Uri=s3://DOC-EXAMPLE-DESTINATION-BUCKET/testfolder/"
\ --data-access-role-arnarn:aws:iam::111122223333:role/service-role/AmazonComprehendServiceRole-example-role
\ --language-codeen
输出:
{ "JobId": "123456abcdeb0e11022f22a11EXAMPLE", "JobArn": "arn:aws:comprehend:us-west-2:111122223333:key-phrases-detection-job/123456abcdeb0e11022f22a11EXAMPLE", "JobStatus": "SUBMITTED" }
有关更多信息,请参阅《Amazon Comprehend 开发人员指南》中的主题建模。
-
有关API详细信息,请参阅StartTopicsDetectionJob
中的 AWS CLI 命令参考。
-
- Python
-
- SDK适用于 Python (Boto3)
-
注意
还有更多相关信息 GitHub。查找完整的示例,学习如何设置和运行 AWS 代码示例存储库
。 class ComprehendTopicModeler: """Encapsulates a Comprehend topic modeler.""" def __init__(self, comprehend_client): """ :param comprehend_client: A Boto3 Comprehend client. """ self.comprehend_client = comprehend_client def start_job( self, job_name, input_bucket, input_key, input_format, output_bucket, output_key, data_access_role_arn, ): """ Starts a topic modeling job. Input is read from the specified Amazon S3 input bucket and written to the specified output bucket. Output data is stored in a tar archive compressed in gzip format. The job runs asynchronously, so you can call `describe_topics_detection_job` to get job status until it returns a status of SUCCEEDED. :param job_name: The name of the job. :param input_bucket: An Amazon S3 bucket that contains job input. :param input_key: The prefix used to find input data in the input bucket. If multiple objects have the same prefix, all of them are used. :param input_format: The format of the input data, either one document per file or one document per line. :param output_bucket: The Amazon S3 bucket where output data is written. :param output_key: The prefix prepended to the output data. :param data_access_role_arn: The Amazon Resource Name (ARN) of a role that grants Comprehend permission to read from the input bucket and write to the output bucket. :return: Information about the job, including the job ID. """ try: response = self.comprehend_client.start_topics_detection_job( JobName=job_name, DataAccessRoleArn=data_access_role_arn, InputDataConfig={ "S3Uri": f"s3://{input_bucket}/{input_key}", "InputFormat": input_format.value, }, OutputDataConfig={"S3Uri": f"s3://{output_bucket}/{output_key}"}, ) logger.info("Started topic modeling job %s.", response["JobId"]) except ClientError: logger.exception("Couldn't start topic modeling job.") raise else: return response
-
有关API详细信息,请参阅StartTopicsDetectionJob中的 AWS SDK供参考 Python (Boto3) API。
-
有关完整列表 AWS SDK开发者指南和代码示例,请参阅将 Amazon Comprehend 与 SDK 配合 AWS 使用。本主题还包括有关入门的信息以及有关先前SDK版本的详细信息。