Configuring datasets

To add labeled training or test data to a flywheel, use the Amazon Comprehend console or API to create a dataset.

You configure each dataset as training data or test data. You associate the dataset with a specific flywheel and custom model. When you create a dataset, Amazon Comprehend uploads the data to the flywheel's data lake. For details about file formats for the training data, see Preparing classifier training data or Preparing entity recognizer training data.

When you delete the flywheel, Amazon Comprehend deletes the datasets. The uploaded data remains available in the data lake.

Creating a dataset (console)

Create a dataset
  1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

  2. From the left menu, choose Flywheels and choose the flywheel where you want to add the data.

  3. Choose the Datasets tab.

  4. In the Training datasets or Test datasets table, choose Create dataset.

  5. Under Dataset details, enter a name for the dataset and an optional description.

  6. Under Data specifications, choose values for the Data format and Dataset type fields.

  7. (Optional) Under Input format, choose the format of the input documents.

  8. Under Annotation location on S3, enter the Amazon S3 location of the annotations file.

  9. Under Training data location on S3, enter the Amazon S3 location of the document files.

  10. Choose Create.

Creating a dataset (API)

You can use the CreateDataset operation to create a dataset.

    aws comprehend create-dataset \
        --flywheel-arn "myFlywheel2" \
        --dataset-name "my-training-dataset" \
        --dataset-type "TRAIN" \
        --description "my training dataset" \
        --input-data-config file://inputConfig.json

The inputConfig.json file contains the following content.

    {
        "DataFormat": "COMPREHEND_CSV",
        "DocumentClassifierInputDataConfig": {
            "S3Uri": "s3://my-comprehend-datasets/multilabel_train.csv"
        }
    }
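The same request can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes the boto3 SDK, and it assumes that the contents of inputConfig.json map to the InputDataConfig parameter of CreateDataset. The helper names `build_create_dataset_request` and `create_dataset` are hypothetical, and the flywheel ARN, bucket, and dataset names are placeholders. The request builder is kept separate from the call so that it can be checked without contacting AWS.

```python
from typing import Any, Dict, Optional

# A dataset is configured as either training data or test data.
VALID_DATASET_TYPES = {"TRAIN", "TEST"}


def build_create_dataset_request(
    flywheel_arn: str,
    dataset_name: str,
    dataset_type: str,
    s3_uri: str,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for CreateDataset (classifier CSV data).

    Hypothetical helper: the InputDataConfig shape mirrors inputConfig.json.
    """
    if dataset_type not in VALID_DATASET_TYPES:
        raise ValueError(
            f"dataset_type must be one of {sorted(VALID_DATASET_TYPES)}"
        )
    request: Dict[str, Any] = {
        "FlywheelArn": flywheel_arn,
        "DatasetName": dataset_name,
        "DatasetType": dataset_type,
        "InputDataConfig": {
            "DataFormat": "COMPREHEND_CSV",
            "DocumentClassifierInputDataConfig": {"S3Uri": s3_uri},
        },
    }
    if description:
        request["Description"] = description
    return request


def create_dataset(client: Any, request: Dict[str, Any]) -> str:
    """Call CreateDataset on a Comprehend client and return the dataset ARN."""
    response = client.create_dataset(**request)
    return response["DatasetArn"]


# Example usage (requires boto3 and AWS credentials; not run here):
#   import boto3
#   client = boto3.client("comprehend")
#   request = build_create_dataset_request(
#       "<flywheel ARN>", "my-training-dataset", "TRAIN",
#       "s3://my-comprehend-datasets/multilabel_train.csv",
#       description="my training dataset",
#   )
#   dataset_arn = create_dataset(client, request)
```

Validating the dataset type locally surfaces a misconfigured request before any data is uploaded to the flywheel's data lake.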

To add or remove tags on the dataset, use the TagResource and UntagResource operations.
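As a hedged illustration, the tagging operations can be wrapped as follows. This sketch assumes a boto3 Comprehend client; the helper names `to_tag_list`, `add_tags`, and `remove_tags` are hypothetical, and the tag keys and values are placeholders.

```python
from typing import Any, Dict, List


def to_tag_list(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into the Key/Value list that TagResource expects."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


def add_tags(client: Any, dataset_arn: str, tags: Dict[str, str]) -> None:
    """Add (or overwrite) tags on the dataset with the TagResource operation."""
    client.tag_resource(ResourceArn=dataset_arn, Tags=to_tag_list(tags))


def remove_tags(client: Any, dataset_arn: str, tag_keys: List[str]) -> None:
    """Remove tags from the dataset by key with the UntagResource operation."""
    client.untag_resource(ResourceArn=dataset_arn, TagKeys=tag_keys)


# Example usage (requires boto3 and AWS credentials; not run here):
#   import boto3
#   client = boto3.client("comprehend")
#   add_tags(client, "<dataset ARN>", {"project": "demo"})
#   remove_tags(client, "<dataset ARN>", ["project"])
```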

Describing a dataset

Use the Amazon Comprehend DescribeDataset operation to retrieve the configuration and status of a dataset.

    aws comprehend describe-dataset \
        --dataset-arn "datasetARN"

The response contains the following content.

    {
        "DatasetProperties": {
            "DatasetArn": "arn:aws:comprehend:aws-region:111122223333:flywheel/myTestFlywheel/dataset/train-dataset",
            "DatasetName": "train-dataset",
            "DatasetType": "TRAIN",
            "DatasetS3Uri": "s3://my-test-datalake/flywheelbasictest/myTestFlywheel/schemaVersion=1/20220801T014326Z/datasets/train-dataset/20220801T194844Z",
            "Description": "Good Dataset",
            "Status": "COMPLETED",
            "NumberOfDocuments": 90,
            "CreationTime": 1659383324.297
        }
    }
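A script that consumes this response might reduce it to the fields it needs. The following is a minimal sketch that parses the JSON printed by the CLI using only the Python standard library; the helper name `summarize_dataset` and the keys of the summary it returns are hypothetical. It treats a Status of COMPLETED as meaning the dataset is ready, and it interprets CreationTime as seconds since the Unix epoch.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def summarize_dataset(response_json: str) -> Dict[str, Any]:
    """Extract a few fields from the JSON output of describe-dataset."""
    props = json.loads(response_json)["DatasetProperties"]
    return {
        "name": props["DatasetName"],
        "type": props["DatasetType"],
        # The sample response reports COMPLETED once the data is ingested.
        "ready": props["Status"] == "COMPLETED",
        # NumberOfDocuments may be absent, so fall back to None.
        "documents": props.get("NumberOfDocuments"),
        # The CLI prints CreationTime as epoch seconds; convert to UTC.
        "created": datetime.fromtimestamp(
            props["CreationTime"], tz=timezone.utc
        ),
    }


# Example usage (output of the CLI command saved to a file; not run here):
#   with open("describe-output.json") as handle:
#       summary = summarize_dataset(handle.read())
#   print(summary["name"], summary["ready"], summary["created"].isoformat())
```

For the sample response, the converted creation time falls on 2022-08-01 at 19:48:44 UTC, which matches the 20220801T194844Z suffix of DatasetS3Uri.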