
Manage datasets

This topic covers creating, retrieving, listing, updating, and deleting datasets.

Create dataset

The CreateDataset API creates a new evaluation dataset. This is an asynchronous operation (HTTP 202): the dataset transitions from CREATING to ACTIVE once ingestion completes, or to CREATE_FAILED if creation fails.

Required parameters: datasetName (1 to 48 characters, starting with a letter and containing only letters, digits, and underscores: ^[a-zA-Z][a-zA-Z0-9_]{0,47}$), schemaType, and source (inline examples or S3 URI).
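The datasetName pattern above can be checked locally before calling the API. The following is a minimal sketch, not part of the AgentCore tooling; the helper name `is_valid_dataset_name` is hypothetical.

```python
import re

# Pattern copied from the CreateDataset requirements above.
DATASET_NAME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]{0,47}$")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if name starts with a letter and has 1 to 48 allowed characters."""
    # fullmatch also rejects a trailing newline, which a bare `$` would allow.
    return DATASET_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_dataset_name("my_eval_dataset")` passes, while a name containing a hyphen or starting with a digit is rejected.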

Optional parameters: description, kmsKeyArn (customer managed encryption key; immutable after creation, see Dataset encryption), and tags.

The following examples show how to create a dataset:

Example

AgentCore CLI


# Add a dataset to your project
agentcore add dataset --name my_eval_dataset \
    --schema-type AGENTCORE_EVALUATION_PREDEFINED_V1

# Edit the generated JSONL file with your scenarios
# File location: agentcore/datasets/my_eval_dataset.jsonl

# Deploy to create the dataset in your AWS account
agentcore deploy

This creates a local JSONL file and registers the dataset in your project configuration. Run agentcore deploy to create the dataset resource and sync examples to the service.

Note

Run this from inside an AgentCore project directory (created with agentcore create).

AgentCore SDK


from bedrock_agentcore.evaluation import DatasetClient

client = DatasetClient(region_name="us-west-2")

# Create with inline examples (polls until ACTIVE)
ds = client.create_dataset_and_wait(
    datasetName="customer_support_scenarios",
    schemaType="AGENTCORE_EVALUATION_PREDEFINED_V1",
    source={
        "inlineExamples": {
            "examples": [
                {
                    "scenario_id": "TC-01",
                    "turns": [{"input": "What is my balance?", "expected_response": "Your balance is $50."}],
                    "assertions": ["Response includes a dollar amount"],
                }
            ]
        }
    },
)
print(f"Dataset ID: {ds['datasetId']}, Status: {ds['status']}")

# Create with S3 source
ds = client.create_dataset_and_wait(
    datasetName="my_s3_dataset",
    schemaType="AGENTCORE_EVALUATION_PREDEFINED_V1",
    source={"s3Source": {"s3Uri": "s3://my-bucket/scenarios.jsonl"}},
)

Note

For S3 ingestion, each line in the JSONL file must include an exampleId field. The S3 bucket must be accessible using the caller’s credentials.
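Because every JSONL line needs an exampleId for S3 ingestion, it can help to check a file before uploading it. The following is a minimal sketch under that single requirement; the helper name `find_lines_missing_example_id` is hypothetical, and it does not validate the rest of the schema.

```python
import json
from typing import Iterable, List


def find_lines_missing_example_id(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of lines that are not JSON objects with an exampleId."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            # Blank lines are skipped here; whether the service tolerates
            # them is not stated in this topic.
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if not isinstance(record, dict) or "exampleId" not in record:
            bad.append(number)
    return bad
```

A file object can be passed directly, for example `find_lines_missing_example_id(open("scenarios.jsonl", encoding="utf-8"))`; an empty result means every non-blank line carries the field.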

AWS SDK


import boto3
import time

client = boto3.client('bedrock-agentcore-control')

response = client.create_dataset(
    datasetName='customer_support_scenarios',
    schemaType='AGENTCORE_EVALUATION_PREDEFINED_V1',
    source={
        'inlineExamples': {
            'examples': [
                {
                    'scenario_id': 'TC-01',
                    'turns': [{'input': 'What is my balance?', 'expected_response': 'Your balance is $50.'}],
                    'assertions': ['Response includes a dollar amount'],
                }
            ]
        }
    }
)
dataset_id = response['datasetId']

# Create with S3 source
response = client.create_dataset(
    datasetName='my_s3_dataset',
    schemaType='AGENTCORE_EVALUATION_PREDEFINED_V1',
    source={
        's3Source': {'s3Uri': 's3://my-bucket/scenarios.jsonl'}
    }
)

# Poll the inline-example dataset (dataset_id) until it is ACTIVE or CREATE_FAILED
while True:
    ds = client.get_dataset(datasetId=dataset_id)
    if ds['status'] in ('ACTIVE', 'CREATE_FAILED'):
        break
    time.sleep(2)
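The polling loop above runs until the dataset reaches ACTIVE or CREATE_FAILED, with no upper bound on the wait. A bounded variant could look like the following sketch. The function name `wait_for_dataset` and the default timeout and interval are illustrative choices, not service recommendations.

```python
import time


def wait_for_dataset(client, dataset_id, timeout_seconds=300, poll_seconds=2):
    """Poll get_dataset until the dataset is ACTIVE, failing on error or timeout.

    `client` is any object with a get_dataset(datasetId=...) method, such as
    the boto3 client created above.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        ds = client.get_dataset(datasetId=dataset_id)
        if ds["status"] == "ACTIVE":
            return ds
        if ds["status"] == "CREATE_FAILED":
            raise RuntimeError(f"Dataset {dataset_id} failed to create")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Dataset {dataset_id} not ACTIVE after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

Raising on CREATE_FAILED, instead of returning, keeps callers from treating a failed dataset as ready.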

AWS CLI


# Create with inline examples
aws bedrock-agentcore-control create-dataset \
    --dataset-name "customer_support_scenarios" \
    --schema-type AGENTCORE_EVALUATION_PREDEFINED_V1 \
    --source '{"inlineExamples": {"examples": [{"scenario_id": "TC-01", "turns": [{"input": "What is my balance?", "expected_response": "Your balance is $50."}], "assertions": ["Response includes a dollar amount"]}]}}'

# Create with S3 source
aws bedrock-agentcore-control create-dataset \
    --dataset-name "my_s3_dataset" \
    --schema-type AGENTCORE_EVALUATION_PREDEFINED_V1 \
    --source '{"s3Source": {"s3Uri": "s3://my-bucket/scenarios.jsonl"}}'

# Check the status; repeat until it is ACTIVE (use the datasetId returned by create-dataset)
aws bedrock-agentcore-control get-dataset \
    --dataset-id my-dataset-id

Get dataset

The GetDataset API retrieves dataset metadata, status, example count, and a presigned download URL for the dataset content. By default, it reads the Draft; specify datasetVersion to read a published version.

The downloadUrl is a presigned S3 URL for the full dataset.jsonl file. You can download it with a plain HTTP GET request without authentication headers.
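Since the presigned URL needs no authentication headers, the standard library is enough to fetch and parse the file. The following is a minimal sketch; the function names are hypothetical, and it assumes the file is UTF-8 encoded JSONL with one JSON value per line.

```python
import json
import urllib.request


def parse_jsonl(text):
    """Parse JSONL text into a list of Python objects, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def download_examples(download_url, timeout_seconds=30):
    """Fetch the dataset file with a plain GET request and parse it as JSONL."""
    with urllib.request.urlopen(download_url, timeout=timeout_seconds) as resp:
        return parse_jsonl(resp.read().decode("utf-8"))
```

Pass the downloadUrl value from a GetDataset response to `download_examples`. Presigned URLs are typically time-limited, so fetch a fresh one if the request is rejected.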

The following examples show how to get a dataset:

Example

AgentCore CLI


# Show dataset deployment status and metadata
agentcore status --type dataset

# Download dataset content to your local JSONL file (default: Draft)
agentcore dataset download --name my_eval_dataset

# Download a specific published version
agentcore dataset download --name my_eval_dataset --version 1

AgentCore SDK


from bedrock_agentcore.evaluation import DatasetClient

client = DatasetClient(region_name="us-west-2")

# Get dataset (default: Draft)
ds = client.get_dataset(datasetId="my-dataset-id")
print(f"Status: {ds['status']}, Examples: {ds['exampleCount']}")
print(f"Download URL: {ds['downloadUrl']}")

# Get a specific published version
ds_v1 = client.get_dataset(datasetId="my-dataset-id", datasetVersion="1")

AWS SDK


import boto3
import requests

client = boto3.client('bedrock-agentcore-control')

response = client.get_dataset(datasetId='my-dataset-id')
print(f"Status: {response['status']}, Examples: {response['exampleCount']}")

# Download the dataset content via presigned URL (no authentication headers needed)
if 'downloadUrl' in response:
    data = requests.get(response['downloadUrl'], timeout=30)
    data.raise_for_status()  # Fail loudly on an expired or invalid URL
    print(data.text)

# Get a specific published version
response = client.get_dataset(datasetId='my-dataset-id', datasetVersion='1')

AWS CLI


# Get dataset (default: Draft)
aws bedrock-agentcore-control get-dataset \
    --dataset-id my-dataset-id

# Get a specific published version
aws bedrock-agentcore-control get-dataset \
    --dataset-id my-dataset-id \
    --dataset-version 1

List datasets

The ListDatasets API returns a paginated list of datasets in your account and Region.

The following examples show how to list datasets:

Example
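As an illustration of walking a paginated listing, the following sketch collects every page into one list. It rests on assumptions this topic does not confirm: that the client exposes a `list_datasets` method, that pages are linked by a `nextToken` field, and that each page holds its items under a `datasets` key. Check the API reference for the actual names.

```python
def list_all_datasets(client):
    """Collect dataset summaries across every page of ListDatasets.

    ASSUMPTIONS (not confirmed by this topic): client.list_datasets() exists,
    accepts a `nextToken` keyword, and returns a dict with a `datasets` list
    and an optional `nextToken`.
    """
    datasets = []
    kwargs = {}
    while True:
        page = client.list_datasets(**kwargs)
        datasets.extend(page.get("datasets", []))
        token = page.get("nextToken")
        if not token:
            return datasets
        # Request the next page on the following iteration.
        kwargs["nextToken"] = token
```

The function takes the client as a parameter so that it can be exercised with a stub before it is pointed at a real account.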

Update dataset

The UpdateDataset API updates dataset metadata. This is a synchronous operation (HTTP 200). Only description and tags can be updated. datasetName, schemaType, and kmsKeyArn are immutable after creation.

The dataset must be in ACTIVE, UPDATE_FAILED, or CREATE_FAILED status.

The following examples show how to update dataset metadata:

Example
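The rules above (only description and tags are mutable, and the dataset must be in an allowed status) can be enforced client-side before sending the request. The following is a sketch under assumptions: the helper name is hypothetical, and it assumes the client exposes `update_dataset(datasetId=..., ...)` taking the mutable fields as keyword arguments, which this topic does not confirm.

```python
UPDATABLE_STATUSES = {"ACTIVE", "UPDATE_FAILED", "CREATE_FAILED"}
MUTABLE_FIELDS = {"description", "tags"}


def update_dataset_metadata(client, dataset_id, **changes):
    """Update description and/or tags after checking the documented rules.

    ASSUMPTION: client.update_dataset(datasetId=..., **fields) exists; the
    method name and argument shape are not confirmed by this topic.
    """
    if not changes:
        raise ValueError("Provide description and/or tags to update")
    rejected = set(changes) - MUTABLE_FIELDS
    if rejected:
        raise ValueError(f"Fields cannot be updated: {sorted(rejected)}")
    status = client.get_dataset(datasetId=dataset_id)["status"]
    if status not in UPDATABLE_STATUSES:
        raise RuntimeError(f"Dataset is {status}; update is not allowed")
    return client.update_dataset(datasetId=dataset_id, **changes)
```

Checking locally gives a clearer error message than a rejected request, but the service remains the source of truth for both rules.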

Delete dataset

The DeleteDataset API deletes a dataset. This is an asynchronous operation (HTTP 202).

Full delete (omit datasetVersion): Deletes all versions, Draft, and the dataset record.
Version-specific delete (specify datasetVersion as an integer): Deletes only that published version.

The dataset must be in ACTIVE, CREATE_FAILED, UPDATE_FAILED, or DELETE_FAILED status.

Note

Only integer version numbers are accepted for version-specific deletion.

The following examples show how to delete a dataset:

Example
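The difference between a full delete and a version-specific delete comes down to whether datasetVersion is present in the request. The following sketch builds the request arguments and checks the status precondition. The helper names are hypothetical, and passing the version as a Python int is an assumption: the GetDataset examples in this topic pass datasetVersion as a string, so the expected wire type may differ.

```python
DELETABLE_STATUSES = {"ACTIVE", "CREATE_FAILED", "UPDATE_FAILED", "DELETE_FAILED"}


def build_delete_request(dataset_id, version=None):
    """Build keyword arguments for a full or version-specific delete."""
    request = {"datasetId": dataset_id}
    if version is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(version, bool) or not isinstance(version, int):
            raise TypeError("datasetVersion must be an integer")
        # ASSUMPTION: the version is sent as an int; confirm the expected type.
        request["datasetVersion"] = version
    return request


def can_delete(status):
    """Return True if a dataset in this status may be deleted."""
    return status in DELETABLE_STATUSES
```

Omitting `version` yields a request that removes all versions, the Draft, and the dataset record, so callers may want an explicit confirmation step before sending it.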
