

# Dataset schema
<a name="dataset-evaluations-schema"></a>

A dataset contains one or more scenarios. Each scenario represents a conversation (session) with the agent. Both the [on-demand](dataset-evaluations-on-demand.md) and [batch](dataset-evaluations-batch.md) dataset runners use the same dataset format.

The AgentCore SDK supports two scenario types:
+  **Predefined scenarios** use a fixed sequence of turns that you author by hand. The runner replays the turns exactly as written.
+  **Simulated scenarios** use an LLM-backed actor to generate turns dynamically based on a persona and goal. See [User simulation](user-simulation.md) for details on actor profiles and simulation configuration.

`FileDatasetProvider` auto-detects the scenario type from the JSON structure: scenarios with a `turns` field are loaded as predefined; scenarios with an `actor_profile` field (and no `turns`) are loaded as simulated.
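
That detection rule can be sketched in plain Python (illustrative only; the SDK's actual implementation may differ):

```python
def detect_scenario_type(scenario: dict) -> str:
    """Illustrative sketch of the auto-detection rule; not the SDK's source."""
    if "turns" in scenario:
        return "predefined"
    if "actor_profile" in scenario:
        return "simulated"
    raise ValueError(f"Unrecognized scenario: {scenario.get('scenario_id')}")

print(detect_scenario_type({"scenario_id": "s1", "turns": []}))          # predefined
print(detect_scenario_type({"scenario_id": "s2", "actor_profile": {}}))  # simulated
```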

## Predefined scenarios
<a name="dataset-schema-predefined"></a>

A predefined scenario specifies a fixed sequence of turns with known inputs and optional expected outputs.

### Single-turn example
<a name="dataset-schema-predefined-single-turn"></a>

Each scenario sends one prompt and checks the response:

```json
{
  "scenarios": [
    {
      "scenario_id": "math-question",
      "turns": [
        {
          "input": "What is 15 + 27?",
          "expected_response": "15 + 27 = 42"
        }
      ],
      "expected_trajectory": ["calculator"],
      "assertions": ["Agent used the calculator tool to compute the result"]
    },
    {
      "scenario_id": "weather-check",
      "turns": [
        {
          "input": "What's the weather?",
          "expected_response": "The weather is sunny"
        }
      ],
      "expected_trajectory": ["weather"],
      "assertions": ["Agent used the weather tool"]
    }
  ]
}
```

### Multi-turn example
<a name="dataset-schema-predefined-multi-turn"></a>

In a multi-turn scenario, turns execute sequentially within the same session, so conversation context carries across turns. Each turn can have its own `expected_response`, while `assertions` and `expected_trajectory` apply to the entire session:

```json
{
  "scenarios": [
    {
      "scenario_id": "math-then-weather",
      "turns": [
        {
          "input": "What is 15 + 27?",
          "expected_response": "15 + 27 = 42"
        },
        {
          "input": "What's the weather?",
          "expected_response": "The weather is sunny"
        }
      ],
      "expected_trajectory": ["calculator", "weather"],
      "assertions": [
        "Agent used the calculator tool for the math question",
        "Agent used the weather tool when asked about weather"
      ]
    }
  ]
}
```

### Scenario fields
<a name="dataset-schema-predefined-fields"></a>


| Field | Required | Description |
| --- | --- | --- |
| `scenario_id` | Yes | Unique identifier for the scenario. |
| `turns` | Yes | List of turns in the conversation. Each turn has `input` (required) and `expected_response` (optional). |
| `expected_trajectory` | No | Expected sequence of tool names. Used by trajectory evaluators (`Builtin.TrajectoryExactOrderMatch`, `Builtin.TrajectoryInOrderMatch`, `Builtin.TrajectoryAnyOrderMatch`). |
| `assertions` | No | Natural language assertions about expected behavior. Used by `Builtin.GoalSuccessRate`. |

### Turn fields
<a name="dataset-schema-turn-fields"></a>


| Field | Required | Description |
| --- | --- | --- |
| `input` | Yes | The prompt sent to the agent for this turn. Can be a string or a dict. |
| `expected_response` | No | The expected agent response for this turn. Used by `Builtin.Correctness`. Mapped positionally to the trace produced by this turn; turn 0 maps to trace 0, turn 1 maps to trace 1. |
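
The positional mapping can be pictured with plain dicts, assuming one trace per turn (a sketch, not the SDK's internals):

```python
def map_expected_responses(turns: list, traces: list) -> list:
    """Pair each turn's expected_response with the trace that turn produced.

    Turn i maps to trace i; turns without expected_response contribute
    no ground truth for Builtin.Correctness.
    """
    pairs = []
    for i, turn in enumerate(turns):
        expected = turn.get("expected_response")
        if expected is not None:
            pairs.append((i, expected, traces[i]))
    return pairs

turns = [
    {"input": "What is 15 + 27?", "expected_response": "15 + 27 = 42"},
    {"input": "What's the weather?"},  # no ground truth for this turn
]
traces = ["trace-0", "trace-1"]
print(map_expected_responses(turns, traces))  # only turn 0 contributes
```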

## Simulated scenarios
<a name="dataset-schema-simulated"></a>

A simulated scenario defines an actor profile and an initial input. The actor generates subsequent turns dynamically:

```json
{
  "scenarios": [
    {
      "scenario_id": "geography-student",
      "scenario_description": "A curious student asks geography questions",
      "actor_profile": {
        "traits": {"expertise": "novice", "tone": "curious"},
        "context": "A student studying world geography who wants to learn about capitals",
        "goal": "Find out the capital cities of at least two different countries"
      },
      "input": "Hi! I'm studying geography. Can you help me learn about world capitals?",
      "max_turns": 5,
      "assertions": [
        "Agent provides accurate capital city information",
        "Agent is helpful and encouraging to the student"
      ]
    }
  ]
}
```

### Scenario fields
<a name="dataset-schema-simulated-fields"></a>


| Field | Required | Description |
| --- | --- | --- |
| `scenario_id` | Yes | Unique identifier for the scenario. |
| `actor_profile` | Yes | The actor's identity and objective, containing `context` (required), `goal` (required), and `traits` (optional). See [User simulation](user-simulation.md). |
| `input` | Yes | The first message sent to your agent to start the conversation. |
| `scenario_description` | No | Optional metadata describing the scenario. Useful for organizing and identifying scenarios in results. |
| `max_turns` | No | Maximum number of turns before the conversation stops. Default: 10. |
| `assertions` | No | Natural language assertions about expected behavior. Used by `Builtin.GoalSuccessRate`. |

**Note**  
Simulated scenarios do not support `expected_trajectory` or per-turn `expected_response` because the conversation flow is not known in advance. Use `assertions` for ground truth with simulated scenarios.

## Ground truth mapping
<a name="dataset-schema-ground-truth"></a>

Both runners automatically map dataset fields to the evaluators that use them:


| Evaluator | Ground truth field | Level | Description |
| --- | --- | --- | --- |
| `Builtin.Correctness` | `turns[].expected_response` | Trace | Measures how accurately the agent's response matches the expected answer. |
| `Builtin.GoalSuccessRate` | `assertions` | Session | Validates whether the agent's behavior satisfies natural language assertions. |
| `Builtin.TrajectoryExactOrderMatch` | `expected_trajectory` | Session | Checks that the actual tool call sequence matches exactly. |
| `Builtin.TrajectoryInOrderMatch` | `expected_trajectory` | Session | Checks that expected tools appear in order, allowing extras between them. |
| `Builtin.TrajectoryAnyOrderMatch` | `expected_trajectory` | Session | Checks that all expected tools are present, regardless of order. |

+ Ground truth fields are optional. Evaluators that do not use ground truth (for example, `Builtin.Helpfulness`, `Builtin.Faithfulness`) evaluate based on session content alone.
+ You can include all ground truth fields in a single dataset. Each runner routes the relevant fields to the appropriate evaluators.
+ If no ground truth fields are present, evaluators fall back to their ground-truth-free mode.
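
The three trajectory evaluators differ only in how strictly they compare `expected_trajectory` against the actual tool calls. A rough sketch of the matching semantics (illustrative; not the builtin evaluators' code):

```python
def exact_order_match(expected: list, actual: list) -> bool:
    # The actual tool sequence must equal the expected sequence exactly.
    return actual == expected

def in_order_match(expected: list, actual: list) -> bool:
    # Expected tools must appear as a subsequence; extra calls may occur between them.
    it = iter(actual)
    return all(tool in it for tool in expected)

def any_order_match(expected: list, actual: list) -> bool:
    # Every expected tool must be present at least once; order is irrelevant.
    return set(expected) <= set(actual)

actual = ["calculator", "search", "weather"]
print(exact_order_match(["calculator", "weather"], actual))  # False: extra call
print(in_order_match(["calculator", "weather"], actual))     # True: right order
print(any_order_match(["weather", "calculator"], actual))    # True: all present
```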

For more details on ground truth fields and how they work with the Evaluate API, see [Ground truth evaluations](ground-truth-evaluations.md).

## Inline dataset construction
<a name="dataset-schema-inline-construction"></a>

Instead of loading from a JSON file, you can construct datasets directly in Python:

```python
from bedrock_agentcore.evaluation import Dataset, PredefinedScenario, Turn

dataset = Dataset(
    scenarios=[
        PredefinedScenario(
            scenario_id="math-question",
            turns=[
                Turn(
                    input="What is 15 + 27?",
                    expected_response="15 + 27 = 42",
                ),
            ],
            expected_trajectory=["calculator"],
            assertions=["Agent used the calculator tool"],
        ),
        PredefinedScenario(
            scenario_id="weather-check",
            turns=[
                Turn(input="What's the weather?"),
            ],
            expected_trajectory=["weather"],
        ),
    ]
)
```

Or load from a JSON file:

```python
from bedrock_agentcore.evaluation import FileDatasetProvider

dataset = FileDatasetProvider("dataset.json").get_dataset()
```