Dataset schema
A dataset contains one or more scenarios. Each scenario represents a conversation (session) with the agent. Both the on-demand and batch dataset runners use the same dataset format.
The AgentCore SDK supports two scenario types:
- **Predefined scenarios** use a fixed sequence of turns that you author by hand. The runner replays the turns exactly as written.
- **Simulated scenarios** use an LLM-backed actor to generate turns dynamically based on a persona and goal. See User simulation for details on actor profiles and simulation configuration.
`FileDatasetProvider` auto-detects the scenario type from the JSON structure: scenarios with a `turns` field are loaded as predefined; scenarios with an `actor_profile` field (and no `turns`) are loaded as simulated.
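The detection rule above can be sketched as a small standalone function. This is an illustration of the rule, not the SDK's actual implementation:

```python
def detect_scenario_type(scenario: dict) -> str:
    """Apply the documented rule: a 'turns' field means predefined,
    an 'actor_profile' field (with no 'turns') means simulated."""
    if "turns" in scenario:
        return "predefined"
    if "actor_profile" in scenario:
        return "simulated"
    raise ValueError(
        f"Cannot detect scenario type for {scenario.get('scenario_id')!r}: "
        "expected a 'turns' or 'actor_profile' field"
    )

predefined = {"scenario_id": "math-question", "turns": [{"input": "What is 15 + 27?"}]}
simulated = {"scenario_id": "geography-student", "actor_profile": {"goal": "Learn capitals"}}
print(detect_scenario_type(predefined))  # predefined
print(detect_scenario_type(simulated))   # simulated
```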
Predefined scenarios
A predefined scenario specifies a fixed sequence of turns with known inputs and optional expected outputs.
Single-turn example
Each scenario sends one prompt and checks the response:
```json
{
  "scenarios": [
    {
      "scenario_id": "math-question",
      "turns": [
        {
          "input": "What is 15 + 27?",
          "expected_response": "15 + 27 = 42"
        }
      ],
      "expected_trajectory": ["calculator"],
      "assertions": ["Agent used the calculator tool to compute the result"]
    },
    {
      "scenario_id": "weather-check",
      "turns": [
        {
          "input": "What's the weather?",
          "expected_response": "The weather is sunny"
        }
      ],
      "expected_trajectory": ["weather"],
      "assertions": ["Agent used the weather tool"]
    }
  ]
}
```
Multi-turn example
Multi-turn scenarios have multiple turns per scenario. Turns execute sequentially within the same session, maintaining conversation context. Each turn can have its own `expected_response`, while `assertions` and `expected_trajectory` apply to the entire session:
```json
{
  "scenarios": [
    {
      "scenario_id": "math-then-weather",
      "turns": [
        {
          "input": "What is 15 + 27?",
          "expected_response": "15 + 27 = 42"
        },
        {
          "input": "What's the weather?",
          "expected_response": "The weather is sunny"
        }
      ],
      "expected_trajectory": ["calculator", "weather"],
      "assertions": [
        "Agent used the calculator tool for the math question",
        "Agent used the weather tool when asked about weather"
      ]
    }
  ]
}
```
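The sequential execution described above can be illustrated with a toy replay loop. This is a sketch of the behavior, not the runner's implementation; the `echo_agent` helper is hypothetical:

```python
def run_predefined_scenario(scenario: dict, agent) -> list[dict]:
    """Replay each turn in order within one session; the growing
    history is the conversation context shared across turns."""
    history: list[dict] = []
    results = []
    for turn in scenario["turns"]:
        response = agent(turn["input"], history)
        history.append({"input": turn["input"], "response": response})
        results.append({
            "input": turn["input"],
            "response": response,
            "expected_response": turn.get("expected_response"),
        })
    return results

def echo_agent(prompt: str, history: list[dict]) -> str:
    """Toy agent that can see how many turns came before it."""
    return f"turn {len(history) + 1}: {prompt}"

scenario = {"scenario_id": "demo", "turns": [{"input": "hi"}, {"input": "bye"}]}
for result in run_predefined_scenario(scenario, echo_agent):
    print(result["response"])  # turn 1: hi / turn 2: bye
```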
Scenario fields
| Field | Required | Description |
|---|---|---|
| `scenario_id` | Yes | Unique identifier for the scenario. |
| `turns` | Yes | List of turns in the conversation. Each turn has an `input` and an optional `expected_response`. |
| `expected_trajectory` | No | Expected sequence of tool names. Used by trajectory evaluators. |
| `assertions` | No | Natural language assertions about expected behavior. Used by assertion-based evaluators. |
Turn fields
| Field | Required | Description |
|---|---|---|
| `input` | Yes | The prompt sent to the agent for this turn. Can be a string or a dict. |
| `expected_response` | No | The expected agent response for this turn. Used by response accuracy evaluation. |
Simulated scenarios
A simulated scenario defines an actor profile and an initial input. The actor generates subsequent turns dynamically:
```json
{
  "scenarios": [
    {
      "scenario_id": "geography-student",
      "scenario_description": "A curious student asks geography questions",
      "actor_profile": {
        "traits": {"expertise": "novice", "tone": "curious"},
        "context": "A student studying world geography who wants to learn about capitals",
        "goal": "Find out the capital cities of at least two different countries"
      },
      "input": "Hi! I'm studying geography. Can you help me learn about world capitals?",
      "max_turns": 5,
      "assertions": [
        "Agent provides accurate capital city information",
        "Agent is helpful and encouraging to the student"
      ]
    }
  ]
}
```
Scenario fields
| Field | Required | Description |
|---|---|---|
| `scenario_id` | Yes | Unique identifier for the scenario. |
| `actor_profile` | Yes | The actor’s identity and objective, containing `traits`, `context`, and `goal`. |
| `input` | Yes | The first message sent to your agent to start the conversation. |
| `scenario_description` | No | Optional metadata describing the scenario. Useful for organizing and identifying scenarios in results. |
| `max_turns` | No | Maximum number of turns before the conversation stops. Default: 10. |
| `assertions` | No | Natural language assertions about expected behavior. Used by assertion-based evaluators. |
Note
Simulated scenarios do not support `expected_trajectory` or per-turn `expected_response` because the conversation flow is not known in advance. Use `assertions` for ground truth with simulated scenarios.
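A lightweight pre-flight check can catch unsupported fields before a run. This is a minimal sketch under the constraints stated in the note above, not an SDK feature:

```python
def validate_simulated_scenario(scenario: dict) -> list[str]:
    """Return warnings for fields that simulated scenarios do not support."""
    problems = []
    if "expected_trajectory" in scenario:
        problems.append(
            "expected_trajectory is not supported for simulated scenarios; "
            "use assertions instead"
        )
    if "turns" in scenario:
        problems.append(
            "a turns field makes this a predefined scenario; "
            "simulated scenarios use actor_profile and input"
        )
    return problems

ok = {"scenario_id": "geography-student", "actor_profile": {"goal": "..."}, "input": "Hi!"}
bad = {"scenario_id": "oops", "actor_profile": {}, "input": "Hi!", "expected_trajectory": ["search"]}
print(validate_simulated_scenario(ok))   # []
print(len(validate_simulated_scenario(bad)))  # 1
```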
Ground truth mapping
Both runners automatically map dataset fields to the evaluators that use them:
| Evaluator | Ground truth field | Level | Description |
|---|---|---|---|
| Response accuracy | `expected_response` | Trace | Measures how accurately the agent’s response matches the expected answer. |
| Assertion checks | `assertions` | Session | Validates whether the agent’s behavior satisfies natural language assertions. |
| Trajectory exact match | `expected_trajectory` | Session | Checks that the actual tool call sequence matches exactly. |
| Trajectory in-order match | `expected_trajectory` | Session | Checks that expected tools appear in order, allowing extras between them. |
| Trajectory any-order match | `expected_trajectory` | Session | Checks that all expected tools are present, regardless of order. |
- Ground truth fields are optional. Evaluators that do not use ground truth (for example, `Builtin.Helpfulness`, `Builtin.Faithfulness`) evaluate based on session content alone.
- You can include all ground truth fields in a single dataset. Each runner routes the relevant fields to the appropriate evaluators.
- If no ground truth fields are present, evaluators fall back to their ground truth-free mode.
For more details on ground truth fields and how they work with the Evaluate API, see Ground truth evaluations.
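The routing described above can be sketched as a simple lookup. The category labels below are descriptive, taken from the mapping table; they are not the SDK's evaluator identifiers:

```python
# Map each ground-truth field to the evaluator categories that consume it.
GROUND_TRUTH_ROUTING = {
    "expected_response": ["response accuracy (trace level)"],
    "assertions": ["assertion checks (session level)"],
    "expected_trajectory": [
        "trajectory exact match (session level)",
        "trajectory in-order match (session level)",
        "trajectory any-order match (session level)",
    ],
}

def provided_ground_truth(scenario: dict) -> dict:
    """Collect only the ground-truth fields a scenario actually provides.
    expected_response lives on individual turns, not the scenario itself."""
    fields = {f for f in ("assertions", "expected_trajectory") if f in scenario}
    if any("expected_response" in turn for turn in scenario.get("turns", [])):
        fields.add("expected_response")
    return {f: GROUND_TRUTH_ROUTING[f] for f in sorted(fields)}

scenario = {
    "scenario_id": "math-question",
    "turns": [{"input": "What is 15 + 27?", "expected_response": "15 + 27 = 42"}],
    "expected_trajectory": ["calculator"],
    "assertions": ["Agent used the calculator tool"],
}
print(provided_ground_truth(scenario))
```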
Inline dataset construction
Instead of loading from a JSON file, you can construct datasets directly in Python:
```python
from bedrock_agentcore.evaluation import Dataset, PredefinedScenario, Turn

dataset = Dataset(
    scenarios=[
        PredefinedScenario(
            scenario_id="math-question",
            turns=[
                Turn(
                    input="What is 15 + 27?",
                    expected_response="15 + 27 = 42",
                ),
            ],
            expected_trajectory=["calculator"],
            assertions=["Agent used the calculator tool"],
        ),
        PredefinedScenario(
            scenario_id="weather-check",
            turns=[
                Turn(input="What's the weather?"),
            ],
            expected_trajectory=["weather"],
        ),
    ]
)
```
Or load from a JSON file:
```python
from bedrock_agentcore.evaluation import FileDatasetProvider

dataset = FileDatasetProvider("dataset.json").get_dataset()
```