

# Start batch evaluation
<a name="batch-evaluations-start"></a>

Start a batch evaluation to run evaluators against multiple agent sessions. The service discovers sessions from CloudWatch Logs, runs each evaluator against each session, and produces aggregate results.

## Code samples
<a name="start-batch-eval-examples"></a>

**Example**  
The CLI resolves `serviceNames` and `logGroupNames` automatically from the project configuration when you use `--runtime`:  

```
agentcore run batch-evaluation \
  --runtime MyAgent \
  --evaluator Builtin.GoalSuccessRate Builtin.Helpfulness Builtin.Faithfulness
```
With optional flags:  

```
# Custom name and lookback window
agentcore run batch-evaluation \
  --runtime MyAgent \
  --evaluator Builtin.GoalSuccessRate \
  --name "baseline_eval" \
  --lookback-days 1

# Specific sessions
agentcore run batch-evaluation \
  --runtime MyAgent \
  --evaluator Builtin.GoalSuccessRate \
  --session-ids session-abc123 session-def456

# With ground truth
agentcore run batch-evaluation \
  --runtime MyAgent \
  --evaluator Builtin.GoalSuccessRate Builtin.Correctness \
  --ground-truth ground-truth.json
```
The CLI polls until the job reaches a terminal state (`COMPLETED`, `FAILED`, or `STOPPED`), displays per-evaluator average scores, and saves results to `.cli/eval-job-results/`.

```
import boto3
import uuid
import time
import json

client = boto3.client("bedrock-agentcore", region_name="us-west-2")

# All sessions in the log group
response = client.start_batch_evaluation(
    batchEvaluationName=f"baseline_eval_{uuid.uuid4().hex[:8]}",
    evaluators=[
        {"evaluatorId": "Builtin.GoalSuccessRate"},
        {"evaluatorId": "Builtin.Helpfulness"},
        {"evaluatorId": "Builtin.Faithfulness"},
    ],
    dataSourceConfig={
        "cloudWatchLogs": {
            "serviceNames": ["MyAgent.DEFAULT"],
            "logGroupNames": ["/aws/bedrock-agentcore/runtimes/MyAgent-abc123-DEFAULT"],
        }
    },
    clientToken=str(uuid.uuid4()),
)

batch_eval_id = response["batchEvaluationId"]
print(f"Started: {batch_eval_id}")

# Poll until complete
while True:
    result = client.get_batch_evaluation(batchEvaluationId=batch_eval_id)
    status = result["status"]
    print(f"Status: {status}")

    if status in ("COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "STOPPED"):
        break
    time.sleep(30)

print(json.dumps(result, indent=4, default=str))
```
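The polling loop above can be factored into a reusable helper with a timeout. This is an illustrative sketch, not part of the SDK; `wait_for_batch_evaluation` and its timeout behavior are assumptions layered on the documented `get_batch_evaluation` call:

```python
import time

# Statuses after which the job's state will not change again.
TERMINAL_STATUSES = {"COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "STOPPED"}

def wait_for_batch_evaluation(client, batch_eval_id, poll_seconds=30, timeout_seconds=3600):
    """Poll get_batch_evaluation until the job reaches a terminal state.

    Returns the final get_batch_evaluation response, or raises TimeoutError
    if the job has not finished within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = client.get_batch_evaluation(batchEvaluationId=batch_eval_id)
        if result["status"] in TERMINAL_STATUSES:
            return result
        time.sleep(poll_seconds)
    raise TimeoutError(f"Batch evaluation {batch_eval_id} did not finish in {timeout_seconds}s")
```

With this in place, the inline `while True` loop becomes `final = wait_for_batch_evaluation(client, batch_eval_id)`.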
With session ID filtering:  

```
response = client.start_batch_evaluation(
    batchEvaluationName=f"targeted_eval_{uuid.uuid4().hex[:8]}",
    evaluators=[
        {"evaluatorId": "Builtin.GoalSuccessRate"},
    ],
    dataSourceConfig={
        "cloudWatchLogs": {
            "serviceNames": ["MyAgent.DEFAULT"],
            "logGroupNames": ["/aws/bedrock-agentcore/runtimes/MyAgent-abc123-DEFAULT"],
            "filterConfig": {
                "sessionIds": ["session-001", "session-002", "session-003"]
            },
        }
    },
    clientToken=str(uuid.uuid4()),
)
```
With time range filtering:  

```
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
response = client.start_batch_evaluation(
    batchEvaluationName=f"weekly_eval_{uuid.uuid4().hex[:8]}",
    evaluators=[
        {"evaluatorId": "Builtin.GoalSuccessRate"},
    ],
    dataSourceConfig={
        "cloudWatchLogs": {
            "serviceNames": ["MyAgent.DEFAULT"],
            "logGroupNames": ["/aws/bedrock-agentcore/runtimes/MyAgent-abc123-DEFAULT"],
            "filterConfig": {
                "timeRange": {
                    "startTime": (now - timedelta(days=7)).isoformat(),
                    "endTime": now.isoformat(),
                }
            },
        }
    },
    clientToken=str(uuid.uuid4()),
)
```
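The three examples above differ only in the optional `filterConfig`. A small hypothetical helper (not part of the SDK) can assemble the `dataSourceConfig` structure for any of them:

```python
def build_data_source_config(service_name, log_group_names, session_ids=None, time_range=None):
    """Assemble the cloudWatchLogs data source for start_batch_evaluation.

    session_ids: optional list of session ID strings.
    time_range: optional dict with "startTime" and "endTime" ISO 8601 strings.
    """
    cloud_watch_logs = {
        "serviceNames": [service_name],          # exactly one service name
        "logGroupNames": list(log_group_names),  # 1-5 log groups
    }
    filter_config = {}
    if session_ids:
        filter_config["sessionIds"] = list(session_ids)
    if time_range:
        filter_config["timeRange"] = dict(time_range)
    if filter_config:
        cloud_watch_logs["filterConfig"] = filter_config
    return {"cloudWatchLogs": cloud_watch_logs}
```

The return value is passed directly as `dataSourceConfig=build_data_source_config(...)` in the `start_batch_evaluation` call.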

## Request parameters
<a name="start-batch-eval-params"></a>


| Parameter | Type | Required | Description | 
| --- | --- | --- | --- | 
|   `batchEvaluationName`   |  String  |  Yes  |  A name for the batch evaluation job. Must start with a letter and contain only alphanumeric characters and underscores; maximum 48 characters.  | 
|   `dataSourceConfig`   |  Object  |  Yes  |  Where to find agent sessions. Specify a `cloudWatchLogs` source with the log groups and service name for your agent. See [Session source](#start-batch-eval-session-source) below.  | 
|   `evaluators`   |  List  |  Yes  |  List of evaluators. Each entry has an `evaluatorId` field (for example, `Builtin.GoalSuccessRate`). Maximum 10 evaluators.  | 
|   `evaluationMetadata`   |  Object  |  No  |  Contains `sessionMetadata`, a list of per-session ground truth and metadata. Maximum 500 entries.  | 
|   `clientToken`   |  String  |  No  |  Idempotency token. If you retry a request with the same client token, the service returns the existing job instead of creating a new one.  | 
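Because `clientToken` makes the request idempotent, a caller can retry a transient failure without risking a duplicate job, as long as the same token is reused on every attempt. The retry wrapper below is an illustrative sketch (not part of the SDK); only `start_batch_evaluation` and `clientToken` come from this reference:

```python
import uuid

def start_with_retries(client, start_kwargs, attempts=3):
    """Call start_batch_evaluation with a stable clientToken across retries.

    Reusing the same token means a retry after a transient failure returns
    the already-created job instead of starting a second one.
    """
    token = start_kwargs.get("clientToken") or str(uuid.uuid4())
    last_error = None
    for _ in range(attempts):
        try:
            return client.start_batch_evaluation(**{**start_kwargs, "clientToken": token})
        except Exception as exc:  # narrow this to retryable errors in real code
            last_error = exc
    raise last_error
```

Note that retrying the same token with *different* request parameters raises `ConflictException` (see [Errors](#start-batch-eval-errors)).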

## Session source
<a name="start-batch-eval-session-source"></a>

The `dataSourceConfig` parameter specifies the CloudWatch Logs location where the service discovers agent sessions.

### Required fields
<a name="start-batch-eval-session-source-required"></a>


| Field | Type | Description | 
| --- | --- | --- | 
|   `cloudWatchLogs.serviceNames`   |  List of strings (exactly 1)  |  The service name that identifies your agent’s traces in CloudWatch. Convention: `{RuntimeName}.DEFAULT`.  | 
|   `cloudWatchLogs.logGroupNames`   |  List of strings (1–5)  |  CloudWatch log group names where agent telemetry is stored. Convention: `/aws/bedrock-agentcore/runtimes/{agentId}-DEFAULT`.  | 
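The naming conventions in the table can be expressed as a pair of helpers. The functions themselves are illustrative (not part of any SDK); the `{RuntimeName}` and `{agentId}` placeholders come from the conventions above:

```python
def default_service_name(runtime_name):
    """Conventional service name for an agent's DEFAULT endpoint."""
    return f"{runtime_name}.DEFAULT"

def default_log_group_name(agent_id):
    """Conventional CloudWatch log group for an agent runtime."""
    return f"/aws/bedrock-agentcore/runtimes/{agent_id}-DEFAULT"
```

These match the `MyAgent.DEFAULT` / `/aws/bedrock-agentcore/runtimes/MyAgent-abc123-DEFAULT` values used in the code samples above.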

### Optional fields
<a name="start-batch-eval-session-source-optional"></a>


| Field | Type | Description | 
| --- | --- | --- | 
|   `cloudWatchLogs.filterConfig.sessionIds`   |  List of strings  |  Evaluate only these specific session IDs. When omitted, the service discovers all sessions in the log group.  | 
|   `cloudWatchLogs.filterConfig.timeRange.startTime`   |  ISO 8601 datetime  |  Filter sessions created after this time.  | 
|   `cloudWatchLogs.filterConfig.timeRange.endTime`   |  ISO 8601 datetime  |  Filter sessions created before this time.  | 

## Response
<a name="start-batch-eval-response"></a>


| Field | Type | Description | 
| --- | --- | --- | 
|   `batchEvaluationId`   |  String  |  Unique identifier for the batch evaluation.  | 
|   `batchEvaluationArn`   |  String  |  ARN of the batch evaluation.  | 
|   `batchEvaluationName`   |  String  |  The name you specified.  | 
|   `status`   |  String  |  Initial status. One of: `PENDING`, `IN_PROGRESS`.  | 
|   `evaluators`   |  List  |  The evaluators used.  | 
|   `createdAt`   |  Timestamp  |  When the job was created.  | 
|   `outputConfig`   |  Object  |  CloudWatch Logs destination for per-session results.  | 

## Errors
<a name="start-batch-eval-errors"></a>


| Error | HTTP status | Description | 
| --- | --- | --- | 
|   `ValidationException`   |  400  |  Invalid request parameters. Check field constraints and required fields.  | 
|   `AccessDeniedException`   |  403  |  Insufficient permissions. Verify IAM policies.  | 
|   `ConflictException`   |  409  |  A batch evaluation with the same client token already exists with different parameters.  | 
|   `ThrottlingException`   |  429  |  Request rate exceeded. Retry with exponential backoff.  | 
|   `InternalServerException`   |  500  |  Service-side error. Retry the request.  | 