EvaluatorConfig

class aws_cdk.aws_bedrock_agentcore_alpha.EvaluatorConfig(*args: Any, **kwargs)

Bases: object

(experimental) Configuration for a custom evaluator.

Defines how an evaluator assesses agent performance. Supports two strategies:

  • LLM-as-a-Judge: Uses a foundation model with custom instructions and a rating scale.

  • Code-based: Uses a Lambda function for custom evaluation logic.

Stability:

experimental

Example:

# my_eval_function: lambda.IFunction

# LLM-as-a-Judge evaluator
llm_config = agentcore.EvaluatorConfig.llm_as_a_judge(
    instructions="Evaluate whether the agent response is helpful.",
    model_id="us.anthropic.claude-sonnet-4-6",
    rating_scale=agentcore.EvaluatorRatingScale.categorical([
        {"label": "Good", "definition": "The response is helpful."},
        {"label": "Bad", "definition": "The response is not helpful."}
    ])
)

# Code-based evaluator
code_config = agentcore.EvaluatorConfig.code_based(
    lambda_function=my_eval_function
)

Attributes

lambda_function

(experimental) The Lambda function used for code-based evaluation, if applicable.

Stability:

experimental

Static Methods

classmethod code_based(*, lambda_function, timeout=None)

(experimental) Creates a code-based evaluator configuration using a Lambda function.

The Lambda function implements custom evaluation logic. The function will automatically be granted invoke permissions for the bedrock-agentcore service.

Parameters:
  • lambda_function (IFunction) – (experimental) The Lambda function used for evaluation. The function will be granted invoke permissions for the bedrock-agentcore.amazonaws.com service principal, scoped to this specific evaluator resource.

  • timeout (Optional[Duration]) – (experimental) The timeout for the Lambda function invocation during evaluation. When not specified, the AgentCore evaluation service uses its default timeout for Lambda-based evaluators. Default: - The AgentCore evaluation service’s default Lambda timeout is used

Stability:

experimental

Return type:

EvaluatorConfig
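For example, a minimal sketch of a code-based configuration with an explicit timeout (assuming my_eval_function is an existing lambda.IFunction and that Duration is imported from the core aws_cdk module):

# my_eval_function: lambda.IFunction

code_config = agentcore.EvaluatorConfig.code_based(
    lambda_function=my_eval_function,
    # Optional: cap how long the evaluation service waits for the function;
    # omit to use the service's default Lambda timeout.
    timeout=Duration.seconds(30)
)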

classmethod llm_as_a_judge(*, instructions, model_id, rating_scale, additional_model_request_fields=None, inference_config=None)

(experimental) Creates an LLM-as-a-Judge evaluator configuration.

Uses a foundation model to assess agent performance based on custom instructions and a rating scale.

Parameters:
  • instructions (str) – (experimental) The evaluation instructions that guide the language model in assessing agent performance. These instructions define the evaluation criteria, context, and expected behavior. Instructions must contain placeholders appropriate for the evaluation level (e.g., {context}, {available_tools} for SESSION level). Note: Evaluators using reference-input placeholders (e.g., {expected_tool_trajectory}, {assertions}, {expected_response}) are only compatible with on-demand evaluation, not online evaluation.

  • model_id (str) – (experimental) The identifier of the Amazon Bedrock model to use for evaluation. Accepts standard model IDs (e.g., 'anthropic.claude-sonnet-4-6') and cross-region inference profile IDs with region prefixes (e.g., 'us.anthropic.claude-sonnet-4-6', 'eu.anthropic.claude-sonnet-4-6').

  • rating_scale (EvaluatorRatingScale) – (experimental) The rating scale that defines how the evaluator should score agent performance.

  • additional_model_request_fields (Optional[Mapping[str, Any]]) – (experimental) Additional model-specific request fields. Default: - No additional fields

  • inference_config (Union[EvaluatorInferenceConfig, Dict[str, Any], None]) – (experimental) Optional inference configuration parameters that control model behavior during evaluation. When not specified, the foundation model uses its own default values for maxTokens, temperature, and topP. Default: - The foundation model’s default inference parameters are used

Stability:

experimental

Return type:

EvaluatorConfig
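For example, a minimal sketch of a SESSION-level judge that pins inference parameters rather than relying on the model defaults. The dict form of inference_config and its snake_case keys are assumptions derived from the signature and the maxTokens/temperature/topP settings noted above; the placeholders are substituted by the evaluation service at run time:

llm_config = agentcore.EvaluatorConfig.llm_as_a_judge(
    # {context} and {available_tools} are SESSION-level placeholders.
    instructions=(
        "Given the conversation {context} and the tools {available_tools}, "
        "rate how helpful the agent's final response is."
    ),
    model_id="us.anthropic.claude-sonnet-4-6",
    rating_scale=agentcore.EvaluatorRatingScale.categorical([
        {"label": "Good", "definition": "The response is helpful."},
        {"label": "Bad", "definition": "The response is not helpful."}
    ]),
    # Assumed dict keys mirroring the maxTokens/temperature/topP
    # parameters described for inference_config above.
    inference_config={"max_tokens": 1024, "temperature": 0.0, "top_p": 1.0}
)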