create_evaluator
- BedrockAgentCoreControl.Client.create_evaluator(**kwargs)
Creates a custom evaluator for agent quality assessment. A custom evaluator uses either an LLM-as-a-Judge configuration, with user-defined instructions, a rating scale, and model settings, or a code-based configuration that runs a customer-managed Lambda function. Evaluators assess agent performance at the tool call, trace, or session level.
See also: AWS API Documentation
Request Syntax
response = client.create_evaluator(
    clientToken='string',
    evaluatorName='string',
    description='string',
    evaluatorConfig={
        'llmAsAJudge': {
            'instructions': 'string',
            'ratingScale': {
                'numerical': [
                    {
                        'definition': 'string',
                        'value': 123.0,
                        'label': 'string'
                    },
                ],
                'categorical': [
                    {
                        'definition': 'string',
                        'label': 'string'
                    },
                ]
            },
            'modelConfig': {
                'bedrockEvaluatorModelConfig': {
                    'modelId': 'string',
                    'inferenceConfig': {
                        'maxTokens': 123,
                        'temperature': ...,
                        'topP': ...,
                        'stopSequences': [
                            'string',
                        ]
                    },
                    'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
                }
            }
        },
        'codeBased': {
            'lambdaConfig': {
                'lambdaArn': 'string',
                'lambdaTimeoutInSeconds': 123
            }
        }
    },
    level='TOOL_CALL'|'TRACE'|'SESSION',
    kmsKeyArn='string',
    tags={
        'string': 'string'
    }
)
- Parameters:
clientToken (string) –
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don’t specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn’t return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
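Because the service ignores a repeated request whose clientToken matches an earlier one, a common way to make retries safe is to generate the token once and reuse it for every attempt of the same logical request. The sketch below assumes that pattern; `bind_client_token` is a hypothetical helper, not part of boto3, and `fn` would typically be `client.create_evaluator`.

```python
import functools
import uuid
from typing import Any, Callable


def bind_client_token(fn: Callable[..., Any], **kwargs: Any) -> Callable[[], Any]:
    """Return a zero-argument callable that always sends the same clientToken.

    Hypothetical helper: the token is generated once here (unless the caller
    supplies one), so every call of the returned callable reuses it.
    """
    kwargs.setdefault("clientToken", str(uuid.uuid4()))
    return functools.partial(fn, **kwargs)


# Usage sketch (needs boto3 and AWS credentials, so it is left commented out):
# client = boto3.client("bedrock-agentcore-control")
# send = bind_client_token(client.create_evaluator, evaluatorName="my-evaluator",
#                          evaluatorConfig=config, level="TRACE")
# response = send()  # calling send() again reuses the same clientToken
```

Generating the token per logical request, rather than deriving it from the evaluator name, avoids accidentally matching an unrelated earlier request that happened to use the same name.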
evaluatorName (string) –
[REQUIRED]
The name of the evaluator. Must be unique within your account.
description (string) – The description of the evaluator that explains its purpose and evaluation criteria.
evaluatorConfig (dict) –
[REQUIRED]
The configuration for the evaluator. Specify either LLM-as-a-Judge settings with instructions, rating scale, and model configuration, or code-based settings with a customer-managed Lambda function.
Note
This is a Tagged Union structure. Only one of the following top-level keys can be set: llmAsAJudge, codeBased.
llmAsAJudge (dict) –
The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
instructions (string) – [REQUIRED]
The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
ratingScale (dict) – [REQUIRED]
The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
Note
This is a Tagged Union structure. Only one of the following top-level keys can be set: numerical, categorical.
numerical (list) –
The numerical rating scale with defined score values and descriptions for quantitative evaluation.
(dict) –
The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
definition (string) – [REQUIRED]
The description that explains what this numerical rating represents and when it should be used.
value (float) – [REQUIRED]
The numerical value for this rating scale option.
label (string) – [REQUIRED]
The label or name that describes this numerical rating option.
categorical (list) –
The categorical rating scale with named categories and definitions for qualitative evaluation.
(dict) –
The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
definition (string) – [REQUIRED]
The description that explains what this categorical rating represents and when it should be used.
label (string) – [REQUIRED]
The label or name of this categorical rating option.
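Because ratingScale is a tagged union, a scale dict should carry exactly one of the two keys. The hypothetical helpers below build either variant from plain tuples; the labels and definitions in the example scales are invented for illustration only.

```python
from typing import Iterable, Tuple


def numerical_scale(options: Iterable[Tuple[float, str, str]]) -> dict:
    """Build a ratingScale using the numerical variant.

    Each option is a (value, label, definition) tuple; all three fields are
    required by the API, and value is sent as a float.
    """
    return {
        "numerical": [
            {"value": float(value), "label": label, "definition": definition}
            for value, label, definition in options
        ]
    }


def categorical_scale(options: Iterable[Tuple[str, str]]) -> dict:
    """Build a ratingScale using the categorical variant from (label, definition) pairs."""
    return {
        "categorical": [
            {"label": label, "definition": definition} for label, definition in options
        ]
    }


# Illustrative scales; the wording is made up for this example.
helpfulness = numerical_scale([
    (0, "Unhelpful", "The response does not address the request."),
    (1, "Helpful", "The response fully addresses the request."),
])
correctness = categorical_scale([
    ("Correct", "The tool call used the right tool with valid arguments."),
    ("Incorrect", "The tool call used the wrong tool or invalid arguments."),
])
```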
modelConfig (dict) – [REQUIRED]
The model configuration that specifies which foundation model to use and how to configure it for evaluation.
Note
This is a Tagged Union structure. Only one of the following top-level keys can be set: bedrockEvaluatorModelConfig.
bedrockEvaluatorModelConfig (dict) –
The Amazon Bedrock model configuration for evaluation.
modelId (string) – [REQUIRED]
The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model that is available in your Region.
inferenceConfig (dict) –
The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
maxTokens (integer) –
The maximum number of tokens to generate in the model response during evaluation.
temperature (float) –
The temperature value that controls randomness in the model’s responses. Lower values produce more deterministic outputs.
topP (float) –
The top-p sampling parameter that controls the diversity of the model’s responses by limiting the cumulative probability of token choices.
stopSequences (list) –
The list of sequences that will cause the model to stop generating tokens when encountered.
(string) –
additionalModelRequestFields (document) –
Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
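The nested llmAsAJudge structure can be assembled with a small builder. This is a minimal sketch: `llm_judge_config` is a hypothetical helper, the model ID is a caller-supplied placeholder (check which models the service supports), and only the inferenceConfig fields the caller sets are included; the rest are simply omitted from the request.

```python
from typing import Optional


def llm_judge_config(
    instructions: str,
    rating_scale: dict,
    model_id: str,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> dict:
    """Build an evaluatorConfig that uses the llmAsAJudge variant."""
    inference = {}
    # "is not None" keeps legitimate falsy values such as temperature=0.0.
    if max_tokens is not None:
        inference["maxTokens"] = max_tokens
    if temperature is not None:
        inference["temperature"] = temperature
    model = {"modelId": model_id}
    if inference:
        model["inferenceConfig"] = inference
    return {
        "llmAsAJudge": {
            "instructions": instructions,
            "ratingScale": rating_scale,
            "modelConfig": {"bedrockEvaluatorModelConfig": model},
        }
    }
```

A low temperature is a reasonable choice for a judge model, since the field description notes that lower values produce more deterministic outputs.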
codeBased (dict) –
Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.
Note
This is a Tagged Union structure. Only one of the following top-level keys can be set: lambdaConfig.
lambdaConfig (dict) –
The Lambda function configuration for code-based evaluation.
lambdaArn (string) – [REQUIRED]
The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.
lambdaTimeoutInSeconds (integer) –
The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.
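A code-based configuration only needs the Lambda ARN; the timeout can be omitted to take the documented 60-second default. The hypothetical helper below also checks the documented 1 to 300 second range locally, and the ARN in the test is a made-up example.

```python
from typing import Optional


def code_based_config(lambda_arn: str, timeout_seconds: Optional[int] = None) -> dict:
    """Build an evaluatorConfig that uses the codeBased variant.

    When timeout_seconds is None, lambdaTimeoutInSeconds is left out of the
    request so the documented default of 60 seconds applies.
    """
    lambda_config = {"lambdaArn": lambda_arn}
    if timeout_seconds is not None:
        # Client-side check of the documented 1-300 second range.
        if not 1 <= timeout_seconds <= 300:
            raise ValueError(f"lambdaTimeoutInSeconds must be 1-300, got {timeout_seconds}")
        lambda_config["lambdaTimeoutInSeconds"] = timeout_seconds
    return {"codeBased": {"lambdaConfig": lambda_config}}
```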
level (string) –
[REQUIRED]
The evaluation level that determines the scope of evaluation. Valid values are TOOL_CALL for individual tool invocations, TRACE for single request-response interactions, or SESSION for entire conversation sessions.
kmsKeyArn (string) – The Amazon Resource Name (ARN) of a customer managed KMS key to use for encrypting sensitive evaluator data, including instructions and rating scale. If you don’t specify a KMS key, the evaluator data is encrypted with an Amazon Web Services owned key. Only symmetric encryption KMS keys are supported. For more information, see Encryption at rest for AgentCore Evaluations.
tags (dict) –
A map of tag keys and values to assign to an AgentCore Evaluator. Tags enable you to categorize your resources in different ways, for example, by purpose, owner, or environment.
(string) –
(string) –
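Putting the parameters together, a local pre-flight check can catch an invalid level or a tagged union with more than one key before the request is sent. This is a minimal sketch under stated assumptions: `build_create_evaluator_request` and `send_create_evaluator` are hypothetical helpers, the boto3 service name `bedrock-agentcore-control` is assumed, and the checks cover only the rules stated on this page, not every server-side validation.

```python
from typing import Dict, Optional

VALID_LEVELS = {"TOOL_CALL", "TRACE", "SESSION"}


def _require_one_key(name: str, union: dict) -> None:
    """Enforce the tagged-union rule: exactly one top-level key may be set."""
    if len(union) != 1:
        raise ValueError(f"{name} must set exactly one key, got {sorted(union)}")


def build_create_evaluator_request(
    evaluator_name: str,
    evaluator_config: dict,
    level: str,
    description: Optional[str] = None,
    kms_key_arn: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Assemble keyword arguments for create_evaluator after basic local checks."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}, got {level!r}")
    _require_one_key("evaluatorConfig", evaluator_config)
    judge = evaluator_config.get("llmAsAJudge")
    if judge is not None:
        _require_one_key("ratingScale", judge["ratingScale"])
        _require_one_key("modelConfig", judge["modelConfig"])
    request = {"evaluatorName": evaluator_name, "evaluatorConfig": evaluator_config, "level": level}
    # Optional parameters are only sent when the caller provides them.
    if description is not None:
        request["description"] = description
    if kms_key_arn is not None:
        request["kmsKeyArn"] = kms_key_arn
    if tags:
        request["tags"] = dict(tags)
    return request


def send_create_evaluator(request: dict, region_name: Optional[str] = None) -> dict:
    """Send the request; boto3 is imported lazily so the helpers above run without it."""
    import boto3

    client = boto3.client("bedrock-agentcore-control", region_name=region_name)
    return client.create_evaluator(**request)
```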
- Return type:
dict
- Returns:
Response Syntax
{
    'evaluatorArn': 'string',
    'evaluatorId': 'string',
    'createdAt': datetime(2015, 1, 1),
    'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}
Response Structure
(dict) –
evaluatorArn (string) –
The Amazon Resource Name (ARN) of the created evaluator.
evaluatorId (string) –
The unique identifier of the created evaluator.
createdAt (datetime) –
The timestamp when the evaluator was created.
status (string) –
The status of the evaluator creation operation. Possible values are ACTIVE, CREATING, CREATE_FAILED, UPDATING, UPDATE_FAILED, and DELETING.
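A caller usually needs the evaluator ID and ARN from the response and should not assume the evaluator is usable immediately. The sketch below is one way to read the documented fields; treating only ACTIVE as ready and raising on the *_FAILED statuses are assumptions of this example, and `summarize_create_response` is a hypothetical helper.

```python
from datetime import datetime
from typing import NamedTuple

FAILED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED"}


class EvaluatorSummary(NamedTuple):
    evaluator_id: str
    evaluator_arn: str
    created_at: str  # ISO 8601 string derived from the createdAt datetime
    ready: bool      # True only when status is ACTIVE


def summarize_create_response(response: dict) -> EvaluatorSummary:
    """Extract the documented response fields and flag failed creations."""
    status = response["status"]
    if status in FAILED_STATUSES:
        raise RuntimeError(f"evaluator {response['evaluatorId']} is in status {status}")
    created_at: datetime = response["createdAt"]
    return EvaluatorSummary(
        evaluator_id=response["evaluatorId"],
        evaluator_arn=response["evaluatorArn"],
        created_at=created_at.isoformat(),
        ready=status == "ACTIVE",
    )
```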
Exceptions
- BedrockAgentCoreControl.Client.exceptions.ServiceQuotaExceededException
- BedrockAgentCoreControl.Client.exceptions.ValidationException
- BedrockAgentCoreControl.Client.exceptions.AccessDeniedException
- BedrockAgentCoreControl.Client.exceptions.ConflictException
- BedrockAgentCoreControl.Client.exceptions.ThrottlingException
- BedrockAgentCoreControl.Client.exceptions.InternalServerException
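Of the exceptions listed above, ThrottlingException and InternalServerException are the ones most plausibly transient. The sketch below retries only those two codes with exponential backoff and full jitter; the choice of codes and delays is an assumption, and the helpers are hypothetical. It reads the error code the way botocore's ClientError exposes it (`exc.response["Error"]["Code"]`), so it runs without importing botocore.

```python
import random
import time
from typing import Any, Callable, Optional

# Error codes treated as transient in this sketch; which codes to retry is a design choice.
RETRYABLE_CODES = {"ThrottlingException", "InternalServerException"}


def error_code(exc: BaseException) -> Optional[str]:
    """Read the AWS error code from a ClientError-style exception, if present."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return None
    return response.get("Error", {}).get("Code")


def call_with_backoff(
    fn: Callable[[], Any],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn, retrying retryable errors with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            # Non-retryable codes and the final attempt propagate unchanged.
            if error_code(exc) not in RETRYABLE_CODES or attempt == attempts - 1:
                raise
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

ConflictException, ValidationException, AccessDeniedException, and ServiceQuotaExceededException are not retried here, because repeating an identical request is unlikely to change the outcome. When retrying, passing the same clientToken on every attempt keeps a create that actually succeeded from being applied twice. boto3 also offers built-in retries through `botocore.config.Config(retries={"max_attempts": ..., "mode": "standard"})`, which may be sufficient on its own.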