create_evaluator

BedrockAgentCoreControl.Client.create_evaluator(**kwargs)

Creates a custom evaluator for agent quality assessment. Custom evaluators can use either LLM-as-a-Judge configurations with user-defined prompts, rating scales, and model settings, or code-based configurations with customer-managed Lambda functions to evaluate agent performance at tool call, trace, or session levels.

Request Syntax

response = client.create_evaluator(
    clientToken='string',
    evaluatorName='string',
    description='string',
    evaluatorConfig={
        'llmAsAJudge': {
            'instructions': 'string',
            'ratingScale': {
                'numerical': [
                    {
                        'definition': 'string',
                        'value': 123.0,
                        'label': 'string'
                    },
                ],
                'categorical': [
                    {
                        'definition': 'string',
                        'label': 'string'
                    },
                ]
            },
            'modelConfig': {
                'bedrockEvaluatorModelConfig': {
                    'modelId': 'string',
                    'inferenceConfig': {
                        'maxTokens': 123,
                        'temperature': ...,
                        'topP': ...,
                        'stopSequences': [
                            'string',
                        ]
                    },
                    'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
                },
                'responsesEvaluatorModelConfig': {
                    'modelId': 'string',
                    'maxOutputTokens': 123,
                    'temperature': ...,
                    'topP': ...,
                    'reasoning': {
                        'effort': 'string'
                    }
                }
            }
        },
        'codeBased': {
            'lambdaConfig': {
                'lambdaArn': 'string',
                'lambdaTimeoutInSeconds': 123
            }
        }
    },
    level='TOOL_CALL'|'TRACE'|'SESSION',
    kmsKeyArn='string',
    tags={
        'string': 'string'
    }
)
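The following is a minimal sketch of a trace-level LLM-as-a-Judge request that sets the required parameters plus a description. It assumes a client created with `boto3.client("bedrock-agentcore-control")`. The helper names, evaluator name, instructions, categorical labels, and inference settings are illustrative, and the model ID must be replaced with a Bedrock model supported in your Region.

```python
# Minimal sketch: create a trace-level LLM-as-a-Judge evaluator.
# Assumptions (not from the API reference): the helper names, the example
# instructions and labels, and the inference settings are illustrative only.
# A real client would come from boto3.client("bedrock-agentcore-control").


def llm_judge_request(name: str, model_id: str) -> dict:
    """Build keyword arguments for create_evaluator."""
    return {
        "evaluatorName": name,
        "description": "Judges whether the agent's final answer is helpful.",
        "evaluatorConfig": {
            # Tagged union: set llmAsAJudge OR codeBased, never both.
            "llmAsAJudge": {
                "instructions": "Judge how well the agent's response answers the user.",
                "ratingScale": {
                    "categorical": [
                        {"label": "Helpful", "definition": "Fully addresses the request."},
                        {"label": "Unhelpful", "definition": "Misses or ignores the request."},
                    ]
                },
                "modelConfig": {
                    "bedrockEvaluatorModelConfig": {
                        "modelId": model_id,
                        # Low temperature for more repeatable judgments.
                        "inferenceConfig": {"maxTokens": 512, "temperature": 0.0},
                    }
                },
            }
        },
        "level": "TRACE",
    }


def create_llm_judge(client, name: str, model_id: str) -> dict:
    """Send the request and return the create_evaluator response dict."""
    return client.create_evaluator(**llm_judge_request(name, model_id))
```

For example, `create_llm_judge(client, "helpfulness_judge", "<your-model-id>")` would return a dict containing `evaluatorArn`, `evaluatorId`, `createdAt`, and `status`.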

Parameters:

clientToken (string) –
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don’t specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn’t return an error. For more information, see Ensuring idempotency.

This field is autopopulated if not provided.
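Supplying your own clientToken matters mainly when your code retries on its own. If every attempt carries the same token, a retry of a request that already succeeded is ignored instead of creating a second evaluator. The sketch below assumes a UUID4 string is an acceptable token (check the service's length and pattern constraints); `create_with_retries` is a hypothetical helper. botocore already retries throttled calls internally, so this extra layer is optional.

```python
import time
import uuid


def create_with_retries(client, request, retryable, attempts=3, base_delay=1.0):
    """Call create_evaluator, reusing one clientToken across all attempts.

    `retryable` is a tuple of exception classes to retry on, for example
    (client.exceptions.ThrottlingException, client.exceptions.InternalServerException).
    """
    # Generate the token once, outside the loop, so every attempt shares it.
    token = request.get("clientToken") or str(uuid.uuid4())
    kwargs = {**request, "clientToken": token}
    for attempt in range(attempts):
        try:
            return client.create_evaluator(**kwargs)
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise ValueError("attempts must be at least 1")
```

Generating the token before the loop, rather than letting boto3 autopopulate it per call, is what makes the retries idempotent.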
evaluatorName (string) –
[REQUIRED]

The name of the evaluator. Must be unique within your account.
description (string) – The description of the evaluator that explains its purpose and evaluation criteria.
evaluatorConfig (dict) –
[REQUIRED]

The configuration for the evaluator. Specify either LLM-as-a-Judge settings with instructions, rating scale, and model configuration, or code-based settings with a customer-managed Lambda function.

Note
This is a Tagged Union structure. Only one of the following top level keys can be set: llmAsAJudge, codeBased.
- llmAsAJudge (dict) –
  
  The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
  - instructions (string) – [REQUIRED]
    
    The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
  - ratingScale (dict) – [REQUIRED]
    
    The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
    
    Note
    This is a Tagged Union structure. Only one of the following top level keys can be set: numerical, categorical.
    - numerical (list) –
      
      The numerical rating scale with defined score values and descriptions for quantitative evaluation.
      - (dict) –
        
        The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
        
        definition (string) – [REQUIRED]
        
        The description that explains what this numerical rating represents and when it should be used.
        
        value (float) – [REQUIRED]
        
        The numerical value for this rating scale option.
        
        label (string) – [REQUIRED]
        
        The label or name that describes this numerical rating option.
    - categorical (list) –
      
      The categorical rating scale with named categories and definitions for qualitative evaluation.
      - (dict) –
        
        The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
        
        definition (string) – [REQUIRED]
        
        The description that explains what this categorical rating represents and when it should be used.
        
        label (string) – [REQUIRED]
        
        The label or name of this categorical rating option.
  - modelConfig (dict) – [REQUIRED]
    
    The model configuration that specifies which foundation model to use and how to configure it for evaluation.
    
    Note
    This is a Tagged Union structure. Only one of the following top level keys can be set: bedrockEvaluatorModelConfig, responsesEvaluatorModelConfig.
    - bedrockEvaluatorModelConfig (dict) –
      
      The Amazon Bedrock model configuration for evaluation.
      - modelId (string) – [REQUIRED]
        
        The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.
      - inferenceConfig (dict) –
        
        The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
        
        maxTokens (integer) –
        
        The maximum number of tokens to generate in the model response during evaluation.
        
        temperature (float) –
        
        The temperature value that controls randomness in the model’s responses. Lower values produce more deterministic outputs.
        
        topP (float) –
        
        The top-p sampling parameter that controls the diversity of the model’s responses by limiting the cumulative probability of token choices.
        
        stopSequences (list) –
        
        The list of sequences that will cause the model to stop generating tokens when encountered.
        
        (string) –
      - additionalModelRequestFields (document) –
        
        Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
    - responsesEvaluatorModelConfig (dict) –
      
      The OpenResponses model configuration for evaluation.
      - modelId (string) – [REQUIRED]
        
        The identifier of the model to use for evaluation.
      - maxOutputTokens (integer) –
        
        The maximum number of tokens to generate in the model response, including visible output and reasoning tokens.
      - temperature (float) –
        
        The temperature value that controls randomness in the model’s responses. Lower values produce more deterministic outputs.
      - topP (float) –
        
        The top-p sampling parameter that controls the diversity of the model’s responses by limiting the cumulative probability of token choices.
      - reasoning (dict) –
        
        The reasoning configuration for reasoning models. Non-reasoning models ignore this configuration.
        
        effort (string) –
        
        The level of reasoning effort the model applies when generating a response. For supported values, see the model provider’s documentation.
- codeBased (dict) –
  
  Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.
  
  Note
  This is a Tagged Union structure. Only one of the following top level keys can be set: lambdaConfig.
  - lambdaConfig (dict) –
    
    The Lambda function configuration for code-based evaluation.
    - lambdaArn (string) – [REQUIRED]
      
      The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.
    - lambdaTimeoutInSeconds (integer) –
      
      The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.
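The evaluatorConfig, ratingScale, and modelConfig members are all tagged unions. The hypothetical helpers below build each variant and check on the client side that exactly one member is set (an assumption about how "only one" is enforced) and that lambdaTimeoutInSeconds falls in the documented range of 1 to 300. The service performs its own validation, so this only surfaces mistakes earlier.

```python
# Hypothetical client-side helpers for the evaluatorConfig tagged unions.
# Assumption: each union must set exactly one of its members.


def numerical_scale(options):
    """options: iterable of (value, label, definition) tuples."""
    return {"numerical": [
        {"value": float(value), "label": label, "definition": definition}
        for value, label, definition in options
    ]}


def categorical_scale(options):
    """options: iterable of (label, definition) tuples."""
    return {"categorical": [
        {"label": label, "definition": definition} for label, definition in options
    ]}


def llm_judge_config(instructions, rating_scale, model_config):
    return {"llmAsAJudge": {
        "instructions": instructions,
        "ratingScale": rating_scale,
        "modelConfig": model_config,
    }}


def code_based_config(lambda_arn, timeout_seconds=None):
    lambda_config = {"lambdaArn": lambda_arn}
    if timeout_seconds is not None:
        # Documented range is 1 to 300 seconds; the service default is 60.
        if not 1 <= timeout_seconds <= 300:
            raise ValueError("lambdaTimeoutInSeconds must be between 1 and 300")
        lambda_config["lambdaTimeoutInSeconds"] = timeout_seconds
    return {"codeBased": {"lambdaConfig": lambda_config}}


def check_union(name, value, members):
    """Return the single member set in `value`, or raise ValueError."""
    present = [key for key in value if key in members]
    unknown = [key for key in value if key not in members]
    if len(present) != 1 or unknown:
        raise ValueError(f"{name} must set exactly one of {list(members)}, got {list(value)}")
    return present[0]


def validate_evaluator_config(config):
    kind = check_union("evaluatorConfig", config, ("llmAsAJudge", "codeBased"))
    if kind == "llmAsAJudge":
        judge = config["llmAsAJudge"]
        check_union("ratingScale", judge["ratingScale"], ("numerical", "categorical"))
        check_union("modelConfig", judge["modelConfig"],
                    ("bedrockEvaluatorModelConfig", "responsesEvaluatorModelConfig"))
    return kind
```

For instance, pairing `numerical_scale(...)` with a responsesEvaluatorModelConfig dict yields a valid LLM-as-a-Judge config, while merging it with `code_based_config(...)` into one dict is rejected.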
level (string) –
[REQUIRED]

The evaluation level that determines the scope of evaluation. Valid values are TOOL_CALL for individual tool invocations, TRACE for single request-response interactions, or SESSION for entire conversation sessions.
kmsKeyArn (string) – The Amazon Resource Name (ARN) of a customer managed KMS key to use for encrypting sensitive evaluator data, including instructions and rating scale. If you don’t specify a KMS key, the evaluator data is encrypted with an Amazon Web Services owned key. Only symmetric encryption KMS keys are supported. For more information, see Encryption at rest for AgentCore Evaluations.
tags (dict) –
A map of tag keys and values to assign to an AgentCore Evaluator. Tags enable you to categorize your resources in different ways, for example, by purpose, owner, or environment.
- (string) –
  - (string) –
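Optional parameters such as description, kmsKeyArn, tags, and clientToken are best omitted entirely when unused, because botocore's parameter validation rejects None for string and map members. A sketch of a hypothetical request builder that does this and also checks level against the three documented values:

```python
# Hypothetical request builder: only parameters that are actually set are sent.
VALID_LEVELS = ("TOOL_CALL", "TRACE", "SESSION")


def build_create_evaluator_kwargs(evaluator_name, evaluator_config, level, *,
                                  description=None, kms_key_arn=None, tags=None,
                                  client_token=None):
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    kwargs = {
        "evaluatorName": evaluator_name,
        "evaluatorConfig": evaluator_config,
        "level": level,
    }
    optional = {
        "description": description,
        "kmsKeyArn": kms_key_arn,  # omit to use an AWS owned key
        "tags": tags,
        "clientToken": client_token,  # omit to let boto3 generate one
    }
    # botocore rejects None for typed members, so drop unset optionals.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs
```

The result can be passed directly as `client.create_evaluator(**kwargs)`.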

Return type:

dict

Returns:

Response Syntax

{
    'evaluatorArn': 'string',
    'evaluatorId': 'string',
    'createdAt': datetime(2015, 1, 1),
    'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}

Response Structure

(dict) –
- evaluatorArn (string) –
  
  The Amazon Resource Name (ARN) of the created evaluator.
- evaluatorId (string) –
  
  The unique identifier of the created evaluator.
- createdAt (datetime) –
  
  The timestamp when the evaluator was created.
- status (string) –
  
  The status of the evaluator creation operation.
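Because status can be CREATING when create_evaluator returns, callers that need a usable evaluator may want to poll. The sketch below takes a caller-supplied `fetch_status` callable so that it makes no assumption about the read API. One plausible implementation is `lambda: client.get_evaluator(evaluatorId=evaluator_id)["status"]`, assuming the client exposes a get_evaluator operation with that parameter and response field (verify against its reference page). Treating DELETING as a failure is a design choice of this sketch.

```python
import time

# Statuses after which waiting for ACTIVE is pointless (sketch's assumption).
STOP_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED", "DELETING"}


def wait_until_active(fetch_status, timeout=300.0, interval=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it returns "ACTIVE".

    Raises RuntimeError on a failure status and TimeoutError after `timeout`
    seconds. `clock` and `sleep` are injectable to make testing easy.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == "ACTIVE":
            return status
        if status in STOP_STATUSES:
            raise RuntimeError(f"evaluator entered status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"evaluator still {status} after {timeout}s")
        sleep(interval)
```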

Exceptions

BedrockAgentCoreControl.Client.exceptions.ServiceQuotaExceededException
BedrockAgentCoreControl.Client.exceptions.ValidationException
BedrockAgentCoreControl.Client.exceptions.AccessDeniedException
BedrockAgentCoreControl.Client.exceptions.ConflictException
BedrockAgentCoreControl.Client.exceptions.ThrottlingException
BedrockAgentCoreControl.Client.exceptions.InternalServerException
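Modeled exceptions such as those above are subclasses of botocore.exceptions.ClientError and carry their error code in `err.response["Error"]["Code"]`. The hypothetical helper below sorts them into suggested actions; the grouping is an assumption based on the exception names, not documented service behavior.

```python
# Hypothetical grouping of create_evaluator errors by suggested action.
RETRYABLE_CODES = {"ThrottlingException", "InternalServerException"}
FIX_REQUEST_CODES = {"ValidationException", "ConflictException"}
FIX_ACCOUNT_CODES = {"AccessDeniedException", "ServiceQuotaExceededException"}


def classify_create_evaluator_error(error) -> str:
    """Map an exception raised by create_evaluator to a suggested action."""
    # ClientError (and its modeled subclasses) expose a `response` dict;
    # anything else falls through to "unknown".
    code = getattr(error, "response", {}).get("Error", {}).get("Code", "")
    if code in RETRYABLE_CODES:
        return "retry"  # transient: back off, retry with the same clientToken
    if code in FIX_REQUEST_CODES:
        return "fix_request"  # invalid input or a conflicting resource/operation
    if code in FIX_ACCOUNT_CODES:
        return "fix_permissions_or_quota"
    return "unknown"
```

For example, wrap the call in `try: ... except botocore.exceptions.ClientError as err:` and branch on `classify_create_evaluator_error(err)`.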