interface EvaluatorInferenceConfig
| Language | Type name |
|---|---|
| .NET | Amazon.CDK.AWS.Bedrock.Agentcore.Alpha.EvaluatorInferenceConfig |
| Go | github.com/aws/aws-cdk-go/awsbedrockagentcorealpha/v2#EvaluatorInferenceConfig |
| Java | software.amazon.awscdk.services.bedrock.agentcore.alpha.EvaluatorInferenceConfig |
| Python | aws_cdk.aws_bedrock_agentcore_alpha.EvaluatorInferenceConfig |
| TypeScript (source) | @aws-cdk/aws-bedrock-agentcore-alpha » EvaluatorInferenceConfig |
Inference configuration for a custom LLM-as-a-Judge evaluator.
Controls how the foundation model generates evaluation responses.
Example
```ts
// LLM-as-a-Judge with categorical rating scale
const categoricalEvaluator = new agentcore.Evaluator(this, 'CategoricalEvaluator', {
  evaluatorName: 'domain_accuracy_evaluator',
  level: agentcore.EvaluationLevel.SESSION,
  description: 'Evaluates domain-specific accuracy of agent responses',
  evaluatorConfig: agentcore.EvaluatorConfig.llmAsAJudge({
    instructions: 'Evaluate whether the agent response is accurate within the healthcare domain.',
    modelId: 'us.anthropic.claude-sonnet-4-6',
    ratingScale: agentcore.EvaluatorRatingScale.categorical([
      { label: 'Accurate', definition: 'The response contains factually correct healthcare information.' },
      { label: 'Inaccurate', definition: 'The response contains incorrect or misleading healthcare information.' },
    ]),
  }),
});

// LLM-as-a-Judge with numerical rating scale and inference config
const numericalEvaluator = new agentcore.Evaluator(this, 'NumericalEvaluator', {
  evaluatorName: 'response_quality_evaluator',
  level: agentcore.EvaluationLevel.TRACE,
  evaluatorConfig: agentcore.EvaluatorConfig.llmAsAJudge({
    instructions: 'Rate the overall quality of the agent response on a scale of 1 to 5.',
    modelId: 'us.anthropic.claude-sonnet-4-6',
    ratingScale: agentcore.EvaluatorRatingScale.numerical([
      { label: 'Poor', definition: 'Inadequate response.', value: 1 },
      { label: 'Below Average', definition: 'Partially addresses the query.', value: 2 },
      { label: 'Average', definition: 'Adequately addresses the query.', value: 3 },
      { label: 'Good', definition: 'Well-structured and accurate response.', value: 4 },
      { label: 'Excellent', definition: 'Outstanding response exceeding expectations.', value: 5 },
    ]),
    inferenceConfig: {
      maxTokens: 1024,
      temperature: 0.1,
    },
  }),
});
```
Properties
| Name | Type | Description |
|---|---|---|
| maxTokens? | number | The maximum number of tokens to generate in the model response. |
| temperature? | number | The temperature value that controls randomness in the model's responses. |
| topP? | number | The top-p sampling parameter that controls the diversity of the model's responses. |
maxTokens?
Type: number (optional, default: the foundation model's default maximum token limit is used)
The maximum number of tokens to generate in the model response.
temperature?
Type: number (optional, default: the foundation model's default temperature is used)
The temperature value that controls randomness in the model's responses. Higher values produce more diverse outputs. Range: 0.0 to 1.0.
topP?
Type: number (optional, default: the foundation model's default top-p value is used)
The top-p sampling parameter that controls the diversity of the model's responses. Range: 0.0 to 1.0.
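Since all three properties are optional and fall back to model defaults, a common pattern is to validate values before passing them to the construct. The sketch below mirrors the documented ranges; the `InferenceConfig` interface and `validateInferenceConfig` helper are illustrative assumptions, not part of the CDK API (the construct performs its own validation at synthesis time).

```typescript
// Hypothetical client-side validation mirroring the documented constraints
// for EvaluatorInferenceConfig. Names here are illustrative only.
interface InferenceConfig {
  maxTokens?: number;   // positive integer; model default if omitted
  temperature?: number; // 0.0 to 1.0; model default if omitted
  topP?: number;        // 0.0 to 1.0; model default if omitted
}

function validateInferenceConfig(config: InferenceConfig): string[] {
  const errors: string[] = [];
  if (config.maxTokens !== undefined &&
      (!Number.isInteger(config.maxTokens) || config.maxTokens < 1)) {
    errors.push('maxTokens must be a positive integer');
  }
  if (config.temperature !== undefined &&
      (config.temperature < 0 || config.temperature > 1)) {
    errors.push('temperature must be between 0.0 and 1.0');
  }
  if (config.topP !== undefined &&
      (config.topP < 0 || config.topP > 1)) {
    errors.push('topP must be between 0.0 and 1.0');
  }
  return errors;
}
```

A low temperature (such as 0.1 in the example above) is typical for evaluators, since deterministic, repeatable judgments are usually preferable to diverse outputs when scoring agent responses.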
