Creating your first model evaluation job that uses an LLM as a judge in Amazon Bedrock

To create a model evaluation job that uses an LLM as a judge, you need access to specific service-level resources and to Amazon Bedrock foundation models. Use the linked topics to learn more about getting set up.

Required service-level resources to start a model evaluation job that uses a judge model

You need access to at least one of the following Amazon Bedrock foundation models. These are the available judge models. To learn more about gaining access to models and region availability, see Access Amazon Bedrock foundation models.
- Mistral Large – mistral.mistral-large-2402-v1:0
- Anthropic Claude 3.5 Sonnet – anthropic.claude-3-5-sonnet-20240620-v1:0
- Anthropic Claude 3 Haiku – anthropic.claude-3-haiku-20240307-v1:0
- Meta Llama 3.1 70B Instruct – meta.llama3-1-70b-instruct-v1:0
Create a prompt dataset. Your prompt dataset is a JSON Lines (JSONL) file that contains the prompts and the ground truth data required for the model evaluation job to run successfully. For more information, see Requirements for custom prompt datasets in model evaluation job that uses a model as judge.
To create a model evaluation job that uses an LLM as a judge, you need access to the Amazon Bedrock console (https://console.aws.amazon.com/bedrock/), the AWS Command Line Interface, or a supported AWS SDK. To learn more about the required IAM actions and resources, see Required console permissions to create a model evaluation job that uses a model as judge in Amazon Bedrock.
When the model evaluation job starts, a service role is used to perform actions on your behalf. To learn more about the required IAM actions and trust policy requirements, see Required service role permissions for creating a model evaluation job that uses a judge model.
Amazon Simple Storage Service – Any prompt dataset specified in a model evaluation job must be stored in an Amazon S3 bucket. Model evaluation jobs created using the Amazon Bedrock console require that you specify the correct CORS permissions on the bucket. For more information about the required CORS policy permissions, see Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets.
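As a rough illustration of the JSON Lines format described above, the following Python sketch writes one JSON object per line and checks that each line parses. The key names used here (prompt, referenceResponse, category) and the sample records are assumptions made for illustration only. Confirm the exact fields against the prompt dataset requirements topic before uploading a file to Amazon S3.

```python
import json

# Hypothetical sample records. The key names are assumptions and should be
# checked against the prompt dataset requirements topic.
records = [
    {
        "prompt": "What is the capital of France?",
        "referenceResponse": "Paris",
        "category": "Geography",
    },
    {
        "prompt": "Summarize the water cycle in one sentence.",
        "referenceResponse": "Water evaporates, condenses into clouds, and returns as precipitation.",
        "category": "Science",
    },
]


def to_jsonl(rows):
    """Serialize each record as one compact JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows) + "\n"


def validate_jsonl(text, required_keys=("prompt",)):
    """Parse JSONL text and confirm every non-empty line is an object with the required keys."""
    parsed = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)  # raises ValueError on malformed JSON
        if not isinstance(row, dict):
            raise ValueError(f"line {number} is not a JSON object")
        missing = [key for key in required_keys if key not in row]
        if missing:
            raise ValueError(f"line {number} is missing keys: {missing}")
        parsed.append(row)
    return parsed


jsonl_text = to_jsonl(records)
rows = validate_jsonl(jsonl_text)
print(f"{len(rows)} valid records")
```

Validating the file locally before uploading it can surface malformed lines early, before the evaluation job reads the dataset.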

Required console permissions to create a model evaluation job that uses a model as judge in Amazon Bedrock

The following policy contains the minimum set of IAM actions and resources in Amazon Bedrock and Amazon S3 that are required to create an automatic model evaluation job using the Amazon Bedrock console.

In the policy, we recommend using the IAM JSON policy element Resource to limit access to only the models and buckets required for the IAM user, group, or role.

The IAM policy must grant access to both the generator model and the evaluator model.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BedrockConsole",
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateEvaluationJob",
                "bedrock:GetEvaluationJob",
                "bedrock:ListEvaluationJobs",
                "bedrock:StopEvaluationJob",
                "bedrock:GetCustomModel",
                "bedrock:ListCustomModels",
                "bedrock:CreateProvisionedModelThroughput",
                "bedrock:UpdateProvisionedModelThroughput",
                "bedrock:GetProvisionedModelThroughput",
                "bedrock:ListProvisionedModelThroughputs",
                "bedrock:GetImportedModel",
                "bedrock:ListImportedModels",
                "bedrock:ListTagsForResource",
                "bedrock:UntagResource",
                "bedrock:TagResource"
            ],
            "Resource": [
                "arn:aws:bedrock:us-west-2::foundation-model/model-id-of-foundational-model",
                "arn:aws:bedrock:us-west-2::foundation-model/model-id-of-foundational-model"
            ]
        },
        {
            "Sid": "AllowConsoleS3AccessForModelEvaluation",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetBucketCORS",
                "s3:ListBucket",
                "s3:ListBucketVersions",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::my_output_bucket",
                "arn:aws:s3:::input_datasets/prompts.jsonl"
            ]
        }
    ]
}
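To show how the Resource element can be limited to one generator model and one evaluator model, here is a minimal Python sketch that assembles the same policy document programmatically. The helper names (foundation_model_arn, build_console_policy) are hypothetical and not part of any AWS SDK. The Region, the model IDs, and the S3 ARNs are placeholders reused from this page, and the Meta model is assumed to be the generator purely for illustration.

```python
import json

# Actions copied from the console policy shown above.
BEDROCK_ACTIONS = [
    "bedrock:CreateEvaluationJob",
    "bedrock:GetEvaluationJob",
    "bedrock:ListEvaluationJobs",
    "bedrock:StopEvaluationJob",
    "bedrock:GetCustomModel",
    "bedrock:ListCustomModels",
    "bedrock:CreateProvisionedModelThroughput",
    "bedrock:UpdateProvisionedModelThroughput",
    "bedrock:GetProvisionedModelThroughput",
    "bedrock:ListProvisionedModelThroughputs",
    "bedrock:GetImportedModel",
    "bedrock:ListImportedModels",
    "bedrock:ListTagsForResource",
    "bedrock:UntagResource",
    "bedrock:TagResource",
]

S3_ACTIONS = [
    "s3:GetObject",
    "s3:GetBucketCORS",
    "s3:ListBucket",
    "s3:ListBucketVersions",
    "s3:GetBucketLocation",
]


def foundation_model_arn(region, model_id):
    """Build a foundation model ARN in the format used by the policy above."""
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


def build_console_policy(region, generator_model_id, evaluator_model_id, s3_arns):
    """Return the policy as a dict, scoped to one generator and one evaluator model."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BedrockConsole",
                "Effect": "Allow",
                "Action": BEDROCK_ACTIONS,
                "Resource": [
                    foundation_model_arn(region, generator_model_id),
                    foundation_model_arn(region, evaluator_model_id),
                ],
            },
            {
                "Sid": "AllowConsoleS3AccessForModelEvaluation",
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                "Resource": list(s3_arns),
            },
        ],
    }


# Placeholder values; replace them with your own Region, models, and buckets.
policy = build_console_policy(
    region="us-west-2",
    generator_model_id="meta.llama3-1-70b-instruct-v1:0",  # assumed generator
    evaluator_model_id="anthropic.claude-3-haiku-20240307-v1:0",  # judge model
    s3_arns=[
        "arn:aws:s3:::my_output_bucket",
        "arn:aws:s3:::input_datasets/prompts.jsonl",
    ],
)
policy_json = json.dumps(policy, indent=4)
print(policy_json)
```

Generating the document with json.dumps guarantees syntactically valid JSON, which avoids errors such as trailing commas when the policy is edited by hand.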
