Starting an automatic model evaluation job in Amazon Bedrock

You can create an automatic model evaluation job using the AWS Management Console, AWS CLI, or a supported AWS SDK. In an automatic model evaluation job, the model you select performs inference using either prompts from a supported built-in dataset or your own custom prompt dataset. Each job also requires you to select a task type, which determines the recommended metrics and the built-in prompt datasets that are available. To learn more about available task types and metrics, see Model evaluation task types in Amazon Bedrock.

The following examples show you how to create an automatic model evaluation job using the Amazon Bedrock console, the AWS CLI, and the SDK for Python (Boto3).

All automatic model evaluation jobs require that you create an IAM service role. To learn more about the IAM requirements for setting up a model evaluation job, see Service role requirements for model evaluation jobs.

In the API, you can also include an inference profile in the job by specifying its ARN in the modelIdentifier field.
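For example, the models entry of the request's inferenceConfig might reference an inference profile as in the following sketch. The account ID and profile name in the ARN are illustrative placeholders; substitute the ARN of your own inference profile.

# Sketch of an inferenceConfig "models" entry that uses an inference profile.
# The account ID and profile name in the ARN below are placeholders.
inference_config = {
    "models": [
        {
            "bedrockModel": {
                "modelIdentifier": "arn:aws:bedrock:us-west-2:111122223333:inference-profile/us.amazon.nova-lite-v1:0"
            }
        }
    ]
}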

Amazon Bedrock console

Use the following procedure to create a model evaluation job using the Amazon Bedrock console. To successfully complete this procedure, make sure that your IAM user, group, or role has sufficient permissions to access the console. To learn more, see Required console permissions to create an automatic model evaluation job.

Also, any custom prompt datasets that you want to specify in the model evaluation job must have the required CORS permissions added to the Amazon S3 bucket. To learn more about adding the required CORS permissions, see Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets.
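If your bucket doesn't have a CORS configuration yet, you can attach one programmatically. The following sketch uses the boto3 put_bucket_cors call; the bucket name is a placeholder, and the rule values shown are the commonly documented ones, so confirm the exact requirements on the linked page.

import boto3

# Sketch: attach a CORS configuration to the dataset bucket.
# The bucket name is a placeholder; verify the required rule values
# in the CORS requirements documentation before relying on them.
s3 = boto3.client("s3")
s3.put_bucket_cors(
    Bucket="amzn-s3-demo-bucket-model-evaluations",
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["GET", "PUT", "POST", "DELETE"],
                "AllowedOrigins": ["*"],
                "ExposeHeaders": ["Access-Control-Allow-Origin"],
            }
        ]
    },
)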

To create an automatic model evaluation job
  1. Open the Amazon Bedrock console: https://console.aws.amazon.com/bedrock/

  2. In the navigation pane, choose Model evaluation.

  3. In the Build an evaluation card, under Automatic, choose Create automatic evaluation.

  4. On the Create automatic evaluation page, provide the following information:

    1. Evaluation name — Give the model evaluation job a name that describes the job. This name is shown in your model evaluation job list. The name must be unique in your account in an AWS Region.

    2. Description (Optional) — Provide an optional description.

    3. Models — Choose the model you want to use in the model evaluation job.

      To learn more about available models and accessing them in Amazon Bedrock, see Access Amazon Bedrock foundation models.

    4. (Optional) To change the inference configuration, choose update.

      Changing the inference configuration changes the responses generated by the selected models. To learn more about the available inference parameters, see Inference request parameters and response fields for foundation models.

    5. Task type — Choose the type of task you want the model to attempt to perform during the model evaluation job.

    6. Metrics and datasets — The list of available metrics and built-in prompt datasets changes based on the task you select. You can choose from the list of Available built-in datasets, or you can choose Use your own prompt dataset. If you choose to use your own prompt dataset, enter the exact S3 URI of your prompt dataset file or choose Browse S3 to search for it. An example dataset record is sketched after this procedure.

    7. Evaluation results — Specify the S3 URI of the directory where you want the results saved. Choose Browse S3 to search for a location in Amazon S3.

    8. (Optional) To enable the use of a customer managed key, choose Customize encryption settings (advanced). Then, provide the ARN of the AWS KMS key you want to use.

    9. Amazon Bedrock IAM role — Choose Use an existing role to use an IAM service role that already has the required permissions, or choose Create a new role to create a new IAM service role.

  5. Then, choose Create.
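If you chose Use your own prompt dataset in the Metrics and datasets step, the file must use the JSON Lines format, with one JSON object per line. The following sketch writes a minimal dataset; the keys shown (prompt for the model input, referenceResponse for the ground-truth answer) follow the custom prompt dataset documentation, but confirm the exact schema there.

import json

# Hypothetical sketch: write a minimal custom prompt dataset in JSON Lines
# format. Confirm the exact schema in the custom prompt dataset documentation.
records = [
    {"prompt": "Is the Earth round?", "referenceResponse": "Yes"},
    {"prompt": "Can humans breathe underwater unaided?", "referenceResponse": "No"},
]
with open("my-prompt-dataset.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

Upload the resulting file to an S3 bucket that has the CORS permissions described earlier, then reference its S3 URI in the job.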

Once the job status changes to Completed, you can view the job's report card.

SDK for Python

The following example creates an automatic model evaluation job using the SDK for Python (Boto3).

import boto3

client = boto3.client('bedrock')

job_request = client.create_evaluation_job(
    jobName="api-auto-job-titan",
    jobDescription="two different task types",
    # The IAM service role that Amazon Bedrock assumes to run the job.
    roleArn="arn:aws:iam::111122223333:role/role-name",
    # The model to evaluate and its inference parameters.
    inferenceConfig={
        "models": [
            {
                "bedrockModel": {
                    "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-lite-v1",
                    "inferenceParams": "{\"inferenceConfig\":{\"maxTokens\": 512,\"temperature\":0.7,\"topP\":0.9}}"
                }
            }
        ]
    },
    # Where the evaluation results are written.
    outputDataConfig={
        "s3Uri": "s3://amzn-s3-demo-bucket-model-evaluations/outputs/"
    },
    # The task type, dataset, and metrics for the automatic evaluation.
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "QuestionAndAnswer",
                    "dataset": {"name": "Builtin.BoolQ"},
                    "metricNames": [
                        "Builtin.Accuracy",
                        "Builtin.Robustness"
                    ]
                }
            ]
        }
    }
)
print(job_request)
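The response from create_evaluation_job includes the job's ARN, which you can use to track the job. Continuing the example above (client and job_request are defined there), the following sketch polls get_evaluation_job until the job reaches a terminal state; the 60-second interval is illustrative.

import time

# Poll the job until it finishes. The polling interval is illustrative.
job_arn = job_request["jobArn"]
while True:
    job = client.get_evaluation_job(jobIdentifier=job_arn)
    if job["status"] in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)
print(job["status"])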
AWS CLI

In the AWS CLI, you can use the help command to see which parameters are required and which are optional when calling create-evaluation-job.

aws bedrock create-evaluation-job help
The following example creates an automatic model evaluation job with the AWS CLI.

aws bedrock create-evaluation-job \
    --job-name 'automatic-eval-job-cli-001' \
    --role-arn 'arn:aws:iam::111122223333:role/role-name' \
    --evaluation-config '{"automated": {"datasetMetricConfigs": [{"taskType": "QuestionAndAnswer","dataset": {"name": "Builtin.BoolQ"},"metricNames": ["Builtin.Accuracy","Builtin.Robustness"]}]}}' \
    --inference-config '{"models": [{"bedrockModel": {"modelIdentifier":"arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-lite-v1","inferenceParams":"{\"inferenceConfig\":{\"maxTokens\": 512,\"temperature\":0.7,\"topP\":0.9}}"}}]}' \
    --output-data-config '{"s3Uri":"s3://automatic-eval-jobs/outputs"}'
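After the job is created, you can check its status with get-evaluation-job by passing the job ARN that create-evaluation-job returns. The ARN below is a placeholder.

aws bedrock get-evaluation-job \
    --job-identifier 'arn:aws:bedrock:us-west-2:111122223333:evaluation-job/abc123example'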