Starting an automatic model
evaluation job in Amazon Bedrock
You can create an automatic model evaluation job using the AWS Management Console, AWS CLI, or a
supported AWS SDK. In an automatic model evaluation job, the model you select performs
inference using either prompts from a supported built-in dataset or your own custom prompt
dataset. Each job also requires you to select a task type. The task type determines which
recommended metrics and built-in prompt datasets are available. To learn more about available task
types and metrics, see Model evaluation task types in Amazon Bedrock.
The following examples show you how to create an automatic model evaluation job using the
Amazon Bedrock console, the AWS CLI, and the SDK for Python.
All automatic model evaluation jobs require that you create an IAM service role. To
learn more about the IAM requirements for setting up a model evaluation job, see Service role requirements for model evaluation jobs.
The following examples show you how to create an automatic model evaluation job. In the
API, you can also include an inference profile
in the job by specifying its ARN in the modelIdentifier
field.
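For example, the inferenceConfig that you pass to the CreateEvaluationJob API could reference an inference profile instead of a foundation model. The following is a minimal sketch; the inference profile ARN shown is a placeholder, not a real resource.

```python
# Sketch: referencing an inference profile in an evaluation job's
# inferenceConfig. The profile ARN below is a placeholder.
inference_config = {
    "models": [
        {
            "bedrockModel": {
                # An inference profile ARN can be supplied here in place
                # of a foundation-model ARN or model ID.
                "modelIdentifier": "arn:aws:bedrock:us-west-2:111122223333:application-inference-profile/example-profile-id"
            }
        }
    ]
}

print(inference_config["models"][0]["bedrockModel"]["modelIdentifier"])
```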
- Amazon Bedrock console
-
Use the following procedure to create a model evaluation job using the Amazon Bedrock
console. To complete this procedure successfully, make sure that your IAM user,
group, or role has sufficient permissions to access the console. To learn
more, see Required console permissions to create an automatic
model evaluation job.
Also, any custom prompt datasets that you want to specify in the model
evaluation job must have the required CORS permissions added to the Amazon S3 bucket.
To learn more about adding the required CORS permissions, see Required Cross Origin Resource Sharing
(CORS) permissions on S3 buckets.
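As a rough sketch, a CORS configuration of the kind described in the linked page could be applied to the dataset bucket with boto3. The rule values below are an assumption to illustrate the shape of the configuration; confirm the exact required values against the linked documentation, and note that the bucket name is a placeholder.

```python
import json

# Sketch of a CORS rule of the kind model evaluation requires on the
# prompt-dataset bucket. Verify the exact values against the linked
# CORS documentation page before applying.
cors_configuration = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET", "PUT", "POST", "DELETE"],
            "AllowedOrigins": ["*"],
            "ExposeHeaders": ["Access-Control-Allow-Origin"],
        }
    ]
}

# Applying it would look like this (bucket name is a placeholder):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_cors(
#     Bucket="amzn-s3-demo-bucket",
#     CORSConfiguration=cors_configuration,
# )

print(json.dumps(cors_configuration, indent=2))
```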
To create an automatic model evaluation job
-
Open the Amazon Bedrock console: https://console.aws.amazon.com/bedrock/
-
In the navigation pane, choose Model
evaluation.
-
In the Build an evaluation card, under
Automatic, choose Create automatic
evaluation.
-
On the Create automatic evaluation page, provide
the following information:
-
Evaluation name — Give the model
evaluation job a name that describes the job. This name is shown
in your model evaluation job list. The name must be unique in
your account in an AWS Region.
-
Description (Optional) — Provide
an optional description.
-
Models — Choose the model you want
to use in the model evaluation job.
To learn more about available models and accessing them in
Amazon Bedrock, see Access Amazon Bedrock foundation models.
-
(Optional) To change the inference configuration, choose
update.
Changing the inference configuration changes the responses
generated by the selected models. To learn more about the
available inference parameters, see Inference request parameters and response fields for foundation models.
-
Task type — Choose the type of
task you want the model to attempt to perform during the model
evaluation job.
-
Metrics and datasets — The list of
available metrics and built-in prompt datasets changes based on
the task type you select. You can choose from the list of
Available built-in datasets or you can
choose Use your own prompt dataset. If you
choose to use your own prompt dataset, enter the exact S3 URI of
your prompt dataset file or choose Browse
S3 to search for your prompt dataset.
-
Evaluation results — Specify the
S3 URI of the directory where you want the results saved. Choose
Browse S3 to search for a
location in Amazon S3.
-
(Optional) To enable the use of a customer managed key, choose
Customize encryption settings
(advanced). Then, provide the ARN of the AWS KMS
key you want to use.
-
Amazon Bedrock IAM role — Choose Use an existing role to use an IAM
service role that already has the required permissions, or
choose Create a new role to
create a new IAM service role.
-
Then, choose Create.
Once the status changes to Completed, you can view the
job's report card.
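If you choose Use your own prompt dataset in the procedure above, the file you point the S3 URI at is a JSON Lines file with one prompt record per line. The following is a minimal sketch of producing such a file; the field names ("prompt", "referenceResponse", "category") follow the custom prompt dataset documentation, but verify them against that page before running a real job.

```python
import json
import os
import tempfile

# Sketch: writing a small custom prompt dataset in JSON Lines format.
# Field names are taken from the custom prompt dataset documentation;
# verify them before using the file in a real evaluation job.
records = [
    {"prompt": "What is the capital of France?", "referenceResponse": "Paris", "category": "Geography"},
    {"prompt": "What is 2 + 2?", "referenceResponse": "4", "category": "Arithmetic"},
]

path = os.path.join(tempfile.gettempdir(), "prompt-dataset.jsonl")
with open(path, "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# The file would then be uploaded to Amazon S3 and referenced by its
# S3 URI in the model evaluation job configuration.
with open(path) as f:
    lines = f.readlines()
print(len(lines))
```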
- SDK for Python
-
The following example creates an automatic model evaluation job using
the SDK for Python (Boto3).
import boto3

client = boto3.client('bedrock')

job_request = client.create_evaluation_job(
    jobName="api-auto-job-titan",
    jobDescription="two different task types",
    roleArn="arn:aws:iam::111122223333:role/role-name",
    inferenceConfig={
        "models": [
            {
                "bedrockModel": {
                    "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-lite-v1",
                    "inferenceParams": "{\"inferenceConfig\":{\"maxTokens\": 512,\"temperature\":0.7,\"topP\":0.9}}"
                }
            }
        ]
    },
    outputDataConfig={
        "s3Uri": "s3://amzn-s3-demo-bucket-model-evaluations/outputs/"
    },
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "QuestionAndAnswer",
                    "dataset": {
                        "name": "Builtin.BoolQ"
                    },
                    "metricNames": [
                        "Builtin.Accuracy",
                        "Builtin.Robustness"
                    ]
                }
            ]
        }
    }
)

print(job_request)
- AWS CLI
-
In the AWS CLI, you can use the help command to see which
parameters are required and which are optional when calling
create-evaluation-job.
aws bedrock create-evaluation-job help
aws bedrock create-evaluation-job \
--job-name 'automatic-eval-job-cli-001' \
--role-arn 'arn:aws:iam::111122223333:role/role-name' \
--evaluation-config '{"automated": {"datasetMetricConfigs": [{"taskType": "QuestionAndAnswer","dataset": {"name": "Builtin.BoolQ"},"metricNames": ["Builtin.Accuracy","Builtin.Robustness"]}]}}' \
--inference-config '{"models": [{"bedrockModel": {"modelIdentifier":"arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-lite-v1","inferenceParams":"{\"inferenceConfig\":{\"maxTokens\": 512,\"temperature\":0.7,\"topP\":0.9}}"}}]}' \
--output-data-config '{"s3Uri":"s3://automatic-eval-jobs/outputs"}'