Getting started with model evaluations - Amazon Bedrock

Getting started with model evaluations

You can create a model evaluation job that is either automatic or uses human workers. When you create a model evaluation job, you can define the model used, the inference parameters of the model, the type of task the model tries to perform, and the prompt data used in the job.
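As a rough illustration of the choices described above, the settings for a job can be pictured as a small record. This is only a sketch for planning purposes: the class name, field names, model identifier, and S3 URI are all hypothetical and are not part of any Amazon Bedrock API.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum


class EvaluationMode(Enum):
    # The two kinds of job named in the text: automatic, or using human workers.
    AUTOMATIC = "automatic"
    HUMAN = "human"


@dataclass
class EvaluationJobSettings:
    """Hypothetical container for the choices made when creating a job."""
    mode: EvaluationMode
    model_id: str              # the model used (placeholder value below)
    task_type: str             # the type of task the model tries to perform
    prompt_dataset_uri: str    # where the prompt data lives (assumed to be an S3 URI)
    inference_params: dict = field(default_factory=dict)

    def summary(self) -> dict:
        """Return a plain dict, with the enum flattened to its string value."""
        data = asdict(self)
        data["mode"] = self.mode.value
        return data


settings = EvaluationJobSettings(
    mode=EvaluationMode.AUTOMATIC,
    model_id="example-model-id",
    task_type="text summarization",
    prompt_dataset_uri="s3://example-bucket/prompts.jsonl",
    inference_params={"temperature": 0.2},
)
print(settings.summary())
```

Writing the settings down this way before opening the console can make it easier to confirm that nothing is missing from the job definition.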

Model evaluation jobs support the following task types.
  • General text generation: The selected model produces natural language in response to text prompts.

  • Text summarization: The selected model is asked to summarize the text provided in the prompt.

  • Question and answer: The selected model is asked to answer the question in the prompt.

  • Classification: The model attempts to assign the correct category, such as a label or score, to the text based on its content. In custom datasets, you can specify a ground truth response.

  • Custom: You define the metric, its description, and a rating method.
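To make the idea of a ground truth response in a custom dataset more concrete, the following sketch builds a few classification prompts as JSON Lines rows. The key names `prompt`, `referenceResponse`, and `category` are assumptions based on the custom prompt dataset format for automatic evaluation jobs, and the helper `prompt_record` is hypothetical; check the current dataset requirements before relying on either.

```python
import json
from typing import Optional


def prompt_record(prompt: str,
                  reference: Optional[str] = None,
                  category: Optional[str] = None) -> str:
    """Serialize one dataset entry as a single JSON Lines row."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    row = {"prompt": prompt}
    if reference is not None:
        # Assumed key for the ground truth response.
        row["referenceResponse"] = reference
    if category is not None:
        # Assumed optional key for grouping prompts.
        row["category"] = category
    return json.dumps(row)


rows = [
    prompt_record("Is this review positive or negative? 'Great battery life.'",
                  reference="positive"),
    prompt_record("Is this review positive or negative? 'Stopped working in a week.'",
                  reference="negative"),
]

# One JSON object per line, ready to be written to a .jsonl file.
dataset_jsonl = "\n".join(rows)
print(dataset_jsonl)
```

Keeping the ground truth next to each prompt lets the same file serve classification and question-and-answer style tasks, where a reference answer is needed to score the model's response.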

To create a model evaluation job, you must have access to at least one Amazon Bedrock foundation model. To learn which models are supported in model evaluations, see Model support by feature. To gain access to models in Amazon Bedrock, see Manage access to Amazon Bedrock foundation models.

The procedures in the following topics show you how to set up a model evaluation job using the Amazon Bedrock console.

To create a model evaluation job with the help of an AWS-managed team, choose Create AWS managed evaluation from the AWS Management Console. Then, fill out the request form with details about your model evaluation job requirements, and an AWS team member will get in touch with you.