

# Creating an automatic model evaluation job in Amazon Bedrock
<a name="evaluation-automatic"></a>

This topic provides detailed directions for creating an automatic model evaluation job.

**Topics**
+ [Required steps prior to creating your first automatic model evaluation job](model-evaluation-type-automatic.md)
+ [Model evaluation task types in Amazon Bedrock](model-evaluation-tasks.md)
+ [Use prompt datasets for model evaluation in Amazon Bedrock](model-evaluation-prompt-datasets.md)
+ [Starting an automatic model evaluation job in Amazon Bedrock](model-evaluation-jobs-management-create.md)
+ [List automatic model evaluation jobs in Amazon Bedrock](model-evaluation-jobs-management-list.md)
+ [Stop a model evaluation job in Amazon Bedrock](model-evaluation-jobs-management-stop.md)
+ [Delete a model evaluation job in Amazon Bedrock](model-evaluation-jobs-management-delete.md)

# Required steps prior to creating your first automatic model evaluation job
<a name="model-evaluation-type-automatic"></a>

Automatic model evaluation jobs require access to the following service-level resources. Use the linked topics to learn more about getting set up.

**Cross Origin Resource Sharing (CORS) permission requirements**  
All console-based model evaluation jobs require Cross Origin Resource Sharing (CORS) permissions to be enabled on any Amazon S3 buckets specified in the model evaluation job. To learn more, see [Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets](model-evaluation-security-cors.md).
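As an illustration, a CORS configuration attached to an S3 bucket takes the following general shape. The values shown here are a permissive example only, not the exact configuration Amazon Bedrock requires; see the linked topic for the required settings.

```
[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET", "PUT", "POST", "DELETE"],
        "AllowedOrigins": ["*"],
        "ExposeHeaders": ["Access-Control-Allow-Origin"]
    }
]
```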

**Required service level resources to start an automatic model evaluation job**

1. To start an automatic model evaluation job, you need access to at least one Amazon Bedrock foundation model. To learn more, see [Request access to models](model-access.md).

1. To create an automatic model evaluation job, you need access to the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/), the AWS Command Line Interface, or a supported AWS SDK. To learn more about the required IAM actions and resources, see [Required console permissions to create an automatic model evaluation job](#base-for-automatic).

1. When the model evaluation job starts, a service role is used to perform actions on your behalf. To learn more about required IAM actions and the trust policy requirements, see [Service role requirements for automatic model evaluation jobs](automatic-service-roles.md).

1. Amazon Simple Storage Service – All data used and generated by an automatic model evaluation job must be placed in an Amazon S3 bucket in the same AWS Region as the job.

1. Cross Origin Resource Sharing (CORS) – Automatic model evaluation jobs that are created using the Amazon Bedrock console require that you specify a CORS configuration on the S3 bucket. To learn more, see [Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets](model-evaluation-security-cors.md).

1. An IAM service role – To run an automatic model evaluation job you must create a service role. The service role allows Amazon Bedrock to perform actions on your behalf in your AWS account. To learn more, see [Service role requirements for automatic model evaluation jobs](automatic-service-roles.md). 

## Required console permissions to create an automatic model evaluation job
<a name="base-for-automatic"></a>

The following policy contains the minimum set of IAM actions and resources in Amazon Bedrock and Amazon S3 that are required to create an *automatic* model evaluation job using the Amazon Bedrock console.

In the policy, we recommend using the IAM JSON policy element [Resource](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_resource.html) to limit access to only the models and buckets required for the IAM user, group, or role.

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPassingConsoleCreatedServiceRoles",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::111122223333:role/service-role/Amazon-Bedrock-IAM-Role-*"
      ],
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "bedrock.amazonaws.com"
        }
      }
    },
    {
      "Sid": "BedrockConsole",
      "Effect": "Allow",
      "Action": [
        "bedrock:CreateEvaluationJob",
        "bedrock:GetEvaluationJob",
        "bedrock:ListEvaluationJobs",
        "bedrock:StopEvaluationJob",
        "bedrock:GetCustomModel",
        "bedrock:ListCustomModels",
        "bedrock:CreateProvisionedModelThroughput",
        "bedrock:UpdateProvisionedModelThroughput",
        "bedrock:GetProvisionedModelThroughput",
        "bedrock:ListProvisionedModelThroughputs",
        "bedrock:GetImportedModel",
        "bedrock:ListImportedModels",
        "bedrock:ListMarketplaceModelEndpoints",
        "bedrock:ListTagsForResource",
        "bedrock:UntagResource",
        "bedrock:TagResource"
      ],
      "Resource": [
        "arn:aws:bedrock:us-west-2::foundation-model/model-id-of-foundational-model",
        "arn:aws:bedrock:us-west-2:111122223333:inference-profile/*",
        "arn:aws:bedrock:us-west-2:111122223333:provisioned-model/*",
        "arn:aws:bedrock:us-west-2:111122223333:imported-model/*"
      ]
    },
    {
      "Sid": "AllowConsoleS3AccessForModelEvaluation",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:ListBucketVersions",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::my_output_bucket",
        "arn:aws:s3:::input_datasets/prompts.jsonl"
      ]
    }
  ]
}
```

------

# Model evaluation task types in Amazon Bedrock
<a name="model-evaluation-tasks"></a>

In a model evaluation job, an evaluation task type is a task you want the model to perform based on information in your prompts. You can choose one task type per model evaluation job.

The following table summarizes available task types for automatic model evaluations, built-in datasets, and relevant metrics for each task type.


**Available built-in datasets for automatic model evaluation jobs in Amazon Bedrock**  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-evaluation-tasks.html)

**Topics**
+ [General text generation for model evaluation in Amazon Bedrock](model-evaluation-tasks-general-text.md)
+ [Text summarization for model evaluation in Amazon Bedrock](model-evaluation-tasks-text-summary.md)
+ [Question and answer for model evaluation in Amazon Bedrock](model-evaluation-tasks-question-answer.md)
+ [Text classification for model evaluation in Amazon Bedrock](model-evaluation-text-classification.md)

# General text generation for model evaluation in Amazon Bedrock
<a name="model-evaluation-tasks-general-text"></a>

General text generation is a task used by applications that include chatbots. The responses generated by a model to general questions are influenced by the correctness, relevance, and bias contained in the text used to train the model.

**Important**  
For general text generation, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully.

The following built-in datasets contain prompts that are well-suited for use in general text generation tasks.

**Bias in Open-ended Language Generation Dataset (BOLD)**  
The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts.

**RealToxicityPrompts**  
RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts.

**T-Rex: A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX)**  
TREX is a dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is "George Washington was the president of the United States". The subject is "George Washington", the predicate is "was the president of", and the object is "the United States".

**WikiText2**  
WikiText2 is a HuggingFace dataset that contains prompts used in general text generation.

The following table summarizes the metrics calculated and the recommended built-in datasets that are available for automatic model evaluation jobs. To specify the available built-in datasets using the AWS CLI or a supported AWS SDK, use the parameter names in the *Built-in datasets (API)* column.


**Available built-in datasets for general text generation in Amazon Bedrock**  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-evaluation-tasks-general-text.html)

To learn more about how the computed metric for each built-in dataset is calculated, see [Review model evaluation job reports and metrics in Amazon Bedrock](model-evaluation-report.md).

# Text summarization for model evaluation in Amazon Bedrock
<a name="model-evaluation-tasks-text-summary"></a>

Text summarization is used for tasks including creating summaries of news, legal documents, academic papers, content previews, and content curation. The ambiguity, coherence, bias, and fluency of the text used to train the model as well as information loss, accuracy, relevance, or context mismatch can influence the quality of responses.

**Important**  
For text summarization, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully.

The following built-in dataset is supported for use with the text summarization task type.

**Gigaword**  
The Gigaword dataset consists of news article headlines. This dataset is used in text summarization tasks.

The following table summarizes the metrics calculated and the recommended built-in dataset. To specify the available built-in datasets using the AWS CLI or a supported AWS SDK, use the parameter names in the *Built-in datasets (API)* column.


**Available built-in datasets for text summarization in Amazon Bedrock**  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-evaluation-tasks-text-summary.html)

To learn more about how the computed metric for each built-in dataset is calculated, see [Review model evaluation job reports and metrics in Amazon Bedrock](model-evaluation-report.md).

# Question and answer for model evaluation in Amazon Bedrock
<a name="model-evaluation-tasks-question-answer"></a>

Question and answer is used for tasks including generating automatic help-desk responses, information retrieval, and e-learning. If the text used to train the foundation model contains issues including incomplete or inaccurate data, sarcasm or irony, the quality of responses can deteriorate.

**Important**  
For question and answer, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully.

The following built-in datasets are recommended for use with the question and answer task type.

**BoolQ**  
BoolQ is a dataset consisting of yes/no question and answer pairs. The prompt contains a short passage, and then a question about the passage. This dataset is recommended for use with question and answer task type.

**Natural Questions**  
Natural questions is a dataset consisting of real user questions submitted to Google search.

**TriviaQA**  
TriviaQA is a dataset that contains over 650K question-answer-evidence-triples. This dataset is used in question and answer tasks.

The following table summarizes the metrics calculated and the recommended built-in datasets. To specify the available built-in datasets using the AWS CLI or a supported AWS SDK, use the parameter names in the *Built-in datasets (API)* column.


**Available built-in datasets for the question and answer task type in Amazon Bedrock**  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-evaluation-tasks-question-answer.html)

To learn more about how the computed metric for each built-in dataset is calculated, see [Review model evaluation job reports and metrics in Amazon Bedrock](model-evaluation-report.md).

# Text classification for model evaluation in Amazon Bedrock
<a name="model-evaluation-text-classification"></a>

Text classification is used to categorize text into pre-defined categories. Applications that use text classification include content recommendation, spam detection, language identification and trend analysis on social media. Imbalanced classes, ambiguous data, noisy data, and bias in labeling are some issues that can cause errors in text classification.

**Important**  
For text classification, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully.

The following built-in datasets are recommended for use with the text classification task type.

**Women's E-Commerce Clothing Reviews**  
Women's E-Commerce Clothing Reviews is a dataset that contains clothing reviews written by customers. This dataset is used in text classification tasks. 

The following table summarizes the metrics calculated and the recommended built-in datasets. To specify the available built-in datasets using the AWS CLI or a supported AWS SDK, use the parameter names in the *Built-in datasets (API)* column.




**Available built-in datasets in Amazon Bedrock**  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-evaluation-text-classification.html)

To learn more about how the computed metric for each built-in dataset is calculated, see [Review model evaluation job reports and metrics in Amazon Bedrock](model-evaluation-report.md).

# Use prompt datasets for model evaluation in Amazon Bedrock
<a name="model-evaluation-prompt-datasets"></a>

To create an automatic model evaluation job you must specify a prompt dataset. The prompts are then used during inference with the model you select to evaluate. Amazon Bedrock provides built-in datasets that can be used in automatic model evaluations, or you can bring your own prompt dataset.

Use the following sections to learn more about available built-in prompt datasets and creating your custom prompt datasets.

## Use built-in prompt datasets for automatic model evaluation in Amazon Bedrock
<a name="model-evaluation-prompt-datasets-builtin"></a>

Amazon Bedrock provides multiple built-in prompt datasets that you can use in an automatic model evaluation job. Each built-in dataset is based on an open-source dataset. We randomly downsampled each open-source dataset to include only 100 prompts.

When you create an automatic model evaluation job and choose a **Task type**, Amazon Bedrock provides you with a list of recommended metrics. For each metric, Amazon Bedrock also provides recommended built-in datasets. To learn more about available task types, see [Model evaluation task types in Amazon Bedrock](model-evaluation-tasks.md).

**Bias in Open-ended Language Generation Dataset (BOLD)**  
The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts.

**RealToxicityPrompts**  
RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts.

**T-Rex: A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX)**  
TREX is a dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is "George Washington was the president of the United States". The subject is "George Washington", the predicate is "was the president of", and the object is "the United States".

**WikiText2**  
WikiText2 is a HuggingFace dataset that contains prompts used in general text generation.

**Gigaword**  
The Gigaword dataset consists of news article headlines. This dataset is used in text summarization tasks.

**BoolQ**  
BoolQ is a dataset consisting of yes/no question and answer pairs. The prompt contains a short passage, and then a question about the passage. This dataset is recommended for use with question and answer task type.

**Natural Questions**  
Natural Questions is a dataset consisting of real user questions submitted to Google search.

**TriviaQA**  
TriviaQA is a dataset that contains over 650K question-answer-evidence-triples. This dataset is used in question and answer tasks.

**Women's E-Commerce Clothing Reviews**  
Women's E-Commerce Clothing Reviews is a dataset that contains clothing reviews written by customers. This dataset is used in text classification tasks. 

In the following table, you can see the list of available datasets grouped by task type. To learn more about how automatic metrics are computed, see [Review metrics for an automated model evaluation job in Amazon Bedrock (console)](model-evaluation-report-programmatic.md).


**Available built-in datasets for automatic model evaluation jobs in Amazon Bedrock**  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-evaluation-prompt-datasets.html)

To learn more about the requirements for creating and examples of custom prompt datasets, see [Use custom prompt dataset for model evaluation in Amazon Bedrock](#model-evaluation-prompt-datasets-custom).

## Use custom prompt dataset for model evaluation in Amazon Bedrock
<a name="model-evaluation-prompt-datasets-custom"></a>

You can use a custom prompt dataset in an automatic model evaluation job. Custom prompt datasets must be stored in Amazon S3, use the JSON Lines format, and have the `.jsonl` file extension. Each line must be a valid JSON object. A dataset can contain up to 1,000 prompts per automatic evaluation job.

For jobs created using the console, you must update the Cross Origin Resource Sharing (CORS) configuration on the S3 bucket. To learn more about the required CORS permissions, see [Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets](model-evaluation-security-cors.md).

You must use the following key-value pairs in a custom dataset.
+ `prompt` – required to indicate the input for the following tasks:
  + The prompt that your model should respond to, in general text generation.
  + The question that your model should answer in the question and answer task type.
  + The text that your model should summarize in text summarization task.
  + The text that your model should classify in classification tasks.
+ `referenceResponse` – required to indicate the ground truth response against which your model is evaluated for the following task types:
  + The answer for all prompts in question and answer tasks.
  + The answer for all accuracy and robustness evaluations.
+ `category` – (optional) Generates evaluation scores reported for each category.

As an example, accuracy requires both the question asked and an answer to check the model's response against. In this example, use the key `prompt` with the value containing the question, and the key `referenceResponse` with the value containing the answer, as follows.

```
{
  "prompt": "Bobigny is the capital of",
  "referenceResponse": "Seine-Saint-Denis",
  "category": "Capitals"
}
```

The previous example is a single line of a JSON Lines input file that is sent to your model as an inference request. The model is invoked for every record in your JSON Lines dataset. The following data input example is for a question and answer task that uses the optional `category` key for evaluation.

```
{"prompt":"Aurillac is the capital of", "category":"Capitals", "referenceResponse":"Cantal"}
{"prompt":"Bamiyan city is the capital of", "category":"Capitals", "referenceResponse":"Bamiyan Province"}
{"prompt":"Sokhumi is the capital of", "category":"Capitals", "referenceResponse":"Abkhazia"}
```
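Because a malformed line or an over-limit dataset surfaces as an error only after the job is submitted, it can help to check the file before uploading it to Amazon S3. The following sketch is illustrative (the helper name and the exact checks are not part of Amazon Bedrock): it verifies the JSON Lines format, the required `prompt` key, and the 1,000-prompt limit.

```python
import json

def validate_prompt_dataset(jsonl_text, require_reference=True, max_prompts=1000):
    """Validate the text of a .jsonl prompt dataset before uploading it.

    Returns a list of error messages; an empty list means no problems found.
    Set require_reference=False for metrics that don't need ground truth.
    """
    errors = []
    lines = [ln for ln in jsonl_text.splitlines() if ln.strip()]
    if len(lines) > max_prompts:
        errors.append(f"dataset has {len(lines)} prompts; the limit is {max_prompts}")
    for i, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {i}: not valid JSON")
            continue
        if not isinstance(record, dict) or "prompt" not in record:
            errors.append(f"line {i}: missing required 'prompt' key")
        elif require_reference and "referenceResponse" not in record:
            errors.append(f"line {i}: missing 'referenceResponse' key")
    return errors
```

For example, running the validator over the three-line dataset above returns an empty list, while a line of plain text or a record without `prompt` produces a numbered error message.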

# Starting an automatic model evaluation job in Amazon Bedrock
<a name="model-evaluation-jobs-management-create"></a>

You can create an automatic model evaluation job using the AWS Management Console, AWS CLI, or a supported AWS SDK. In an automatic model evaluation job, the model you select performs inference using either prompts from a supported built-in dataset or your own custom prompt dataset. Each job also requires you to select a task type. The task type provides you with some recommended metrics, and built-in prompt datasets. To learn more about available task types and metrics, see [Model evaluation task types in Amazon Bedrock](model-evaluation-tasks.md).

The following examples show you how to create an automatic model evaluation job using the Amazon Bedrock console, AWS CLI, and SDK for Python.

All automatic model evaluation jobs require that you create an IAM service role. To learn more about the IAM requirements for setting up a model evaluation job, see [Service role requirements for model evaluation jobs](model-evaluation-security-service-roles.md).

The following examples show you how to create an automatic model evaluation job. In the API, you can also include an [inference profile](cross-region-inference.md) in the job by specifying its ARN in the `modelIdentifier` field.

------
#### [ Amazon Bedrock console ]

Use the following procedure to create a model evaluation job using the Amazon Bedrock console. To successfully complete this procedure, make sure that your IAM user, group, or role has sufficient permissions to access the console. To learn more, see [Required console permissions to create an automatic model evaluation job](model-evaluation-type-automatic.md#base-for-automatic).

Also, any custom prompt datasets that you want to specify in the model evaluation job must have the required CORS permissions added to the Amazon S3 bucket. To learn more about adding the required CORS permissions see, [Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets](model-evaluation-security-cors.md).

**To create an automatic model evaluation job**

1. Open the Amazon Bedrock console: [https://console.aws.amazon.com/bedrock/home](https://console.aws.amazon.com/bedrock/home)

1. In the navigation pane, choose **Model evaluation**.

1. In the **Build an evaluation** card, under **Automatic**, choose **Create automatic evaluation**.

1. On the **Create automatic evaluation** page, provide the following information:

   1. **Evaluation name** — Give the model evaluation job a name that describes the job. This name is shown in your model evaluation job list. The name must be unique within your account in an AWS Region.

   1. **Description** (Optional) — Provide an optional description.

   1. **Models** — Choose the model you want to use in the model evaluation job.

      To learn more about available models and accessing them in Amazon Bedrock, see [Request access to models](model-access.md).

   1. (Optional) To change the inference configuration, choose **update**.

      Changing the inference configuration changes the responses generated by the selected models. To learn more about the available inferences parameters, see [Inference request parameters and response fields for foundation models](model-parameters.md).

   1. **Task type** — Choose the type of task you want the model to attempt to perform during the model evaluation job.

   1. **Metrics and datasets** — The list of available metrics and built-in prompt datasets changes based on the task you select. You can choose from the list of **Available built-in datasets**, or you can choose **Use your own prompt dataset**. If you choose to use your own prompt dataset, enter the exact S3 URI of your prompt dataset file, or choose **Browse S3** to search for your prompt dataset.

   1. **Evaluation results** — Specify the S3 URI of the directory where you want the results saved. Choose **Browse S3** to search for a location in Amazon S3.

   1. (Optional) To enable the use of a customer managed key, choose **Customize encryption settings (advanced)**. Then, provide the ARN of the AWS KMS key that you want to use.

   1. **Amazon Bedrock IAM role** — Choose **Use an existing role** to use an IAM service role that already has the required permissions, or choose **Create a new role** to create a new IAM service role.

1. Then, choose **Create**.

After the status changes to **Completed**, you can view the job's report card.

------
#### [ SDK for Python ]

The following example creates an automatic evaluation job using Python.

```
import boto3
client = boto3.client('bedrock')

job_request = client.create_evaluation_job(
    jobName="api-auto-job-titan",
    jobDescription="two different task types",
    roleArn="arn:aws:iam::111122223333:role/role-name",
    inferenceConfig={
        "models": [
            {
                "bedrockModel": {
                    "modelIdentifier":"arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-lite-v1",
                    "inferenceParams":"{\"inferenceConfig\":{\"maxTokens\": 512,\"temperature\":0.7,\"topP\":0.9}}"
                }

            }
        ]

    },
    outputDataConfig={
        "s3Uri":"s3://amzn-s3-demo-bucket-model-evaluations/outputs/"
    },
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "QuestionAndAnswer",
                    "dataset": {
                        "name": "Builtin.BoolQ"
                    },
                    "metricNames": [
                        "Builtin.Accuracy",
                        "Builtin.Robustness"
                    ]
                }
            ]
        }
    }
)

print(job_request)
```
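`create_evaluation_job` returns as soon as the job is submitted, while the evaluation itself runs asynchronously. A sketch such as the following can poll until the job finishes. The `wait_for_evaluation_job` helper is illustrative, and it assumes the `status` field of the `GetEvaluationJob` response, which reports values such as `InProgress` and `Completed`.

```python
import time

def wait_for_evaluation_job(client, job_arn, poll_seconds=60, max_polls=120):
    """Poll get_evaluation_job until the job leaves the InProgress state.

    `client` is a boto3 'bedrock' client. Returns the final status string,
    or raises TimeoutError if the job is still running after max_polls polls.
    """
    for _ in range(max_polls):
        status = client.get_evaluation_job(jobIdentifier=job_arn)["status"]
        if status != "InProgress":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_arn} still in progress after {max_polls} polls")
```

For example, `wait_for_evaluation_job(client, job_request["jobArn"])` blocks until the job created above completes, fails, or is stopped.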

------
#### [ AWS CLI ]

In the AWS CLI, you can use the `help` command to see which parameters are required, and which parameters are optional when specifying `create-evaluation-job` in the AWS CLI.

```
aws bedrock create-evaluation-job help
```

```
aws bedrock create-evaluation-job \
--job-name 'automatic-eval-job-cli-001' \
--role-arn 'arn:aws:iam::111122223333:role/role-name' \
--evaluation-config '{"automated": {"datasetMetricConfigs": [{"taskType": "QuestionAndAnswer","dataset": {"name": "Builtin.BoolQ"},"metricNames": ["Builtin.Accuracy","Builtin.Robustness"]}]}}' \
--inference-config '{"models": [{"bedrockModel": {"modelIdentifier":"arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-lite-v1","inferenceParams":"{\"inferenceConfig\":{\"maxTokens\": 512,\"temperature\":0.7,\"topP\":0.9}}"}}]}' \
--output-data-config '{"s3Uri":"s3://automatic-eval-jobs/outputs"}'
```

------

# List automatic model evaluation jobs in Amazon Bedrock
<a name="model-evaluation-jobs-management-list"></a>

You can list the automatic model evaluation jobs that you've already created using the AWS CLI or a supported AWS SDK. In the Amazon Bedrock console, you can also view a table containing your current model evaluation jobs.

The following examples show you how to find your model evaluation jobs using the AWS Management Console, AWS CLI, and SDK for Python.

------
#### [ Amazon Bedrock console ]

1. Open the Amazon Bedrock console: [https://console.aws.amazon.com/bedrock/home](https://console.aws.amazon.com/bedrock/home)

1. In the navigation pane, choose **Model evaluation**.

1. In the **Model Evaluation Jobs** card, you can find a table that lists the model evaluation jobs you have already created.

------
#### [ AWS CLI ]

In the AWS CLI, you can use the `help` command to view which parameters are required, and which parameters are optional, when using `list-evaluation-jobs`.

```
aws bedrock list-evaluation-jobs help
```

The following is an example of using `list-evaluation-jobs` and specifying that a maximum of 5 jobs be returned. By default, jobs are returned in descending order from the time when they were started.

```
aws bedrock list-evaluation-jobs --max-items 5
```

------
#### [ SDK for Python ]

The following example shows how to use the AWS SDK for Python to find model evaluation jobs you have previously created.

```
import boto3
client = boto3.client('bedrock')

job_request = client.list_evaluation_jobs(maxResults=20)

print(job_request)
```
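To work with the listed jobs programmatically, you can filter the summaries in the response on the client side. The helper below is an illustrative sketch that assumes the `jobSummaries`, `jobName`, and `status` fields of the `ListEvaluationJobs` response; check the API reference for server-side filtering parameters that may make a client-side pass unnecessary.

```python
def jobs_with_status(list_response, status):
    """Return the names of jobs in a ListEvaluationJobs response
    whose status matches the given value (for example, 'Completed')."""
    return [
        summary["jobName"]
        for summary in list_response.get("jobSummaries", [])
        if summary.get("status") == status
    ]
```

For example, `jobs_with_status(job_request, "Completed")` returns only the finished jobs from the response above.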

------

# Stop a model evaluation job in Amazon Bedrock
<a name="model-evaluation-jobs-management-stop"></a>

You can stop a model evaluation job that is currently processing using the AWS Management Console, AWS CLI, or a supported AWS SDK.

The following examples show you how to stop a model evaluation job using the AWS Management Console, AWS CLI, and SDK for Python.

------
#### [ Amazon Bedrock console ]

The following example shows you how to stop a model evaluation job using the AWS Management Console.

1. Open the Amazon Bedrock console: [https://console.aws.amazon.com/bedrock/home](https://console.aws.amazon.com/bedrock/home)

1. In the navigation pane, choose **Model evaluation**.

1. In the **Model Evaluation Jobs** card, you can find a table that lists the model evaluation jobs you have already created.

1. Select the radio button next to your job's name.

1. Then, choose **Stop evaluation**.

------
#### [ SDK for Python ]

The following example shows you how to stop a model evaluation job using the SDK for Python.

```
import boto3
client = boto3.client('bedrock')
response = client.stop_evaluation_job(
    # The ARN of the model evaluation job that you want to stop.
    jobIdentifier='arn:aws:bedrock:us-west-2:444455556666:evaluation-job/fxaqujhttcza'
)

print(response)
```

------
#### [ AWS CLI ]

In the AWS CLI, you can use the `help` command to see which parameters are required, and which parameters are optional, when specifying `stop-evaluation-job` in the AWS CLI.

```
aws bedrock stop-evaluation-job help
```

The following example shows you how to stop a model evaluation job using the AWS CLI.

```
aws bedrock stop-evaluation-job --job-identifier arn:aws:bedrock:us-west-2:444455556666:evaluation-job/fxaqujhttcza
```

------

# Delete a model evaluation job in Amazon Bedrock
<a name="model-evaluation-jobs-management-delete"></a>

You can delete a model evaluation job by using the Amazon Bedrock console, or by using the [BatchDeleteEvaluationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_BatchDeleteEvaluationJob.html) operation with the AWS CLI, or a supported AWS SDK.

Before you can delete a model evaluation job, the status of the job must be `FAILED`, `COMPLETED`, or `STOPPED`. You can get the current status of a job from the Amazon Bedrock console or by calling the [ListEvaluationJobs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListEvaluationJobs.html) operation. For more information, see [List automatic model evaluation jobs in Amazon Bedrock](model-evaluation-jobs-management-list.md).

You can delete up to 25 model evaluation jobs at a time with the console and with the `BatchDeleteEvaluationJob` operation. If you need to delete more jobs, repeat the console procedure or `BatchDeleteEvaluationJob` call.
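If you are scripting deletions with an AWS SDK, one way to stay within the 25-job limit is to split the list of ARNs into batches before calling the API. This is a minimal sketch; the `chunk_arns` helper is illustrative and not part of any AWS SDK.

```python
def chunk_arns(job_arns, batch_size=25):
    # BatchDeleteEvaluationJob accepts at most 25 job ARNs per call,
    # so split the full list into batches of that size.
    return [job_arns[i:i + batch_size] for i in range(0, len(job_arns), batch_size)]

# Each batch would then be passed to a boto3 "bedrock" client, for example:
#   for batch in chunk_arns(arns):
#       client.batch_delete_evaluation_job(jobIdentifiers=batch)
```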

If you delete a model evaluation job with the `BatchDeleteEvaluationJob` operation, you need the Amazon Resource Names (ARNs) of the model evaluation jobs that you want to delete. For information about getting the ARN for a job, see [List automatic model evaluation jobs in Amazon Bedrock](model-evaluation-jobs-management-list.md). 

When you delete a model evaluation job, all resources in Amazon Bedrock and Amazon SageMaker AI are removed. Any model evaluation job results saved in Amazon S3 buckets are left unchanged. Also, for model evaluation jobs that use human workers, deleting a model evaluation job does not delete the workforce or work team you have configured in Amazon Cognito or SageMaker AI.

Use the following sections to see examples of how to delete a model evaluation job.

------
#### [ Amazon Bedrock console ]

Use the following procedure to delete a model evaluation job using the Amazon Bedrock console. To successfully complete this procedure, make sure that your IAM user, group, or role has sufficient permissions to access the console. To learn more, see [Required console permissions to create an automatic model evaluation job](model-evaluation-type-automatic.md#base-for-automatic).

**To delete multiple model evaluation jobs**

1. Open the Amazon Bedrock console: [https://console.aws.amazon.com/bedrock/](https://console.aws.amazon.com/bedrock/)

1. In the navigation pane, choose **Model evaluation**.

1. In the **Model Evaluation Jobs** card, use the table to find the model evaluation jobs that you want to delete. Select them using the checkbox next to each job's name. You can select up to 25 jobs.

1. Choose **Delete** to delete the model evaluation jobs.

1. If you need to delete more model evaluation jobs, repeat steps 3 and 4.

------
#### [ AWS CLI ]

In the AWS CLI, you can use the `help` command to see which parameters are required and which parameters are optional when using `batch-delete-evaluation-job`.

```
aws bedrock batch-delete-evaluation-job help
```

The following is an example of using `batch-delete-evaluation-job` to delete two model evaluation jobs. You use the `job-identifiers` parameter to specify a list of ARNs for the model evaluation jobs that you want to delete. You can delete up to 25 model evaluation jobs in a single call to `batch-delete-evaluation-job`. If you need to delete more jobs, make further calls to `batch-delete-evaluation-job`.

```
aws bedrock batch-delete-evaluation-job \
--job-identifiers arn:aws:bedrock:us-east-1:111122223333:evaluation-job/rmqp8zg80rvg arn:aws:bedrock:us-east-1:111122223333:evaluation-job/xmfp9zg204fdk
```

After submitting the request, you receive a response similar to the following.

```
{
	"evaluationJobs": [
		{
			"jobIdentifier": "rmqp8zg80rvg",
			"jobStatus": "Deleting"
		},
		{
			"jobIdentifier": "xmfp9zg204fdk",
			"jobStatus": "Deleting"
		}
	],
	"errors": []
}
```

------
#### [ SDK for Python ]

The following example shows how to use the AWS SDK for Python to delete model evaluation jobs. Use the `jobIdentifiers` parameter to specify a list of ARNs for the model evaluation jobs that you want to delete. You can delete up to 25 model evaluation jobs in a single call to `BatchDeleteEvaluationJob`. If you need to delete more jobs, make further calls to `BatchDeleteEvaluationJob`.

```
import boto3

client = boto3.client('bedrock')

response = client.batch_delete_evaluation_job(
	jobIdentifiers=[
		"arn:aws:bedrock:us-east-1:111122223333:evaluation-job/rmqp8zg80rvg",
		"arn:aws:bedrock:us-east-1:111122223333:evaluation-job/xmfp9zg204fdk"
	]
)

print(response)
```
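The response separates jobs accepted for deletion from per-job errors, as shown in the AWS CLI example earlier in this section. A minimal sketch of splitting the two, using a hard-coded sample response in place of a real API call:

```python
# Sample response shape; a real call to batch_delete_evaluation_job
# returns a dict like this.
response = {
    "evaluationJobs": [
        {"jobIdentifier": "rmqp8zg80rvg", "jobStatus": "Deleting"},
    ],
    "errors": [],
}

# Jobs accepted for deletion versus jobs that could not be deleted.
deleting = [job["jobIdentifier"] for job in response["evaluationJobs"]]
failed = response["errors"]
```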

------