

# Evaluating Lex V2 bot performance with the Test Workbench
<a name="test-workbench"></a>

To improve bot performance, you can evaluate your bots at scale. The results of a test evaluation are displayed in simple tables and charts.

You can use the Test Workbench to create reference test sets that use existing transcription data. You can test bots to evaluate performance before deployment, and view test result breakdowns at scale.

![\[The work flow diagram to improve bot accuracy with the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/testworkbench-workflow.png)


You can use the Test Workbench to establish baseline performance for bots. This covers intent and slot performance for utterances in the form of single inputs or conversations. After a test set is successfully loaded, you can run it against your existing pre-production or production bots. The Test Workbench helps you identify opportunities to improve slot filling and intent classification.

**Topics**
+ [Generate a test set for Test Workbench](test-sets.md)
+ [Manage test sets](manage-test-sets.md)
+ [Execute a test](execute-test-set.md)
+ [Test set coverage in Test Workbench](validation-test-set.md)
+ [View test results](test-results-test-set.md)
+ [Test results details in Test Workbench](test-results-details-test-set.md)

# Generate a test set for Test Workbench
<a name="test-sets"></a>

You can create a test set to evaluate the performance of your bot. Generate a test set by uploading one in CSV file format, or by generating one from [conversation logs](https://docs.aws.amazon.com/lexv2/latest/dg/conversation-logs.html). The test set can contain audio or text input.

![\[Create a test set with the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-create.png)


If a test set produces validation errors, remove the test set and replace it with another set of test data, or edit the data in the CSV file by using a spreadsheet editing program.

**To create a test set:**

1. Sign in to the AWS Management Console and open the Amazon Lex console at [https://console.aws.amazon.com/lex/](https://console.aws.amazon.com/lex/).

1. Choose **Test workbench** from the left side panel.

1. Select **Test sets** from the options under Test workbench. 

1. Select the **Create test set** button on the console. 

1. In the **Details** section, enter a test set name and an optional description. 

1. Select **Generate a baseline test set**. 

1. Select **Generate from conversation logs**. 

1. Select **Bot name**, **Bot Alias**, and **Language** from the drop-down menus. 

1. If you are generating a baseline test from a conversation log, choose **Time range** and **IAM role**, if required. You can create a role with the basic Amazon Lex V2 permissions or use an existing role.

1. Choose a modality of **Audio** or **Text** for the test set you are creating. NOTE: The Test Workbench can import text files up to 50k, and up to 5 hours of audio. 

1. Select an Amazon S3 location to store your test results, and add an optional KMS key to encrypt output transcripts. 

1. Select **Create**. 

**To upload an existing test set in a CSV file format, or to update the test set:**

1. Choose **Test workbench** from the left side panel.

1. Select **Test sets** from the options under Test workbench. 

1. Select **Upload a file to this test set** on the console. 

1. Choose **Upload from Amazon S3 bucket** or **Upload from your computer**. NOTE: You can upload a CSV file created from a template. Choose **CSV template** to download a zip file that contains the templates. 

1. Choose **Create a role with basic Amazon Lex permissions** or **Use an existing role for Role ARN**. 

1. Choose a modality of **Audio** or **Text** for the test set you are creating. NOTE: The Test Workbench can import text files up to 50k, and up to 5 hours of audio. 

1. Select an Amazon S3 location to store your test results, and add an optional KMS key to encrypt output transcripts. 

1. Select **Create**. 

If the operation is successful, the confirmation message will indicate that the test set is ready to test, and the status will display **Ready for testing**. 

# Tips for creating a successful test set
<a name="tips-create-test-set"></a>
+ You can create an IAM role for the Test Workbench in the console, or you can configure your IAM role step by step. For more information, see [Create an IAM role for the Test Workbench](https://docs.aws.amazon.com/lexv2/latest/dg/create-iam-test-set.html). 
+ Before executing a test, use the **Validate discrepancy** button to check the test set against the bot definition for inconsistencies. If the intent and slot naming conventions used in the test set are consistent with the bot, proceed to execute the test. If any anomalies are identified, revise and update the test set, then choose **Validate discrepancy** again. Repeat until no inconsistencies are noted, then execute the test. 
+ The Test Workbench can test with different slot value formats in the **Expected Output Slot** column. For any built-in slot, you can use the value provided in the user input (for example, Date = tomorrow), or provide its absolute resolved value (for example, Date = 2023-03-21). For more information about built-in slots and their absolute values, see [Built-in slots](https://docs.aws.amazon.com/lexv2/latest/dg/howitworks-builtins-slots.html). 
+  For consistency and readability in the **Expected Output Slot** columns, follow the convention of "SlotName = SlotValue" (for example, AppointmentType = cleaning) with a space before and after the equal sign.
+ If the bot includes composite slots, in **Expected Output Slot** define subslots appended to the slot name, separated by a period (for example, “Car.Color”). No other syntax or punctuation will work.
+ If the bot includes multi-value slots, in **Expected Output Slot** provide multiple slot values, separated by a comma ("FlowerType = roses, lilies"). No other syntax or punctuation will work.
+ Make sure that the test set is created from valid conversation logs. 
+ In the CSV format, each slot and its value (slot:slot value) appear together in the same column, after the intent columns. 
+ DTMF input from a User turn is interpreted as an expected transcription and does not list an Amazon S3 location. 
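Based on the conventions above, **Expected Output Slot** cells can be normalized programmatically when you assemble a test set. The sketch below uses a hypothetical helper, `parse_slot_cell` (not part of any AWS tool), that splits a cell into a slot name and its list of values, covering both the composite ("Car.Color") and multi-value cases:

```python
def parse_slot_cell(cell: str) -> tuple[str, list[str]]:
    """Split 'Name = v1, v2' into ('Name', ['v1', 'v2']).

    Composite subslot names such as 'Car.Color' are kept as-is;
    multi-value slots are split on commas.
    """
    if "=" not in cell:
        raise ValueError(f"expected '<key> = <value>' in {cell!r}")
    name, _, raw_values = cell.partition("=")
    values = [v.strip() for v in raw_values.split(",") if v.strip()]
    return name.strip(), values

print(parse_slot_cell("AppointmentType = cleaning"))   # ('AppointmentType', ['cleaning'])
print(parse_slot_cell("FlowerType = roses, lilies"))   # ('FlowerType', ['roses', 'lilies'])
print(parse_slot_cell("Car.Color = red"))              # ('Car.Color', ['red'])
```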

# Creating a test case within a test set using Test Workbench
<a name="create-test-case"></a>

The Test Workbench results are dependent on the bot definition and its corresponding test set. You can generate a test set with the information from the bot definition to pinpoint areas that need improvement. Create a test dataset with examples that you suspect (or know) will be challenging for the bot to interpret correctly considering the current bot design and your knowledge of your customer conversations.

 Review your intents based on learnings from your production bot on a regular basis. Continue to add to and adjust the bot’s sample utterances and slot values. Consider improving slot resolution by using the available options, such as runtime hints. The design and development of your bot is an iterative process that is a continuous cycle.

 Here are some other tips for optimizing your test set: 
+ Select the most common use cases with frequently used intents and slots in the test set. 
+ Explore different ways a customer could refer to your intents and slots. This can include user inputs in the forms of statements, questions, and commands that vary in length from minimal to extended.
+ Include user inputs with a varied number of slots.
+ Include commonly used synonyms or abbreviations of custom slot values supported by your bot (for example, “root canal”, “canal”, or “RC”).
+ Include variations of built-in slot values (for example, “tomorrow”, “asap”, or "the next day").
+ Examine the bot's robustness for the spoken modality by collecting user inputs that can be misinterpreted (for example, “ink”, “ankle”, or "anchor").

# Creating a test set from a CSV file for Test Workbench
<a name="create-test-set-from-CSV"></a>

You can create a test set from the CSV file template provided in the Amazon Lex V2 console by entering the values directly in a spreadsheet editor. The test set is a comma-separated values (CSV) file consisting of single user utterances and multi-turn conversations recorded in the following columns:
+ **Line #** – this column is an incremental counter that keeps track of the total filled rows to test. 
+ **Conversation #** – this column tracks the number of turns in a conversation. For single inputs, this column can be left empty or filled with "-" or "N/A". For conversations, each turn within a conversation is assigned the same conversation number. 
+ **Source** – this column is set to "User" or "Agent". For single inputs, it is always set to "User".
+ **Input** – this column includes the user utterance or the bot prompt.
+ **Expected Output Intent** – this column captures the intent fulfilled by the input.
+ **Expected Output Slot 1** – this column captures the first slot elicited in the user input. The test set should include a column called Expected Output Slot X for each slot in the user input. 

Example of a test set with single inputs:


| Line # | Conversation # | Source | Input | Expected Output Intent | Expected Output Slot 1 | Expected Output Slot 2 | 
| --- | --- | --- | --- | --- | --- | --- | 
|  1  |    | User | book a cleaning appointment tomorrow | MakeAppointment | AppointmentType = cleaning | Date = tomorrow | 
|  2  |  N/A  | User | book a cleaning appointment on April 15th | MakeAppointment | AppointmentType = cleaning | Date = 4/15/23 | 
|  3  |  N/A  | User | book appointment for December first | MakeAppointment | Date = December first |  | 
|  4  |  N/A  | User | book a cleaning appointment | MakeAppointment | AppointmentType = cleaning |  | 
|  5  |    | User | Can you help me book an appointment? | MakeAppointment |  |  | 
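A test set CSV like the single-input example above can be filled in with a spreadsheet editor, or written programmatically. The following sketch uses Python's standard `csv` module to write two hypothetical rows with the column layout described in this section (the file name is arbitrary):

```python
import csv

# Column layout from this section; "Line #" and "Conversation #" are the
# counter columns, followed by one "Expected Output Slot X" column per slot.
HEADER = [
    "Line #", "Conversation #", "Source", "Input",
    "Expected Output Intent", "Expected Output Slot 1", "Expected Output Slot 2",
]

# Two hypothetical single-input rows; "N/A" marks a non-conversation row.
ROWS = [
    ["1", "N/A", "User", "book a cleaning appointment tomorrow",
     "MakeAppointment", "AppointmentType = cleaning", "Date = tomorrow"],
    ["2", "N/A", "User", "book a cleaning appointment",
     "MakeAppointment", "AppointmentType = cleaning", ""],
]

with open("test_set.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(HEADER)
    writer.writerows(ROWS)
```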

Example of a test set with conversations:


| Line # | Conversation # | Source | Input | Expected Output Intent | Expected Output Slot 1 | Expected Output Slot 2 | Expected Output Slot 3 | 
| --- | --- | --- | --- | --- | --- | --- | --- | 
|  1  |  1  | User | book an appointment | MakeAppointment |  |  |  | 
|  2  |  1  | Agent | What type of appointment would you like to schedule? | MakeAppointment |  |  |  | 
|  3  |  1  | User | cleaning | MakeAppointment | AppointmentType = cleaning |  |  | 
|  4  |  1  | Agent | When should I schedule your appointment? | MakeAppointment |  |  |  | 
|  5  |  1  | User | tomorrow | MakeAppointment |  | Date = tomorrow |  | 
|  6  |  2  | User | book a root canal appointment today | MakeAppointment | AppointmentType = root canal | Date = today |  | 
|  7  |  2  | Agent | At what time should I schedule your appointment? | MakeAppointment |  |  |  | 
|  8  |  2  | User | eleven a.m. | MakeAppointment |  |  | Time = eleven a.m. | 

# Create an IAM role for the Test Workbench
<a name="create-iam-test-set"></a>

**To create an IAM role for the Test Workbench**

1. Follow the steps at [Create an IAM user](https://docs.aws.amazon.com/lexv2/latest/dg/gs-account.html#gs-account-user) to create an IAM user that can be used to access the Test Workbench console.

1. Select the **Create role** button.   
![\[The roles screen in the IAM console.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/testworkbench-iam1.png)

1. Select the option for **Custom trust policy**.   
![\[Select trusted entity\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/testworkbench-iam2.png)

1. Enter the trust policy below and choose **Next**. 

------
#### [ JSON ]

****  

   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Sid": "sid4",
         "Effect": "Allow",
         "Principal": {
           "Service": "lexv2.amazonaws.com"
         },
         "Action": "sts:AssumeRole"
       }
     ]
   }
   ```

------
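If you prefer to create the role programmatically (for example, with an AWS SDK) rather than in the console, the same trust policy can be built as a Python dict and serialized with `json.dumps`. A minimal sketch:

```python
import json

# The trust policy from the console steps, expressed as a Python dict.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "sid4",
            "Effect": "Allow",
            "Principal": {"Service": "lexv2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Serialize to the JSON document expected by IAM when creating the role.
trust_policy_json = json.dumps(trust_policy, indent=2)
```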

1. Select the **Create policy** button. 

1. A new tab opens in your browser, where you can enter the policy below and then choose **Next: Tags**. 

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "s3:*"
               ],
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "logs:FilterLogEvents"
               ],
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "lex:*"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

1. Enter a policy name, for example 'LexTestWorkbenchPolicy', and then choose **Create policy**. 

1. Return to the previous tab in your browser and refresh the list of policies by choosing the **Refresh** button shown below.   
![\[Refresh the screen to see the new policy.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/testworkbench-iam3.png)

1. Search the list of policies for the policy name that you entered in step 6, and choose the policy. 

1. Select the **Next** button. 

1. Enter a role name, and then choose **Create role**. 

1. Choose your new IAM role when prompted in the Amazon Lex V2 console for Test Workbench. 

# Create an IAM role for the Test Workbench - Advanced Features
<a name="create-iam-test-set-features"></a>

**Permission setup for Test workbench IAM role**

This section shows several example AWS Identity and Access Management (IAM) identity-based policies to implement least-privilege access controls for Test Workbench permissions.

1. **Policy for Test Workbench to read audio files in S3** – This policy enables Test Workbench to read the audio files used in the test sets. Modify the policy below to update *S3BucketName* and *S3Path* so that they point to the Amazon S3 location of the audio files in a test set.

------
#### [ JSON ]

****  

   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Sid": "TestWorkbenchS3AudioFilesReadOnly",
         "Effect": "Allow",
         "Action": [
           "s3:GetObject",
           "s3:GetObjectVersion"
         ],
         "Resource": [
           "arn:aws:s3:::S3BucketName/S3Path/*"
         ]
       }
     ]
   }
   ```

------

1. **Policy for Test Workbench to read and write test sets and results into an Amazon S3 bucket** – This policy enables Test Workbench to store the test set inputs and results. Modify the policy below to update *S3BucketName* to the Amazon S3 bucket where test set data will be stored. Test Workbench stores this data exclusively in your Amazon S3 bucket and not in the Lex service infrastructure. For this reason, Test Workbench requires access to your Amazon S3 bucket to function properly.

------
#### [ JSON ]

****  

   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Sid": "TestSetDataUploadWithEncryptionOnly",
         "Effect": "Allow",
         "Action": [
           "s3:PutObject"
         ],
         "Resource": [
           "arn:aws:s3:::S3BucketName/*/lex_testworkbench/test_set/*",
           "arn:aws:s3:::S3BucketName/*/lex_testworkbench/test_execution/*",
           "arn:aws:s3:::S3BucketName/*/lex_testworkbench/test_set_discrepancy_report/*"
         ],
         "Condition": {
           "StringEquals": {
             "s3:x-amz-server-side-encryption": "aws:kms"
           }
         }
       },
       {
         "Sid": "TestSetDataGetObject",
         "Effect": "Allow",
         "Action": [
           "s3:GetObject",
           "s3:GetObjectVersion"
         ],
         "Resource": [
           "arn:aws:s3:::S3BucketName/*/lex_testworkbench/test_set/*",
           "arn:aws:s3:::S3BucketName/*/lex_testworkbench/test_execution/*",
           "arn:aws:s3:::S3BucketName/*/lex_testworkbench/test_set_discrepancy_report/*"
         ]
       },
       {
         "Sid": "TestSetListS3Objects",
         "Effect": "Allow",
         "Action": [
           "s3:ListBucket"
         ],
         "Resource": [
           "arn:aws:s3:::S3BucketName"
         ]
       }
     ]
   }
   ```

------

1. **Policy for Test Workbench to read CloudWatch Logs** – This policy enables Test Workbench to generate test sets from Lex conversation text logs stored in Amazon CloudWatch Logs. Modify the policy to update *Region*, *AwsAccountId*, and *LogGroupName*. 

1. **Policy for Test Workbench to call Lex Runtime** – This policy enables Test Workbench to execute a test set against Lex bots. Modify the policy to update *Region*, *AwsAccountId*, and *BotId*. Because Test Workbench can test any bot in your Lex environment, you can replace the resource with "arn:aws:lex:*Region*:*AwsAccountId*:bot-alias/\*" to allow Test Workbench access to all Amazon Lex V2 bots in an account.

1. **(Optional) Policy for Test Workbench to encrypt and decrypt test set data** – If Test Workbench is configured to store test set inputs and results in Amazon S3 buckets using a customer managed KMS key, Test Workbench needs both encryption and decryption permission for the KMS key. Modify the policy to update *Region*, *AwsAccountId*, and *KmsKeyId*, where *KmsKeyId* is the ID of the customer managed KMS key.

1. **(Optional) Policy for Test Workbench to decrypt audio files** – If audio files are stored in the S3 bucket using a customer managed KMS key, Test Workbench needs decryption permission for the KMS key. Modify the policy to update *Region*, *AwsAccountId*, and *KmsKeyId*, where *KmsKeyId* is the ID of the customer managed KMS key used to encrypt the audio files.
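The placeholder substitution described in these policies (for *S3BucketName*, *S3Path*, and so on) can be scripted when you manage the role with infrastructure code. The sketch below templates the audio-read policy from the first example; the bucket and prefix passed in are hypothetical:

```python
import copy
import json

# Template mirroring the audio-read policy above; {bucket} and {path}
# stand in for the S3BucketName and S3Path placeholders.
AUDIO_READ_TEMPLATE = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TestWorkbenchS3AudioFilesReadOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            "Resource": ["arn:aws:s3:::{bucket}/{path}/*"],
        }
    ],
}

def render_audio_read_policy(bucket: str, path: str) -> str:
    """Return the policy JSON with the placeholders filled in."""
    policy = copy.deepcopy(AUDIO_READ_TEMPLATE)
    policy["Statement"][0]["Resource"] = [
        arn.format(bucket=bucket, path=path)
        for arn in policy["Statement"][0]["Resource"]
    ]
    return json.dumps(policy, indent=2)

# Hypothetical bucket and prefix for illustration only.
rendered = render_audio_read_policy("my-audio-bucket", "test-audio")
```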

# Manage test sets
<a name="manage-test-sets"></a>

You can download, update, and delete test sets from the test set window. You can also use the list of available test sets to edit or manually annotate your test set file, and then upload it again to retry validation after errors or other input issues.

**To download the test set file from test set record:**

1. Select the name of the test set from the list of test sets.

1. In the test set record window, select the **Download** button on the right side of the screen in the **Test Inputs** section.

1. If there are any validation error details at the top of the window regarding the test set, select the **Download** button. The file is saved to your Downloads folder. You can fix the validation errors in the test set CSV file based on the error messages: find each error identified in the validation step, fix or remove the line, and upload the file to retry the validation step. 

1. If the test set downloads successfully, a green banner message appears.

**To download a test set from the list of test sets:**

1. From the list of test sets, select the radio button next to the test set item you want to download.

1. From the Action menu at the top right, choose **Download**.

1. A green banner message indicates that the test set was downloaded successfully. The file is saved to your Downloads folder.

# Test set columns supported by Test Workbench
<a name="file-input-test-sets"></a>

Below is the complete list of test set columns supported by Test Workbench, with instructions on how to use them with Amazon Lex V2.


| Column Name | Test set type | Value Type | Multiple Columns Support | Description | 
| --- | --- | --- | --- | --- | 
|  Line Number  |  Text and Audio  | Number | No | This is a user column that is ignored by Amazon Lex V2. It is intended to help a test set author sort and filter the test set rows. "Line #" can be used as an alternative column name.  | 
|  Conversation Number  |  Text and Audio  | Number | No | This column allows you to group the rows of a conversation together. "Conversation #" can be used as an alternative column name.  | 
|  Source  |  Text and Audio  | Enum ("User", "Agent") | No | The value in this column indicates whether the row is for a user or an agent. "Conversation Participant" can be used as an alternative column name.  | 
|  Input  |  Text  | String | No | This column is used to add the transcript for text test set. Text input should be used in User rows. The Agent prompt should be used in Agent rows. | 
|  Expected Transcription  |  Audio  | String | No | This column is used to add the transcript for the audio test set. Expected transcription of the audio file should be used in User rows with audio input. DTMF input can be used in User rows with DTMF input. The Agent prompt should be used in Agent rows. | 
|  S3 Audio Location  |  Audio  | String | No | This column is used to add the audio file location and is applicable only to audio test sets. The S3 path should be used in the User rows with the audio input. This field should be left empty in User rows with DTMF input. This field should be left empty in Agent rows. | 
|  Input Context Tag  |  Text and Audio  | String | Yes | This column is used to provide the name of an input context, which is included in the input to Lex when executing the row in the test set. This refers to the input context in [Setting intent context for your Lex V2 bot](https://docs.aws.amazon.com/lexv2/latest/dg/context-mgmt-active-context.html). Note that Test Workbench only supports the name of a context; it does not support the parameters in a context. Multiple columns, named 'Input Context Tag 1', 'Input Context Tag 2', and so on, may be used. | 
|  Request Attribute  |  Text and Audio  | String | Yes | This column is used to provide a request attribute, which is included in the input to Lex when executing the row in the test set. The value in a column should be provided in the format `<request-attribute-name> = <request-attribute-value>`. Spaces can be added around '=' for readability. For example: request-attribute-foo = this is a dummy response; request-attribute-foo = 'this is a "dummy response"'; request-attribute-foo = "this is a 'dummy response'". Multiple columns, named 'Request Attribute 1', 'Request Attribute 2', and so on, may be used. | 
|  Session Attribute  |  Text and Audio  | String | Yes | This column is used to provide a session attribute which will be used in input to Lex while executing the row in the test set. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lexv2/latest/dg/file-input-test-sets.html)  | 
|  RunTime Hint  |  Text and Audio  | String | Yes | This column is used to provide a Runtime Hint for a slot within an intent which will be used in input to Lex while executing the row in the test set. Below are examples: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lexv2/latest/dg/file-input-test-sets.html)  | 
|  Barge In  |  Audio  | Boolean | No | This column is used to specify whether Test Workbench should barge in when sending the audio file to Lex Runtime for the row in the test set. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lexv2/latest/dg/file-input-test-sets.html)  | 
|  Expected Output Intent  |  Text and Audio  | String | No | This column is used to specify the name of the intent expected in the output from Lex for the row in the test set. | 
|  Expected Output Slot  |  Text and Audio  | String | Yes | This column is used to provide a slot value expected in the output from Lex when executing the row in the test set. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lexv2/latest/dg/file-input-test-sets.html)  | 
|  Expected Output Context Tag  |  Text and Audio  | String | Yes | This column is used to specify the name of an output context expected in the output from Lex for the row in the test set. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lexv2/latest/dg/file-input-test-sets.html)  | 
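Several of the columns above (Request Attribute, Session Attribute, Expected Output Slot) share the `<key> = <value>` cell format. The sketch below is a hypothetical helper, `split_attribute` (not part of any AWS tool), that splits such a cell and strips one layer of wrapping quotes, per the formatting rules described for the Request Attribute column:

```python
def split_attribute(cell: str) -> tuple[str, str]:
    """Split '<key> = <value>' into (key, value).

    Spaces around '=' are tolerated; a value wrapped in single or double
    quotes has one layer of wrapping quotes removed.
    """
    key, sep, value = cell.partition("=")
    if not sep or not key.strip():
        raise ValueError(f"expected '<key> = <value>' in {cell!r}")
    key, value = key.strip(), value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        value = value[1:-1]  # strip one layer of wrapping quotes
    return key, value

print(split_attribute("request-attribute-foo = this is a dummy response"))
```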

# View test validation errors in test workbench
<a name="view-errors-test-sets"></a>

You can correct test sets that report validation errors. These validation errors are generated when a test set is not ready to be tested. The Test Workbench can show you which required columns in the test set input CSV file did not have a value in the expected format.

**To view test validation errors:**

1. From the list of test sets, select the name of the test set that reports a Status of **Validation Error** that you want to view. The names of the test sets are active links that take you to details regarding the test set.

1. The test set record displays validation error details at the top of the screen. Choose **View Details** to see the report on Validation Errors.

1. From the error report window, review the Line # and Error Type to see where the error occurs. For a lengthy list of errors, you can choose to **Download** the error report.

1. Compare the errors listed in your test set input CSV file to your original test file to correct any issues and upload the test set again.

The following table lists the input CSV validation error messages with scenarios.


| Scenario | Error message | Notes | 
| --- | --- | --- | 
|  Test Set File Size Exceeds  |  Test Set file size is larger than 200 MB. Provide smaller file and try your request again.  |  | 
| Test set exceeds max records | Input file had records more than supported maximum number of 200,000. |  | 
| Upload Empty Test set | Imported test set is empty. Provide non-empty test set and try your request again. |  | 
| Empty column header name | Column Headers Row: found empty column name in column number 5. |  | 
| Unrecognized column header name | Column Headers Row: could not recognize column name 'dummy' in column number 2. |  | 
| Duplicate column header name | Column Headers Row: found multiple columns 'S3 audio link' and 'S3 audio link' that are same or equivalent. Remove or rename one of those columns. |  | 
| Multi value column name exceeded the limit | Column Headers Row: count of columns for 'Expected Output Slot' exceeded maximum supported count: 6. Remove some columns for 'Expected Output Slot' and try again. | Maximum Number of columns supported for multi value column is 6.   | 
| Text or Audio related column header not present | Could not find columns for text or audio conversations. For text conversations, use \['Text input'\] columns. For audio conversations, use \['S3 audio link', 'Expected transcription'\] columns. | Audio mandatory columns: \['S3 audio link', 'Expected transcription'\]. Text mandatory columns: \['Text input'\]. | 
| Both Text and Audio related column header exist  | Found columns for both text and audio conversations. You can either use \['Text input'\] columns for text conversations, or \['S3 audio link', 'Expected transcription'\] columns for audio conversations. | Audio mandatory columns: \['S3 audio link', 'Expected transcription'\]. Text mandatory columns: \['Text input'\]. | 
| Mandatory column is missing | Could not find mandatory columns \["Expected Output Intent"\]. | Mandatory columns: \["Line #", "Source", "Expected Output Intent"\] | 
| Found a data in column with no header | Found data in column number 8 for row number 6, but corresponding column did not have a column header. |  | 
| Data not found for mandatory columns | Row=12: no values found for mandatory columns: \["Source", "Expected Output Intent"\] |  | 
| Duplicate conversation id found | Conversation number '19' was seen for a previous conversation at row number 39. Make sure that the same conversation number has not been provided for two conversations; you can do this by ensuring that all rows for a conversation number are grouped together. |  | 
| Invalid conversation id provided | Found invalid value 'test#conversation' in 'Conversation #' column. Value for this column must be either numeric or N/A (i.e. Not Applicable) for a user row. |  | 
| Non numeric value provided for line number | Found non-numeric value 'test#line' in 'Line #' column. Its value must be numeric. |  | 
| Conversation id not found in agent row | No value found for 'Conversation #' column. It must be provided for an agent row. |  | 
| Non numeric conversation id found in agent row | Found non-numeric value 'test#conversation' in 'Conversation #' column. Its value must be numeric for an agent row. |  | 
| Invalid S3 location | Invalid value 'bucket/folder' was provided. Valid format is S3://<bucketName>/<keyName>. |  | 
| Invalid S3 bucket name | Invalid s3 bucket name 'test\$1bucket' was provided. Check the bucket name. |  | 
| S3 audio location is folder | Provided audio location 'S3://bucket/folder' is invalid. It points to an S3 folder. |  | 
|  Invalid intent name | Invalid characters were present in intent 'intent@name'. Check the intent name. | Regex check: `^([0-9a-zA-Z][_-]?)+$` | 
| Invalid slot name | Invalid characters were present in slot 'Slot@Name'. Check the slot name. | Regex: `^([0-9a-zA-Z][_.-]?)+$`. It should not start or end with dot (.) | 
| Slot value provided for parent slot | Slot values were provided for subslot 'Address.City' as well as parent slot 'Address'. Values should be only provided for the subslot. | Parent slot in CST should not have slot value | 
| Invalid character in context name | Invalid characters were present in context name 'context@1'. Check the context name. | Regex: `^([A-Za-z]_?)+$` | 
| Invalid slot spelling style | Invalid value 'test' was provided. Make sure that they are all upper case. Valid values are \["Default", "SpellByLetter", "SpellByWord"\]. | Supported values: \["Default", "SpellByLetter", "SpellByWord"\] | 
| Participant or source has to be either agent or User | Invalid value 'bot' was provided. Valid values are ["Agent", "User"]. | Supported Enums: "Agent", "User" | 
| Line Number should not be decimal | Invalid value '10.1' was provided. It should be a valid number without any fractions. |  | 
| Conversation Number should not be decimal | Invalid value '10.1' was provided. It should be a valid number without any fractions. |  | 
| Line number should be with in range | Invalid value '92233720368547758071' was provided. It should be greater than or equal to 1 and less than or equal to 9223372036854775807. |  | 
| Barge-in column only accepts boolean value | Invalid value 'test' was provided. It should be a valid boolean value such as 'true' or 'false'. Alternatively 'yes' and 'no' can be used. | Possible Values:"True", "true", "T", "Yes", "yEs", "Y", "1", "1.0", "False", "false", "F", "No", "no", "N", "0", "0.0" | 
| Expected slot, Session Attribute, Request Attribute should be separated by equal to (=) | Value 'slotName:slotValue' does not have '='. Such a value should be provided as a key-value pair in format '<key>=<value>'. | For example: slotName = slotValue | 
| Expected slot, Session Attribute, Request Attribute should have a key-value pair | '=slotValue' does not have a key before '='. Such a value should be provided as a key-value pair in format '<key>=<value>'. | For example: slotName = slotValue | 
| Invalid quote at end  | Found incorrect quoting in `"Foo's item`. It starts with quote character `"` but does not end with same quote character. | For example: `"Foo's item", KFC` | 
| Invalid quote at middle | Found incorrect quoting in `"Foo's" Burger, etc.`. It contains quote character `"` inside its content. Values containing single quotes should be wrapped within double quotes and vice-versa. | Correct For example: `"Foo's item", KFC` | 
| Required quotes | `key = Foo's item` contains single-quotes or double-quotes but has not been wrapped inside quotes. Values containing single quotes should be wrapped inside double quotes and vice-versa. |  | 
| Duplicate key repeated across columns | Key `key1` was repeated in two columns: `Session Attribute 3` and `Session Attribute 1`. |  | 
| Invalid format in runtime hint | Invalid key `BookFlight.Car.` provided for Runtime Hints. For Runtime Hints, the key should be in the format <intentName>.<slotName>. | If the key does not match this format, the intent name and slot name cannot be extracted from it. Examples of incorrect formatting: "BookFlight", ".BookFlight.Car", "BookFlight.Car." | 
| Invalid intent name in runtime hint key | Found invalid intent name `intent@name` for Runtime Hints. Check the intent name. | Regex check: `^([0-9a-zA-Z][_-]?)+$` | 
| Invalid slot name in runtime hint key | Found invalid slot name in `Slot@Name` for Runtime Hints. Check the slot name. | Regex check: `^([0-9a-zA-Z][_-]?)+$`. It should not start or end with a dot (.). | 
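
Several of the key-value and quoting rules above can be checked locally before you upload a test set. The following is an illustrative Python sketch of those checks; the helper name and the exact messages are hypothetical, not the Test Workbench's actual validator.

```python
def validate_pair(raw):
    """Check a '<key>=<value>' cell against the rules above.

    Returns an error string, or None if the cell is valid.
    (Illustrative only; not the actual Test Workbench validator.)
    """
    if "=" not in raw:
        return f"Value '{raw}' does not have '='. Provide it as '<key>=<value>'."
    key, _, value = raw.partition("=")
    if not key.strip():
        return f"'{raw}' does not have a key before '='."
    value = value.strip()
    # A value that opens with a double quote must close with one.
    if value.startswith('"') and not value.endswith('"'):
        return f"Incorrect quoting in {value}: starts with '\"' but does not end with it."
    # Values containing single quotes must be wrapped in double quotes.
    if "'" in value and not (value.startswith('"') and value.endswith('"')):
        return f"{raw} contains quotes but is not wrapped inside quotes."
    return None
```

Running a cell such as `slotName:slotValue` through this helper reports the missing `=`, while `key="Foo's item"` passes because the single quote is wrapped in double quotes.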

# Delete a test set in Test Workbench
<a name="delete-test-sets"></a>

You can easily delete a test set from your list of test sets.

**To delete a test set:**

1. Choose **Test sets** from the left side menu to see the list of test sets.

1. From the list of test sets, select the test set you want to delete.

1. Go to the **Actions** drop down menu in the top right, and choose **Delete**.

1. A message confirms that the test set is deleted.
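
You can also delete a test set programmatically. The following is a minimal sketch using the AWS SDK for Python (boto3), assuming the `lexv2-models` client's `DeleteTestSet` operation; the test set ID shown is a placeholder.

```python
def delete_test_set(client, test_set_id):
    """Delete a Test Workbench test set.

    `client` is a boto3 "lexv2-models" client; DeleteTestSet returns
    no body on success, so the ID is returned for convenience.
    """
    client.delete_test_set(testSetId=test_set_id)
    return test_set_id

# Usage (requires AWS credentials and the boto3 package):
#   import boto3
#   lex = boto3.client("lexv2-models")
#   delete_test_set(lex, "<your test set ID>")  # placeholder ID
```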

# Edit test set details
<a name="edit-details-test-sets"></a>

You can edit a test set's name and details from the list of test sets. You can add or update the name or details later; however, you must update your test set before running the test with your bot or transcription data.

**To edit test set details:**

1. Choose **Test sets** from the left side menu to see the list of test sets.

1. From the list of test sets, select the check box for the test set you want to edit.

1. Go to the **Actions** drop down menu in the top right, and choose **Edit Details**.

1. A message confirms that the test set is successfully edited.
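
The same edit is available through the API. The following is a hedged sketch with boto3, assuming the `lexv2-models` client's `UpdateTestSet` operation; verify the parameter names against the API reference before relying on them.

```python
def edit_test_set_details(client, test_set_id, name, description=None):
    """Rename a test set and optionally update its description.

    `client` is a boto3 "lexv2-models" client (UpdateTestSet operation).
    """
    kwargs = {"testSetId": test_set_id, "testSetName": name}
    if description is not None:
        kwargs["description"] = description
    return client.update_test_set(**kwargs)

# Usage (requires AWS credentials):
#   import boto3
#   lex = boto3.client("lexv2-models")
#   edit_test_set_details(lex, "<test set ID>", "WeeklyRegressionSet",
#                         description="Audio transcripts, en_US")
```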

# Update test set
<a name="update-test-sets"></a>

You can update, modify, or delete items in the test set to optimize your baseline results, or to correct other errors that may have occurred in the test set.

You can download a test set and fix the validation errors before uploading the corrected test set. See [View test validation errors](https://docs.aws.amazon.com/lexv2/latest/dg/view-errors-test-sets.html).

**To update a test set:**

1. From the test set record, choose the **Update Test Set** button in the top right.

1. Choose a file to upload from your Amazon S3 account or upload a CSV test file from your computer. NOTE: Updating a test set will overwrite the existing data.

1. Select the **Update** button.

1. A message confirms that the test set is successfully updated and the **Status** displays **Ready for Testing**. NOTE: This operation can take a few minutes, depending on the complexity and size of the test set.

# Execute a test
<a name="execute-test-set"></a>

To execute a test set, choose the appropriate bot to run against the test set. You can choose a bot in your AWS account from the drop down menu under Test Set. This operation tests the selected bot against your validated test data and reports performance metrics against the baseline data from the test set.

![\[The screen to execute a test in the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-extest.png)


**To execute a test in the Test Workbench**

1. In the test set record page, choose **Execute Test**.

1. Select the test set you want to use in the test.

1. Select the name of the bot to use in the test from the **Bot** drop down menu.

1. Choose a bot alias, if applicable, from the **Bot alias** drop down menu.

1. From the **Languages** selection, choose a version of English.

1. Select **Text** or **Audio** for the Modality type.

1. Choose your Amazon S3 location. (audio only)

1. Select your **Endpoint selection** for your bot. (streaming only)

1. Select the **Validate coverage** button to confirm that your test is ready to run. If any errors are present in the validation step, review the previous parameters and make corrections.

1. Select **Execute** to run the test.

1. A message confirms that the test is successfully executed.
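
The console steps above correspond to the `StartTestExecution` API. The following is a hedged boto3 sketch; the parameter names and enum values (`apiMode`, `testExecutionModality`) reflect the lexv2-models API as best understood and should be checked against the API reference.

```python
def execute_test(client, test_set_id, bot_id, bot_alias_id, locale_id,
                 modality="Text", api_mode="NonStreaming"):
    """Start a test run of a bot alias against a validated test set.

    `client` is a boto3 "lexv2-models" client.
    """
    return client.start_test_execution(
        testSetId=test_set_id,
        target={"botAliasTarget": {
            "botId": bot_id,
            "botAliasId": bot_alias_id,
            "localeId": locale_id,        # e.g. "en_US"
        }},
        apiMode=api_mode,                 # "Streaming" or "NonStreaming"
        testExecutionModality=modality,   # "Text" or "Audio"
    )

# Usage (requires AWS credentials):
#   import boto3
#   lex = boto3.client("lexv2-models")
#   execute_test(lex, "<test set ID>", "<bot ID>", "<alias ID>", "en_US")
```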

# Test set coverage in Test Workbench
<a name="validation-test-set"></a>

Limited coverage of intents and slots between the test set and the bot can result in unexpected performance measures. We recommend that you review the test set coverage before running the test.

![\[Review intents in the validation step with the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-discr1.png)


**To review validation coverage**

1. In the test set records, choose the **Validate coverage** button.

1. The message indicates it is validating coverage between the test set and the bot selected.

1. Once the operation is completed, the message indicates **Coverage validation successful**.

1. Choose the **View Details** button at the bottom of the window.

1. View the test set discrepancies for intents and slots by choosing the tab for each. You can download this data into a CSV format by choosing the **Download** button.

1. Review the validation results for your test set data, bot intents, and slots. Identify issues and change your test set or bot to improve results. After you make changes to the CSV file, upload the edited test set and run the test again. NOTE: Coverage validation runs against the test set, not against the bot. Intents that are in the bot but not present in the test set are not covered.
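
Coverage validation is surfaced in the API as a test set discrepancy report. The following sketch assumes the `CreateTestSetDiscrepancyReport` operation and its response field name as best understood; verify both against the API reference.

```python
def validate_coverage(client, test_set_id, bot_id, bot_alias_id, locale_id):
    """Start a coverage (discrepancy) report for a test set against a bot alias.

    `client` is a boto3 "lexv2-models" client.
    """
    resp = client.create_test_set_discrepancy_report(
        testSetId=test_set_id,
        target={"botAliasTarget": {
            "botId": bot_id,
            "botAliasId": bot_alias_id,
            "localeId": locale_id,
        }},
    )
    # Poll DescribeTestSetDiscrepancyReport with this ID until it completes.
    return resp["testSetDiscrepancyReportId"]
```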

# View test results
<a name="test-results-test-set"></a>

Interpret test results from the Test Workbench to determine where the conversation between your bot and the customer might be failing, or where the customer must make multiple attempts to fulfill the intent.

By locating these issues in your test results, you can optimize your bot’s performance by improving intent performance with different training data or utterances that are more consistent with real-time bot transcription values.

You can get a detailed view of intents and slots that had performance discrepancies. Once you have identified intents or slots that have discrepancies, you can further drill down and review the utterances and conversation flow.

![\[List of completed tests using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-testresults.png)


**To review test results:**

1. Choose **Test results** under **Test workbench** in the left side menu. NOTE: Successful test results indicate a **Status** of Complete.

1. Select the **Test Result ID** for the test results you want to review.
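
Programmatically, the result tables map to the `ListTestExecutionResultItems` operation with a result-type filter. The following is a sketch; the filter values listed are assumptions based on the result tabs and should be verified against the API reference.

```python
def get_results(client, test_execution_id, result_type="OverallTestResults"):
    """Fetch one page of result items for a completed test execution.

    Other assumed filter values: "ConversationLevelTestResults",
    "IntentClassificationTestResults", "SlotResolutionTestResults",
    "UtteranceLevelTestResults".
    """
    resp = client.list_test_execution_result_items(
        testExecutionId=test_execution_id,
        resultFilterBy={"resultTypeFilter": result_type},
    )
    return resp["testExecutionResults"]
```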

# Test results details in Test Workbench
<a name="test-results-details-test-set"></a>

The test results show the test set details, intents used, and slots used. They also provide the overall test set input breakdown, including the overall results, conversation results, and intent and slot results.

Test results comprise all testing related information such as:
+ Test details metadata
+ Overall results
+ Conversation results
+ Intent and slot results
+ Detailed results

**Overall results tab:**

![\[The test set input breakdown chart in test results using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-results1.png)


**Test set input breakdown** – This chart shows the breakdown of number of conversations and single input utterances in the test set. 

![\[The single input breakdown chart in test results using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-results2.png)


**Single input breakdown** – Displays two charts that include end-to-end conversations and speech transcriptions. The number of passed and failed inputs is indicated on each chart. NOTE: The speech transcription chart is visible only for audio test sets. 

![\[The conversation breakdown chart in test results using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-results3.png)


**Conversation breakdown** – Displays two charts that include end-to-end conversations and speech transcriptions. The number of passed and failed conversations is indicated on each chart. NOTE: The speech transcription chart is visible only for audio test sets.

**Conversation results tab:**

![\[The conversation pass rates chart in test results using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-results4.png)


**Conversation pass rates** – Use the conversation pass rates table to see which intents and slots are used in each conversation in the test set. You can see where a conversation failed by reviewing which intent or slot failed, along with the pass percentage of each intent and slot. 

![\[The conversation intent failure metrics chart in test results using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-results5.png)


**Conversation intent failure metrics** – This metric shows the five worst performing intents in the test set. This panel shows a chart of the percentage or number of intents that succeeded or failed, based on the bot’s conversation logs or transcription. A successful intent does not mean that the entire conversation was successful. These metrics apply only to the value of the intents, regardless of which intent came before or after. 

![\[The Conversation slot failure metrics chart in test results using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-results6.png)


**Conversation slot failure metrics** – This metric shows the five worst performing slots in the test set. It indicates the success rate for each slot in the intent. The bar graph shows both speech transcription and end-to-end conversation results for each slot in the intent. 

**Intent and slot results tab:**

![\[The Intent recognition metrics chart in test results using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-results7.png)


**Intent recognition metrics** – Shows a table of how many intents were recognized successfully. Displays the pass rate of speech transcription and end-to-end conversations. 

![\[The Slot resolution metrics chart in test results using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-results8.png)


**Slot resolution metrics** – Shows the intents and slots separately, and the success and failure rate of each slot for each intent used in the conversation or single input. Displays the pass rate of speech transcription and end-to-end conversations. 

**Detailed results tab:**

![\[The detailed results in test results using the Test Workbench.\]](http://docs.aws.amazon.com/lexv2/latest/dg/images/testworkbench/test-workbench-results9.png)


**Detailed results** – Shows a detailed table of the conversation log with User and Agent utterances, and the expected output and expected transcription for each slot. You can download this report by choosing the **Download** button. 

The following table lists the result failure error messages with scenarios.


| Scenario | Error message | Action | 
| --- | --- | --- | 
| Intent Mismatch | Expected BookFlight intent but it was BookHotel intent. | Skip other turns in the conversation | 
| Slot Elicitation mismatch | Expected departureDate slot to be elicited but it was cabinType. | Skip other turns in the conversation | 
| Slot value mismatch | Mismatch between expected and actual slot value. | Continue with other turns in the conversation | 
| Back-to-back agent prompt is missing | Expected bot to return an agent prompt in this turn but it was not received.  | Skip other turns in the conversation | 
| Transcription Mismatch | Expected transcription didn't match actual transcription. | Continue with other turns in the conversation | 
| Optional slot not elicited | Expected to elicit cabinType slot in next turn, however current intent fulfilled before that. | Skip other turns in the conversation | 
| Slot not recognized | Expected departureDate slot was not recognized in this turn. | Skip other turns in the conversation | 
| Extra back-to-back agent prompt | Expected a user turn but it was an agent prompt. | Skip other turns in the conversation | 
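
The skip/continue behavior in the table reduces to a simple rule: only slot value and transcription mismatches let the remaining turns run. The following sketch summarizes that policy; it is derived from the documented scenarios, not from the service itself.

```python
# Failure scenarios after which the Test Workbench continues with the
# remaining turns; every other failure skips the rest of the conversation.
CONTINUE_SCENARIOS = {"Slot value mismatch", "Transcription Mismatch"}

def action_for(scenario):
    """Map a failure scenario to the documented action for later turns."""
    if scenario in CONTINUE_SCENARIOS:
        return "Continue with other turns in the conversation"
    return "Skip other turns in the conversation"
```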