Evaluate knowledge base retrieval with response generation

A retrieve-and-generate knowledge base evaluation covers both steps of the workflow: retrieving relevant text chunks and generating useful, appropriate responses from them. You can evaluate a knowledge base's ability to generate useful responses based on the information it retrieves.

You use the metrics defined in the following table to evaluate how well the knowledge base generates responses based on the information it retrieves.

Evaluation type: Retrieve information and generate responses

Correctness: Measures how accurate the responses are in answering questions.
Completeness: Measures how well the responses answer and resolve all aspects of the questions.
Helpfulness: Measures holistically how useful the responses are in answering questions.
Logical coherence: Measures whether the responses are free from logical gaps, inconsistencies, or contradictions.
Faithfulness: Measures how well the responses avoid hallucination with respect to the retrieved texts.
Harmfulness: Measures harmful content in the responses, including hate, insults, violence, or sexual content.
Stereotyping: Measures generalized statements about individuals or groups of people in the responses.
Refusal: Measures how evasive the responses are in answering questions.
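The following is a minimal sketch of how you might request these metrics when starting a retrieve-and-generate evaluation job with the AWS SDK for Python (Boto3) and the CreateEvaluationJob operation. The job name, role ARN, S3 URIs, knowledge base ID, and model identifiers are placeholders, and the exact request field names and Builtin.* metric identifiers are assumptions to verify against the current Amazon Bedrock API reference before use.

```python
import boto3

bedrock = boto3.client("bedrock")

# Sketch only: placeholder ARNs, bucket names, and IDs; field names and
# metric identifiers are assumptions based on the CreateEvaluationJob API.
response = bedrock.create_evaluation_job(
    jobName="kb-retrieve-and-generate-eval",            # placeholder
    roleArn="arn:aws:iam::111122223333:role/EvalRole",   # placeholder
    applicationType="RagEvaluation",
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "QuestionAndAnswer",
                    "dataset": {
                        "name": "RagDataset",
                        "datasetLocation": {
                            "s3Uri": "s3://amzn-s3-demo-bucket/eval/prompts.jsonl"  # placeholder
                        },
                    },
                    # Metrics from the table above, expressed as Builtin.* identifiers.
                    "metricNames": [
                        "Builtin.Correctness",
                        "Builtin.Completeness",
                        "Builtin.Helpfulness",
                        "Builtin.LogicalCoherence",
                        "Builtin.Faithfulness",
                        "Builtin.Harmfulness",
                        "Builtin.Stereotyping",
                        "Builtin.Refusal",
                    ],
                }
            ],
            # Judge model used to score the generated responses (example identifier).
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"}
                ]
            },
        }
    },
    inferenceConfig={
        "ragConfigs": [
            {
                "knowledgeBaseConfig": {
                    "retrieveAndGenerateConfig": {
                        "type": "KNOWLEDGE_BASE",
                        "knowledgeBaseConfiguration": {
                            "knowledgeBaseId": "KB12345678",  # placeholder
                            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0",
                        },
                    }
                }
            }
        ]
    },
    outputDataConfig={"s3Uri": "s3://amzn-s3-demo-bucket/eval/results/"},  # placeholder
)
print(response["jobArn"])
```

After the job completes, the per-metric scores are written to the output S3 location and can also be reviewed in the console report described in the link below.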

To learn more about each metric for knowledge base evaluations, see Review knowledge base evaluation job reports and metrics.