Evaluate knowledge base retrieval with response generation

A retrieve-and-generate evaluation of a knowledge base covers two things: retrieving relevant text chunks and generating useful, appropriate responses from them. With this evaluation type, you assess how well the knowledge base generates responses based on the information it retrieves.

The following table defines the metrics that you use for this evaluation type.

Evaluation type	Metrics	Metric definition
Retrieve information and generate responses	Correctness	Measures how accurate the responses are in answering questions.
	Completeness	Measures how well the responses answer and resolve all aspects of the questions.
	Helpfulness	Measures holistically how useful responses are in answering questions.
	Logical coherence	Measures whether the responses are free from logical gaps, inconsistencies, or contradictions.
	Faithfulness	Measures how well responses avoid hallucination with respect to the retrieved texts.
	Harmfulness	Measures harmful content in the responses, including hate, insults, violence, or sexual content.
	Stereotyping	Measures generalized statements about individuals or groups of people in responses.
	Refusal	Measures how evasive the responses are in answering questions.

To learn more about each metric for knowledge base evaluations, see Review knowledge base evaluation job reports and metrics.
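
As an illustration of how the metrics in the table might be selected when you set up an evaluation, the following Python sketch validates a list of metric names and builds a small configuration fragment. The `Builtin.*` identifier strings, the `metricNames` key, and the `build_metric_config` helper are assumptions made for this example and are not taken from this page; check the API reference for the exact names and request shape before using them.

```python
# Minimal sketch: select metrics for a retrieve-and-generate evaluation.
# ASSUMPTION: the "Builtin.*" identifiers and the "metricNames" key are
# illustrative and must be verified against the service API reference.

RETRIEVE_AND_GENERATE_METRICS = {
    "Correctness": "Builtin.Correctness",
    "Completeness": "Builtin.Completeness",
    "Helpfulness": "Builtin.Helpfulness",
    "Logical coherence": "Builtin.LogicalCoherence",
    "Faithfulness": "Builtin.Faithfulness",
    "Harmfulness": "Builtin.Harmfulness",
    "Stereotyping": "Builtin.Stereotyping",
    "Refusal": "Builtin.Refusal",
}


def build_metric_config(selected):
    """Return a config fragment listing identifiers for the selected metrics.

    `selected` is a list of metric names as they appear in the table.
    Raises ValueError if a name is not one of the eight table metrics.
    """
    unknown = [name for name in selected if name not in RETRIEVE_AND_GENERATE_METRICS]
    if unknown:
        raise ValueError(f"Unknown metric name(s): {', '.join(unknown)}")
    # Hypothetical fragment shape; the real request may nest this differently.
    return {"metricNames": [RETRIEVE_AND_GENERATE_METRICS[name] for name in selected]}


if __name__ == "__main__":
    print(build_metric_config(["Correctness", "Faithfulness"]))
```

Validating names up front means a typo such as `Faithfullness` fails immediately in your own code rather than after a job has been submitted.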
