Evaluate knowledge base retrieval with response generation
A knowledge base evaluation of retrieval with response generation involves two steps: retrieving relevant text chunks and generating useful, appropriate responses from them. You can evaluate how well a knowledge base generates responses based on the information it retrieves.
The following table defines the metrics for this evaluation type.
Evaluation type | Metric | Metric definition |
---|---|---|
Retrieve information and generate responses | Correctness | Measures how accurate the responses are in answering questions. |
Retrieve information and generate responses | Completeness | Measures how well the responses answer and resolve all aspects of the questions. |
Retrieve information and generate responses | Helpfulness | Measures holistically how useful responses are in answering questions. |
Retrieve information and generate responses | Logical coherence | Measures whether the responses are free from logical gaps, inconsistencies, or contradictions. |
Retrieve information and generate responses | Faithfulness | Measures how well responses avoid hallucination with respect to the retrieved texts. |
Retrieve information and generate responses | Harmfulness | Measures harmful content in the responses, including hate, insults, violence, or sexual content. |
Retrieve information and generate responses | Stereotyping | Measures generalized statements about individuals or groups of people in responses. |
Retrieve information and generate responses | Refusal | Measures how evasive the responses are in answering questions. |
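
As a rough illustration of how the metrics in the table above might be handled once you have scores, the following Python sketch averages per-prompt scores for each metric. It is only a sketch under stated assumptions: the snake_case metric keys, the `summarize_scores` helper, the record layout, and the numeric scores are all hypothetical and do not reflect the format of any actual evaluation report.

```python
from statistics import mean

# Metric names taken from the table above. The snake_case keys are an
# illustrative convention, not identifiers used by any evaluation service.
RETRIEVE_AND_GENERATE_METRICS = (
    "correctness",
    "completeness",
    "helpfulness",
    "logical_coherence",
    "faithfulness",
    "harmfulness",
    "stereotyping",
    "refusal",
)


def summarize_scores(records):
    """Average each metric over all records that report it.

    `records` is a hypothetical list of dicts, one per prompt, mapping a
    metric name to a numeric score. Unknown metric names raise ValueError.
    """
    collected = {name: [] for name in RETRIEVE_AND_GENERATE_METRICS}
    for record in records:
        for name, score in record.items():
            if name not in collected:
                raise ValueError(f"unknown metric: {name}")
            collected[name].append(score)
    # Omit metrics with no scores rather than reporting a misleading 0.
    return {name: mean(scores) for name, scores in collected.items() if scores}


# Made-up scores for two prompts, for illustration only.
example_records = [
    {"correctness": 1.0, "faithfulness": 0.5, "refusal": 0.0},
    {"correctness": 0.5, "faithfulness": 1.0, "refusal": 0.0},
]
summary = summarize_scores(example_records)
print(summary)
```

When reading such a summary, keep the metric definitions in mind: harmfulness, stereotyping, and refusal measure unwanted content or behavior, so they should be interpreted separately from quality metrics such as correctness or completeness.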
To learn more about each metric for knowledge base evaluations, see Review knowledge base evaluation job reports and metrics.