You can use computed metrics to evaluate how effectively a Retrieval Augmented Generation (RAG) system retrieves relevant information from your data sources, and how well the generated responses answer questions. The results of a RAG evaluation let you compare different Amazon Bedrock Knowledge Bases and other RAG sources, so you can choose the best Knowledge Base or RAG system for your application.
You can set up two different types of RAG evaluation jobs.
- Retrieve only – In a retrieve-only RAG evaluation job, the report is based on the data retrieved from your RAG source. You can either evaluate an Amazon Bedrock Knowledge Base, or you can bring your own inference response data from an external RAG source.
- Retrieve and generate – In a retrieve-and-generate RAG evaluation job, the report is based on the data retrieved from your knowledge base and the responses generated from that retrieved data, as judged by the evaluator model. You can either use an Amazon Bedrock Knowledge Base and evaluator model, or you can bring your own inference response data from an external RAG source.
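The difference between the two job types shows up in how the RAG source is configured when a job is created. The following sketch illustrates the request shapes as they might be passed to the boto3 `bedrock` client's `create_evaluation_job` operation; the knowledge base ID, role ARN, model ARN, and S3 bucket are hypothetical placeholders, and field names should be verified against the current Bedrock API reference.

```python
import json


def rag_eval_request(job_name: str, retrieve_and_generate: bool = False) -> dict:
    """Build an illustrative CreateEvaluationJob request body (assumed field names)."""
    if retrieve_and_generate:
        # Retrieve and generate: the report covers the retrieved data plus
        # the responses generated from it.
        kb_config = {
            "retrieveAndGenerateConfig": {
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {
                    "knowledgeBaseId": "KBEXAMPLE123",  # placeholder
                    "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/example-model",  # placeholder
                },
            }
        }
    else:
        # Retrieve only: the report covers the retrieved data alone.
        kb_config = {"retrieveConfig": {"knowledgeBaseId": "KBEXAMPLE123"}}  # placeholder

    return {
        "jobName": job_name,
        "roleArn": "arn:aws:iam::111122223333:role/BedrockEvalRole",  # placeholder
        "applicationType": "RagEvaluation",
        "inferenceConfig": {"ragConfigs": [{"knowledgeBaseConfig": kb_config}]},
        "outputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/eval-results/"},  # placeholder
        # evaluationConfig (metrics, evaluator model, prompt dataset) omitted for brevity
    }


# A real job would then be started with something like:
#   boto3.client("bedrock").create_evaluation_job(**rag_eval_request("my-rag-eval"))
print(json.dumps(rag_eval_request("my-rag-eval", retrieve_and_generate=True), indent=2))
```

Only the `knowledgeBaseConfig` payload differs between the two job types; everything else about the request, including the output location and IAM role, stays the same.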
Use the following topics to learn how to create and manage RAG evaluation jobs, and which performance metrics you can use.