Creating a knowledge base evaluation job in Amazon Bedrock
You can create a knowledge base evaluation job that computes metrics for your knowledge base.
Certain access permissions are required to create knowledge base evaluation jobs. For more information, see Required permissions to create an Amazon Bedrock Knowledge Bases evaluation job.
Note
Knowledge base evaluation jobs are in preview mode and are subject to change.
You can evaluate only the retrieval capability of your knowledge base, or retrieval together with response generation. Different metrics apply to each type of evaluation. For more information, see Review metrics for knowledge base evaluations that use LLMs (console).
You must choose a supported evaluator model to compute the metrics for your evaluation. If you want to evaluate retrieval with response generation, you must also choose a supported model for response generation. For more information, see Prerequisites for creating knowledge base evaluations in Amazon Bedrock.
You must provide a prompt dataset to use for the evaluation. For more information, see Use a prompt dataset for a knowledge base evaluation in Amazon Bedrock.
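The following minimal sketch shows one way to assemble a JSON Lines prompt dataset and upload-ready file. It is illustrative only: the example question, file name, and record fields (conversationTurns, prompt, referenceResponses) are assumptions, so confirm the exact schema in Use a prompt dataset for a knowledge base evaluation in Amazon Bedrock before building your own file.

import json

# Illustrative only: the record shape below is an assumed example, not an
# authoritative spec. See "Use a prompt dataset for a knowledge base
# evaluation in Amazon Bedrock" for the exact format.
records = [
    {
        "conversationTurns": [
            {
                "prompt": {
                    "content": [{"text": "What is the return policy for unopened items?"}]
                },
                "referenceResponses": [
                    {"content": [{"text": "Unopened items can be returned within 30 days."}]}
                ]
            }
        ]
    }
]

# Write one JSON object per line (JSONL), then upload the file to the S3
# location that you reference in datasetLocation (for example,
# s3://docs/kbtestprompts.jsonl).
with open("kbtestprompts.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")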
The following examples show you how to create a knowledge base evaluation job using the AWS CLI and the SDK for Python (Boto3).
Knowledge base evaluation jobs that use LLMs
The following example shows you how to create a knowledge base evaluation job that uses Large Language Models (LLMs) for the evaluation.
AWS Command Line Interface
aws bedrock create-evaluation-job \
    --job-name "rag-evaluation-complete-stereotype-docs-app" \
    --job-description "Evaluates Completeness and Stereotyping of RAG for docs application" \
    --role-arn "arn:aws:iam::<account-id>:role/AmazonBedrock-KnowledgeBases" \
    --evaluation-context "RAG" \
    --evaluation-config file://knowledge-base-evaluation-config.json \
    --inference-config file://knowledge-base-evaluation-inference-config.json \
    --output-data-config '{"s3Uri":"s3://docs/kbevalresults/"}'

file://knowledge-base-evaluation-config.json
{
    "automated": {
        "datasetMetricConfigs": [{
            "taskType": "Generation",  //Required field for model evaluation, but ignored/not used for knowledge base evaluation
            "metricNames": ["Builtin.Completeness", "Builtin.Stereotyping"],
            "dataset": {
                "name": "RagTestPrompts",
                "datasetLocation": {
                    "s3Uri": "s3://docs/kbtestprompts.jsonl"
                }
            }
        }],
        "evaluatorModelConfig": {
            "bedrockEvaluatorModels": [{
                "modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"
            }]
        }
    }
}

file://knowledge-base-evaluation-inference-config.json
{
    "ragConfigs": [{
        "knowledgeBaseConfig": {
            "retrieveConfig": {
                "knowledgeBaseId": "<knowledge-base-id>",
                "knowledgeBaseRetrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 10,
                        "overrideSearchType": "HYBRID"
                    }
                }
            },
            "retrieveAndGenerateConfig": {
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {
                    "knowledgeBaseId": "<knowledge-base-id>",
                    "modelArn": "arn:aws:bedrock:<region>:<account-id>:inference-profile/anthropic.claude-v2:1",
                    "generationConfiguration": {
                        "promptTemplate": {
                            "textPromptTemplate": "\n\nHuman: I will provide you with a set of search results and a user's question. Your job is to answer the user's question using only information from the search results\n\nHere are the search results: $search_results$\n\nHere is the user's question: $query$\n\nAssistant:"
                        }
                    }
                }
            }
        }
    }]
}
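After you submit the job, you can check its progress with the get-evaluation-job operation. The following command is a minimal sketch; the job ARN shown is a placeholder for the jobArn value returned by create-evaluation-job.

aws bedrock get-evaluation-job \
    --job-identifier "arn:aws:bedrock:<region>:<account-id>:evaluation-job/<job-id>"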
SDK for Python (Boto3)
Note
During preview, your AWS account team will provide you with a parameters file to download and use.
The following Python example demonstrates how to make a retrieve-only create_evaluation_job request with Boto3.
import boto3

client = boto3.client('bedrock')

job_request = client.create_evaluation_job(
    jobName="fkki-boto3-test1",
    jobDescription="two different task types",
    roleArn="arn:aws:iam::111122223333:role/service-role/Amazon-Bedrock-IAM-Role",
    evaluationContext="RAG",
    inferenceConfig={
        "ragConfigs": [
            {
                "knowledgeBaseConfig": {
                    "retrieveConfig": {
                        "knowledgeBaseId": "your-knowledge-base-id",
                        "knowledgeBaseRetrievalConfiguration": {
                            "vectorSearchConfiguration": {
                                "numberOfResults": 10,
                                "overrideSearchType": "HYBRID"
                            }
                        }
                    }
                }
            }
        ]
    },
    outputDataConfig={
        "s3Uri": "s3://amzn-s3-demo-bucket-model-evaluations/outputs/"
    },
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Summarization",
                    "dataset": {
                        "name": "RagDataset",
                        "datasetLocation": {
                            "s3Uri": "s3://amzn-s3-demo-bucket/input_data/data_3_rng.jsonl"
                        }
                    },
                    "metricNames": [
                        "Builtin.ContextCoverage"
                    ]
                }
            ],
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [{
                    "modelIdentifier": "meta.llama3-1-70b-instruct-v1:0"
                }]
            }
        }
    }
)

print(job_request)
The following Python example demonstrates how to make a retrieve-and-generate create_evaluation_job request with Boto3.
import boto3

client = boto3.client('bedrock')

job_request = client.create_evaluation_job(
    jobName="api-auto-job-titan",
    jobDescription="two different task types",
    roleArn="arn:aws:iam::111122223333:role/role-name",
    evaluationContext="RAG",
    inferenceConfig={
        "ragConfigs": [
            {
                "knowledgeBaseConfig": {
                    "retrieveAndGenerateConfig": {
                        "type": "KNOWLEDGE_BASE",
                        "knowledgeBaseConfiguration": {
                            "knowledgeBaseId": "73SPNQM4CI",
                            "modelArn": "anthropic.claude-3-sonnet-20240229-v1:0",
                            "generationConfiguration": {
                                "promptTemplate": {
                                    "textPromptTemplate": "$search_results$ hello world template"
                                }
                            },
                            "retrievalConfiguration": {
                                "vectorSearchConfiguration": {
                                    "numberOfResults": 10,
                                    "overrideSearchType": "HYBRID"
                                }
                            }
                        }
                    }
                }
            }
        ]
    },
    outputDataConfig={
        "s3Uri": "s3://amzn-s3-demo-bucket-model-evaluations/outputs/"
    },
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Summarization",
                    "dataset": {
                        "name": "RagDataset",
                        "datasetLocation": {
                            "s3Uri": "s3://amzn-s3-demo-bucket-input-data/data_3_rng.jsonl"
                        }
                    },
                    "metricNames": [
                        "Builtin.Faithfulness"
                    ]
                }
            ],
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [{
                    "modelIdentifier": "meta.llama3-1-70b-instruct-v1:0"
                }]
            }
        }
    }
)

print(job_request)