Creating a knowledge base evaluation job in Amazon Bedrock

You can create a knowledge base evaluation job that computes evaluation metrics.

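Creating a knowledge base evaluation job requires certain access permissions. For more information, see Required permissions to create an Amazon Bedrock knowledge base evaluation job.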

Note

Knowledge base evaluation jobs are in preview and are subject to change.

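You can evaluate retrieval only, or you can evaluate retrieval together with response generation. Some metrics apply to retrieval only, and others apply to retrieval with response generation. For more information, see Review metrics for knowledge base evaluations that use LLMs (console).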
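Because each metric belongs to one of the two evaluation modes, it can help to check a metric list against the chosen mode before submitting a job. The following is a minimal sketch, not an official mapping: it covers only the metric names that appear in the examples on this page, and the grouping by mode, the constant names, and the `unsupported_metrics` helper are assumptions made for illustration. Consult the metrics topic for the authoritative list.

```python
# Hypothetical mode labels used only in this sketch.
RETRIEVE_ONLY = "retrieve-only"
RETRIEVE_AND_GENERATE = "retrieve-and-generate"

# Assumption: metrics grouped by mode as they are used in this page's examples.
# This is not the complete list of built-in metrics.
METRICS_BY_MODE = {
    RETRIEVE_ONLY: {"Builtin.ContextCoverage"},
    RETRIEVE_AND_GENERATE: {
        "Builtin.Faithfulness",
        "Builtin.Completeness",
        "Builtin.Stereotyping",
    },
}


def unsupported_metrics(mode, metric_names):
    """Return the metric names not listed for the given mode, in input order."""
    if mode not in METRICS_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    allowed = METRICS_BY_MODE[mode]
    return [name for name in metric_names if name not in allowed]
```

An empty list from `unsupported_metrics` means every requested metric is listed for that mode in this sketch's table.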

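You must choose a supported evaluator model to compute the evaluation metrics. If you want to evaluate retrieval with response generation, you must also choose a supported model for response generation. For more information, see Prerequisites for creating knowledge base evaluations in Amazon Bedrock.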

You must provide a prompt dataset to use for the evaluation. For more information, see Use a prompt dataset for a knowledge base evaluation in Amazon Bedrock.

The following examples show you how to create a knowledge base evaluation job using the AWS CLI and the SDK for Python (Boto3).

Knowledge base evaluation jobs that use LLMs

The following examples show you how to create a knowledge base evaluation job that uses large language models (LLMs) for evaluation.

AWS Command Line Interface

aws bedrock create-evaluation-job \
    --job-name "rag-evaluation-complete-stereotype-docs-app" \
    --job-description "Evaluates Completeness and Stereotyping of RAG for docs application" \
    --role-arn "arn:aws:iam::<account-id>:role/AmazonBedrock-KnowledgeBases" \
    --evaluation-context "RAG" \
    --evaluation-config file://knowledge-base-evaluation-config.json \
    --inference-config file://knowledge-base-evaluation-inference-config.json \
    --output-data-config '{"s3Uri":"s3://docs/kbevalresults/"}'

knowledge-base-evaluation-config.json (the taskType field is required for model evaluation, but it is ignored for knowledge base evaluation):

{
    "automated": {
        "datasetMetricConfigs": [
            {
                "taskType": "Generation",
                "metricNames": ["Builtin.Completeness", "Builtin.Stereotyping"],
                "dataset": {
                    "name": "RagTestPrompts",
                    "datasetLocation": {
                        "s3Uri": "s3://docs/kbtestprompts.jsonl"
                    }
                }
            }
        ],
        "evaluatorModelConfig": {
            "bedrockEvaluatorModels": [
                {
                    "modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"
                }
            ]
        }
    }
}

knowledge-base-evaluation-inference-config.json:

{
    "ragConfigs": [
        {
            "knowledgeBaseConfig": {
                "retrieveAndGenerateConfig": {
                    "type": "KNOWLEDGE_BASE",
                    "knowledgeBaseConfiguration": {
                        "knowledgeBaseId": "<knowledge-base-id>",
                        "modelArn": "arn:aws:bedrock:<region>:<account-id>:inference-profile/anthropic.claude-v2:1",
                        "generationConfiguration": {
                            "promptTemplate": {
                                "textPromptTemplate": "\n\nHuman: I will provide you with a set of search results and a user's question. Your job is to answer the user's question using only information from the search results\n\nHere are the search results: $search_results$\n\nHere is the user's question: $query$\n\nAssistant:"
                            }
                        },
                        "retrievalConfiguration": {
                            "vectorSearchConfiguration": {
                                "numberOfResults": 10,
                                "overrideSearchType": "HYBRID"
                            }
                        }
                    }
                }
            }
        }
    ]
}
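JSON does not permit comments, so one way to keep the CLI configuration files valid is to generate them from a script instead of editing them by hand. The following is a minimal sketch under stated assumptions: the helper names `build_evaluation_config` and `write_json_config` are hypothetical, and the dictionary layout mirrors the `evaluationConfig` argument used in the SDK for Python examples on this page.

```python
import json
from pathlib import Path


def build_evaluation_config(dataset_name, dataset_s3_uri, metric_names,
                            evaluator_model_id, task_type="Generation"):
    """Build an evaluation config dict shaped like this page's SDK examples.

    taskType is required for model evaluation but is ignored for knowledge
    base evaluation, so the default value here is arbitrary.
    """
    return {
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": task_type,
                    "metricNames": list(metric_names),
                    "dataset": {
                        "name": dataset_name,
                        "datasetLocation": {"s3Uri": dataset_s3_uri},
                    },
                }
            ],
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": evaluator_model_id}
                ]
            },
        }
    }


def write_json_config(config, path):
    """Serialize a config dict to a JSON file and return the file's Path."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return path
```

The resulting file can then be passed to the CLI with the `file://` prefix, as in the command above.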

SDK for Python (Boto3)

Note

During the preview, your AWS account management team provides a parameter file for you to download and use.

The following Python example demonstrates how to make a retrieve-only Boto3 API request.

import boto3

client = boto3.client('bedrock')

job_request = client.create_evaluation_job(
    jobName="fkki-boto3-test1",
    jobDescription="two different task types",
    roleArn="arn:aws:iam::111122223333:role/service-role/Amazon-Bedrock-IAM-Role",
    evaluationContext="RAG",
    # Retrieve-only: the knowledge base config contains only a retrieveConfig.
    inferenceConfig={
        "ragConfigs": [
            {
                "knowledgeBaseConfig": {
                    "retrieveConfig": {
                        "knowledgeBaseId": "your-knowledge-base-id",
                        "knowledgeBaseRetrievalConfiguration": {
                            "vectorSearchConfiguration": {
                                "numberOfResults": 10,
                                "overrideSearchType": "HYBRID"
                            }
                        }
                    }
                }
            }
        ]
    },
    # S3 location where the evaluation results are written.
    outputDataConfig={
        "s3Uri": "s3://amzn-s3-demo-bucket-model-evaluations/outputs/"
    },
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    # Required field, but ignored for knowledge base evaluation.
                    "taskType": "Summarization",
                    "dataset": {
                        "name": "RagDataset",
                        "datasetLocation": {
                            "s3Uri": "s3://amzn-s3-demo-bucket/input_data/data_3_rng.jsonl"
                        }
                    },
                    "metricNames": [
                        "Builtin.ContextCoverage"
                    ]
                }
            ],
            # Evaluator model that computes the metrics.
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [{
                    "modelIdentifier": "meta.llama3-1-70b-instruct-v1:0"
                }]
            }
        }
    }
)

print(job_request)
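An evaluation job runs asynchronously after the create request returns, so a script usually needs to poll until the job finishes. The following is a minimal sketch, assuming the response from `create_evaluation_job` contains a `jobArn` and that `get_evaluation_job` reports a `status` of `Completed`, `Failed`, or `Stopped` once the job has ended. The helper name `wait_for_evaluation_job` and its parameters are hypothetical.

```python
import time

# Assumption: statuses after which the job no longer changes state.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_evaluation_job(client, job_arn, poll_seconds=60, max_polls=120,
                            sleep=time.sleep):
    """Poll get_evaluation_job until the job reaches a terminal status.

    client is a boto3 'bedrock' client (or any object that exposes the same
    get_evaluation_job method). Returns the final status string and raises
    TimeoutError if the job is still running after max_polls checks.
    """
    for _ in range(max_polls):
        status = client.get_evaluation_job(jobIdentifier=job_arn)["status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_arn} did not finish after {max_polls} polls")
```

With the request shown above, a call would look like `wait_for_evaluation_job(client, job_request["jobArn"])`. The `sleep` parameter exists only so the wait can be replaced in tests.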

The following Python example demonstrates how to make a retrieve-and-generate Boto3 API request.

import boto3

client = boto3.client('bedrock')

job_request = client.create_evaluation_job(
    jobName="api-auto-job-titan",
    jobDescription="two different task types",
    roleArn="arn:aws:iam::111122223333:role/role-name",
    # Retrieve-and-generate: the knowledge base config names the model that
    # generates responses, plus its prompt template and retrieval settings.
    inferenceConfig={
        "ragConfigs": [
            {
                "knowledgeBaseConfig": {
                    "retrieveAndGenerateConfig": {
                        "type": "KNOWLEDGE_BASE",
                        "knowledgeBaseConfiguration": {
                            "knowledgeBaseId": "73SPNQM4CI",
                            "modelArn": "anthropic.claude-3-sonnet-20240229-v1:0",
                            "generationConfiguration": {
                                "promptTemplate": {
                                    "textPromptTemplate": "$search_results$ hello world template"
                                }
                            },
                            "retrievalConfiguration": {
                                "vectorSearchConfiguration": {
                                    "numberOfResults": 10,
                                    "overrideSearchType": "HYBRID"
                                }
                            }
                        }
                    }
                }
            }
        ]
    },
    # S3 location where the evaluation results are written.
    outputDataConfig={
        "s3Uri": "s3://amzn-s3-demo-bucket-model-evaluations/outputs/"
    },
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Summarization",
                    "dataset": {
                        "name": "RagDataset",
                        "datasetLocation": {
                            "s3Uri": "s3://amzn-s3-demo-bucket-input-data/data_3_rng.jsonl"
                        }
                    },
                    "metricNames": [
                        "Builtin.Faithfulness"
                    ]
                }
            ],
            # Evaluator model that computes the metrics.
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [{
                    "modelIdentifier": "meta.llama3-1-70b-instruct-v1:0"
                }]
            }
        }
    }
)

print(job_request)
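The two requests above differ mainly in the `inferenceConfig` argument, so a small builder can produce either shape from the same inputs. The following is a minimal sketch that reproduces the dictionary layouts from the two examples. The function name `build_rag_inference_config` and its parameters are hypothetical, and treating the absence of a generator model as "retrieve only" is a design choice of this sketch, not a rule of the API.

```python
def build_rag_inference_config(knowledge_base_id, number_of_results=10,
                               search_type="HYBRID", model_arn=None,
                               prompt_template=None):
    """Return an inferenceConfig dict for a knowledge base evaluation job.

    Without model_arn the result is a retrieve-only config; with model_arn it
    is a retrieve-and-generate config, as in the two examples above.
    """
    vector_search = {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            "overrideSearchType": search_type,
        }
    }
    if model_arn is None:
        if prompt_template is not None:
            raise ValueError("prompt_template requires model_arn")
        kb_config = {
            "retrieveConfig": {
                "knowledgeBaseId": knowledge_base_id,
                "knowledgeBaseRetrievalConfiguration": vector_search,
            }
        }
    else:
        kb_configuration = {
            "knowledgeBaseId": knowledge_base_id,
            "modelArn": model_arn,
            "retrievalConfiguration": vector_search,
        }
        if prompt_template is not None:
            kb_configuration["generationConfiguration"] = {
                "promptTemplate": {"textPromptTemplate": prompt_template}
            }
        kb_config = {
            "retrieveAndGenerateConfig": {
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": kb_configuration,
            }
        }
    return {"ragConfigs": [{"knowledgeBaseConfig": kb_config}]}
```

The returned dictionary can be passed as `inferenceConfig=build_rag_inference_config(...)` in a `create_evaluation_job` call.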