Creating a knowledge base evaluation job in Amazon Bedrock

You can create a knowledge base evaluation job that computes evaluation metrics.

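Creating a knowledge base evaluation job requires certain access permissions. For more information, see Required permissions to create an Amazon Bedrock knowledge base evaluation job.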

Note

Knowledge base evaluation jobs are in preview and are subject to change.

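You can evaluate retrieval only, or retrieval together with response generation. Some metrics apply to retrieval only, and others apply to retrieval with response generation. For more information, see Review metrics for knowledge base evaluations that use LLMs (console).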

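You must choose a supported evaluator model to compute the evaluation metrics. If you want to evaluate retrieval with response generation, you must also choose a supported model for response generation. For more information, see Prerequisites for creating knowledge base evaluations in Amazon Bedrock.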
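The choice between the two modes determines the shape of the inference configuration. The following is a minimal sketch, not an official helper: `build_knowledge_base_config` is a hypothetical function name, and the field names mirror the Boto3 requests shown later on this page.

```python
from typing import Optional


def build_knowledge_base_config(
    knowledge_base_id: str,
    generator_model: Optional[str] = None,
    number_of_results: int = 10,
    search_type: str = "HYBRID",
) -> dict:
    """Return a knowledgeBaseConfig dict for one evaluation mode.

    Hypothetical helper: without a generator model it builds a
    retrieval-only config; with one it builds a retrieve-and-generate config.
    """
    vector_search = {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            "overrideSearchType": search_type,
        }
    }
    if generator_model is None:
        # Retrieval only: no response-generation model is needed.
        return {
            "retrieveConfig": {
                "knowledgeBaseId": knowledge_base_id,
                "knowledgeBaseRetrievalConfiguration": vector_search,
            }
        }
    # Retrieval with response generation: a generator model is required.
    return {
        "retrieveAndGenerateConfig": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": generator_model,
                "retrievalConfiguration": vector_search,
            },
        }
    }


retrieval_only = build_knowledge_base_config("your-knowledge-base-id")
with_generation = build_knowledge_base_config(
    "your-knowledge-base-id",
    generator_model="anthropic.claude-3-sonnet-20240229-v1:0",
)
print(list(retrieval_only), list(with_generation))
```

The returned dictionary would be placed under `inferenceConfig["ragConfigs"][0]["knowledgeBaseConfig"]` in a request.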

You must provide a prompt dataset to use for the evaluation. For more information, see Use a prompt dataset for a knowledge base evaluation in Amazon Bedrock.
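A prompt dataset is a JSON Lines file stored in Amazon S3. The sketch below writes one locally. The `conversationTurns` record shape is an assumption that should be checked against the prompt dataset topic referenced above, and the helper names, file name, and sample text are hypothetical.

```python
import json


def make_record(prompt: str, reference_response: str) -> dict:
    """Build one dataset record (assumed conversationTurns shape)."""
    return {
        "conversationTurns": [
            {
                "prompt": {"content": [{"text": prompt}]},
                "referenceResponses": [
                    {"content": [{"text": reference_response}]}
                ],
            }
        ]
    }


def write_jsonl(records: list, path: str) -> None:
    """Write one JSON object per line, as JSON Lines requires."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


records = [
    make_record(
        "What is the return policy?",  # hypothetical sample prompt
        "Items can be returned within 30 days.",  # hypothetical ground truth
    )
]
write_jsonl(records, "kbtestprompts.jsonl")
```

After uploading the file to Amazon S3, its URI is what the job request references as the dataset location.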

The following examples show how to create a knowledge base evaluation job using the AWS CLI and the AWS SDK for Python (Boto3).

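Knowledge base evaluation jobs that use LLMs

The following example shows how to create a knowledge base evaluation job that uses large language models (LLMs) for evaluation.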

AWS Command Line Interface


aws bedrock create-evaluation-job \
 --job-name "rag-evaluation-complete-stereotype-docs-app" \
 --job-description "Evaluates Completeness and Stereotyping of RAG for docs application" \
 --role-arn "arn:aws:iam::<account-id>:role/AmazonBedrock-KnowledgeBases" \
 --evaluation-context "RAG" \
 --evaluation-config file://knowledge-base-evaluation-config.json \
 --inference-config file://knowledge-base-evaluation-inference-config.json \
 --output-data-config '{"s3Uri":"s3://docs/kbevalresults/"}' 

file://knowledge-base-evaluation-config.json

{
    "automated": {
        "datasetMetricConfigs": [{
            "taskType": "Generation",
            "metricNames": ["Builtin.Completeness", "Builtin.Stereotyping"],
            "dataset": {
                "name": "RagTestPrompts",
                "datasetLocation": {
                    "s3Uri": "s3://docs/kbtestprompts.jsonl"
                }
            }
        }],
        "evaluatorModelConfig": {
            "bedrockEvaluatorModels": [{
                "modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"
            }]
        }
    }
}

The taskType field is required for model evaluation but is ignored for knowledge base evaluation.
 
file://knowledge-base-evaluation-inference-config.json

{
    "ragConfigs": [{
        "knowledgeBaseConfig": {
            "retrieveAndGenerateConfig": {
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {
                    "knowledgeBaseId": "<knowledge-base-id>",
                    "modelArn": "arn:aws:bedrock:<region>:<account-id>:inference-profile/anthropic.claude-v2:1",
                    "retrievalConfiguration": {
                        "vectorSearchConfiguration": {
                            "numberOfResults": 10,
                            "overrideSearchType": "HYBRID"
                        }
                    },
                    "generationConfiguration": {
                        "promptTemplate": {
                            "textPromptTemplate": "\n\nHuman: I will provide you with a set of search results and a user's question. Your job is to answer the user's question using only information from the search results\n\nHere are the search results: $search_results$\n\nHere is the user's question: $query$\n\nAssistant:"
                        }
                    }
                }
            }
        }
    }]
}

This file uses retrieveAndGenerateConfig because the job computes response-generation metrics. For a retrieval-only job, use retrieveConfig instead, as in the first Python example below.
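The AWS CLI parses the contents of these `file://` parameters as JSON, so a comment or trailing comma in either file is expected to make the command fail. A quick local check before submitting can catch this. The following is a sketch, and the `check_json_text` helper is a hypothetical name:

```python
import json


def check_json_text(text: str) -> tuple:
    """Return (True, "") if text is strict JSON, else (False, reason)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, ""


valid = '{"ragConfigs": []}'
with_comment = '{"taskType": "Generation" // not valid in JSON\n}'

print(check_json_text(valid))
print(check_json_text(with_comment))
```

To check a real file, pass its contents, for example `check_json_text(open("knowledge-base-evaluation-config.json").read())`.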

SDK for Python (Boto3)

Note

During the preview, your AWS account manager provides a parameters file for you to download and use.

The following Python example shows how to make a retrieval-only Boto3 API request.


import boto3
client = boto3.client('bedrock')

job_request = client.create_evaluation_job(
    jobName="fkki-boto3-test1",
    jobDescription="two different task types",
    roleArn="arn:aws:iam::111122223333:role/service-role/Amazon-Bedrock-IAM-Role",
    evaluationContext="RAG",
    inferenceConfig={
        "ragConfigs": [
            {
                "knowledgeBaseConfig": {
                    "retrieveConfig": {
                        "knowledgeBaseId": "your-knowledge-base-id",
                        "knowledgeBaseRetrievalConfiguration": {
                            "vectorSearchConfiguration": {
                                "numberOfResults": 10,
                                "overrideSearchType": "HYBRID"
                            }
                        }
                    }
                }
            }
        ]
    },
    outputDataConfig={
        "s3Uri":"s3://amzn-s3-demo-bucket-model-evaluations/outputs/"
    },
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Summarization",
                    "dataset": {
                        "name": "RagDataset",
                        "datasetLocation": {
                            "s3Uri": "s3://amzn-s3-demo-bucket/input_data/data_3_rng.jsonl"
                        }
                    },
                    "metricNames": [
                        "Builtin.ContextCoverage"
                    ]
                }
            ],
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {
                        "modelIdentifier": "meta.llama3-1-70b-instruct-v1:0"
                    }
                ]
            }
        }
    }
)

print(job_request)

The following Python example shows how to make a retrieve-and-generate Boto3 API request.


import boto3
client = boto3.client('bedrock')

job_request = client.create_evaluation_job(
    jobName="api-auto-job-titan",
    jobDescription="two different task types",
    roleArn="arn:aws:iam::111122223333:role/role-name",
    inferenceConfig={
        "ragConfigs": [
            {
                "knowledgeBaseConfig": {
                    "retrieveAndGenerateConfig": {
                        "type": "KNOWLEDGE_BASE",
                        "knowledgeBaseConfiguration": {
                            "knowledgeBaseId": "your-knowledge-base-id",
                            "modelArn": "anthropic.claude-3-sonnet-20240229-v1:0",
                            "generationConfiguration": {
                                "promptTemplate": {
                                    "textPromptTemplate": "$search_results$ hello world template"
                                }
                            },
                            "retrievalConfiguration": {
                                "vectorSearchConfiguration": {
                                    "numberOfResults": 10,
                                    "overrideSearchType": "HYBRID"
                                }
                            }
                        }
                    }
                }
            }
        ]
    },
    outputDataConfig={
        "s3Uri":"s3://amzn-s3-demo-bucket-model-evaluations/outputs/"
    },
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Summarization",
                    "dataset": {
                        "name": "RagDataset",
                        "datasetLocation": {
                            "s3Uri": "s3://amzn-s3-demo-bucket-input-data/data_3_rng.jsonl"
                        }
                    },
                    "metricNames": [
                        "Builtin.Faithfulness"
                    ]
                }
            ],
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {
                        "modelIdentifier": "meta.llama3-1-70b-instruct-v1:0"
                    }
                ]
            }
        }
    }
)

print(job_request)
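`create_evaluation_job` returns once the job is submitted, so the printed response contains the job ARN rather than evaluation results. The following sketch polls `get_evaluation_job` until the job reaches a terminal status. The `wait_for_job` name, the polling interval, and the set of terminal status strings are assumptions to verify against the Boto3 reference.

```python
import time


# Assumed terminal states; check the Boto3 reference for the full list.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_job(client, job_arn: str, poll_seconds: float = 60.0,
                 max_polls: int = 120) -> str:
    """Poll an evaluation job until it reaches a terminal status.

    `client` is a Bedrock client, for example boto3.client("bedrock").
    Raises TimeoutError if the job is still running after max_polls checks.
    """
    for _ in range(max_polls):
        response = client.get_evaluation_job(jobIdentifier=job_arn)
        status = response["status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_arn} did not finish after {max_polls} polls")
```

With the client and response from the example above, a call would look like `wait_for_job(client, job_request["jobArn"])`.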
