Use custom prompt datasets for model evaluation in Amazon Bedrock

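You can use a custom prompt dataset in model evaluation jobs.

Custom prompt datasets must be stored in Amazon S3, use the JSON Lines format, and have the .jsonl file extension. When you upload the dataset to Amazon S3, make sure that you update the Cross-Origin Resource Sharing (CORS) configuration on the S3 bucket. To learn more about the required CORS permissions, see Required Cross-Origin Resource Sharing (CORS) permissions on S3 buckets.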

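Topics

Requirements for custom prompt datasets used in automatic model evaluation jobs
Requirements for custom prompt datasets in model evaluation jobs that use human workers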

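Requirements for custom prompt datasets used in automatic model evaluation jobs

In automatic model evaluation jobs, you can use a custom prompt dataset for each metric that you select in the job. Custom datasets use the JSON Lines format (.jsonl), and each line must be a valid JSON object. A dataset can contain up to 1000 prompts per automatic evaluation job.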

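You must use the following keys in a custom dataset.

prompt – Required. Indicates the input for the following tasks:
- The prompt that your model should respond to, in general text generation.
- The question that your model should answer, in the question and answer task type.
- The text that your model should summarize, in text summarization tasks.
- The text that your model should classify, in classification tasks.
referenceResponse – Required. Indicates the ground truth response against which your model's response is evaluated for the following task types:
- The answer for all prompts in question and answer tasks.
- The answer for all accuracy and robustness evaluations.
category – (Optional) Generates evaluation scores that are reported for each category.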
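The key requirements above can be checked locally before the dataset is uploaded. The following is a minimal sketch, not an official tool: the function name validate_auto_dataset and the require_reference flag are hypothetical, and the checks cover only the rules stated in this section (valid JSON object per line, a prompt key, a referenceResponse key where the task needs one, and the 1000-prompt limit).

```python
import json

MAX_PROMPTS = 1000  # per-job limit stated in this section


def validate_auto_dataset(lines, require_reference=True):
    """Return a list of (line_number, message) problems.

    `lines` is a list of strings, one per line of the .jsonl file.
    Line number 0 is used for problems that concern the whole dataset.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as err:
            problems.append((number, f"not valid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        # 'prompt' must be present and non-empty.
        if not isinstance(record.get("prompt"), str) or not record["prompt"]:
            problems.append((number, "missing or empty 'prompt'"))
        # Assumption: the caller sets require_reference=False for task
        # types that do not need a ground truth response.
        if require_reference and not isinstance(record.get("referenceResponse"), str):
            problems.append((number, "missing 'referenceResponse'"))
    if len(lines) > MAX_PROMPTS:
        problems.append((0, f"{len(lines)} prompts exceeds the limit of {MAX_PROMPTS}"))
    return problems
```

An empty result means no problems were found by these checks; it does not guarantee that the evaluation job will accept the file.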

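For example, an accuracy evaluation requires both the question to ask and the answer to check the model response against. In this example, use the key prompt with the question as its value, and the key referenceResponse with the answer as its value, as follows.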


{
    "prompt": "Bobigny is the capital of",
    "referenceResponse": "Seine-Saint-Denis",
    "category": "Capitals"
}

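The previous example, expanded here for readability, represents a single line of a JSON Lines input file that is sent to your model as an inference request. The model is invoked once for every such record in your JSON Lines dataset. The following data input example is for a question and answer task that uses the optional category key for evaluation.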


{"prompt":"Aurillac is the capital of", "category":"Capitals", "referenceResponse":"Cantal"}
{"prompt":"Bamiyan city is the capital of", "category":"Capitals", "referenceResponse":"Bamiyan Province"}
{"prompt":"Sokhumi is the capital of", "category":"Capitals", "referenceResponse":"Abkhazia"}
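Records like the ones above can be produced programmatically. The following sketch assumes the prompts already exist as Python dictionaries; the helper name to_json_lines is hypothetical. It relies on json.dumps, which by default emits each object on one line and escapes any newline characters inside string values.

```python
import json


def to_json_lines(records):
    """Serialize each record as one JSON object per line."""
    # ensure_ascii=False keeps non-ASCII prompt text readable in the file.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


records = [
    {"prompt": "Aurillac is the capital of", "category": "Capitals", "referenceResponse": "Cantal"},
    {"prompt": "Sokhumi is the capital of", "category": "Capitals", "referenceResponse": "Abkhazia"},
]
text = to_json_lines(records)
```

The resulting text can then be written to a file with the .jsonl extension and uploaded to Amazon S3.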

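To learn more about the format requirements for model evaluation jobs that use human workers, see Requirements for custom prompt datasets in model evaluation jobs that use human workers.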

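Requirements for custom prompt datasets in model evaluation jobs that use human workers

In the JSON Lines format, each line is a valid JSON object. A prompt dataset can contain up to 1000 prompts per model evaluation job.

A valid prompt entry must contain the prompt key. Both category and referenceResponse are optional. Use the category key to label a prompt with a specific category, which you can then use to filter results when you review them in the model evaluation report card. Use the referenceResponse key to specify the ground truth response that your workers can refer to during the evaluation.

In the worker UI, your human workers can see what you specify for prompt and referenceResponse.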
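Because category is optional here and is used for filtering results, it can help to preview how the prompts are distributed across categories before starting a job. The sketch below is illustrative only: summarize_categories is a hypothetical helper, and grouping entries without a category under None is an assumption of this example, not behavior of the report card.

```python
import json
from collections import Counter


def summarize_categories(lines):
    """Count prompts per category in a list of JSON Lines strings."""
    counts = Counter()
    for raw in lines:
        entry = json.loads(raw)
        # 'prompt' is the only required key for human-based jobs.
        if "prompt" not in entry:
            raise ValueError("every entry needs a 'prompt' key")
        # Entries without a category are counted under None.
        counts[entry.get("category")] += 1
    return counts
```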

The following is an example custom dataset that contains 6 inputs and uses the JSON Lines format.


{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}

The following example is a single entry, expanded across several lines for clarity.


{
    "prompt": "What is high intensity interval training?",
    "category": "Fitness",
    "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods."
}
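An expanded entry like this one is not valid as-is in a .jsonl file, because each entry must occupy exactly one line. As a small sketch, an expanded entry can be collapsed by parsing it and serializing it again; collapse_entry is a hypothetical helper name.

```python
import json


def collapse_entry(expanded_text):
    """Turn a pretty-printed JSON object into a single-line JSON string."""
    entry = json.loads(expanded_text)
    # json.dumps without an indent argument emits the object on one line.
    return json.dumps(entry, ensure_ascii=False)


expanded = """
{
    "prompt": "What is high intensity interval training?",
    "category": "Fitness"
}
"""
single_line = collapse_entry(expanded)
```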
