Understanding the results of a human-based model evaluation job

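When you create a model evaluation job that uses human workers, you select one or more metric types. When members of the work team rate a response in the worker portal, their responses are saved in the humanAnswers JSON object. How those responses are stored depends on the metric type selected when the job was created.

The following sections explain these differences and provide examples.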

JSON output reference

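When a model evaluation job completes, the results are saved in Amazon S3 as a JSON file. The JSON object contains three top-level nodes: humanEvaluationResult, inputRecord, and modelResponses. The humanEvaluationResult key contains the responses from the work team assigned to the model evaluation job. The inputRecord key contains the prompts provided to the models when the model evaluation job was created. The modelResponses key contains the models' responses to those prompts.

The following table summarizes the key-value pairs found in the JSON output of a model evaluation job.

The sections that follow provide more detail about each key-value pair.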

Parameter	Example	Description
`flowDefinitionArn`	`arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name`	The ARN of the human review workflow (flow definition) that created the human loop.
`humanAnswers`	A list of JSON objects specific to the evaluation metrics selected. To learn more, see Key-value pairs found under humanAnswers.	A list of JSON objects that contain worker responses.
`humanLoopName`	`system-generated-hash`	A system-generated 40-character hexadecimal string.
`inputRecord`	`"inputRecord": { "prompt": { "text": "Who invented the airplane?" }, "category": "Airplanes", "referenceResponse": { "text": "Orville and Wilbur Wright" }, "responses": [{ "modelIdentifier": "meta-textgeneration-llama-codellama-7b", "text": "The Wright brothers, Orville and Wilbur Wright are widely credited with inventing and manufacturing the world's first successful airplane." }] }`	A JSON object that contains an input prompt from the input dataset.
`modelResponses`	`"modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/model-id", "text": "the-models-response-to-the-prompt" }]`	The individual responses from the models.
`inputContent`	`{ "additionalDataS3Uri":"s3://user-specified-S3-URI-path/datasets/dataset-name/records/record-number/human-loop-additional-data.json", "evaluationMetrics":[ { "description":"brief-name", "metricName":"metric-name", "metricType":"IndividualLikertScale" } ], "instructions":"example instructions" }`	The human loop input content required to start the human loop, stored in your Amazon S3 bucket.
`modelResponseIdMap`	`{ "0": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612", "1": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352" }`	Describes how each model is represented in `answerContent`.
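
As a rough illustration of the three top-level nodes, the following Python sketch parses one output record and separates them. It assumes the record has already been downloaded from Amazon S3 and read into a string; `record_text`, `split_record`, and the `model-a` identifier are hypothetical names and placeholder values used only for this example.

```python
import json

# A trimmed-down record with the three top-level nodes described above.
# The values are illustrative placeholders, not real job output.
record_text = """
{
  "humanEvaluationResult": {
    "humanAnswers": [],
    "modelResponseIdMap": {"0": "model-a"}
  },
  "inputRecord": {"prompt": {"text": "Who invented the airplane?"}},
  "modelResponses": [
    {"modelIdentifier": "model-a", "text": "The Wright brothers."}
  ]
}
"""

TOP_LEVEL_KEYS = ("humanEvaluationResult", "inputRecord", "modelResponses")


def split_record(text):
    """Parse one output record and return its three top-level nodes."""
    record = json.loads(text)
    missing = [key for key in TOP_LEVEL_KEYS if key not in record]
    if missing:
        raise KeyError(f"record is missing top-level keys: {missing}")
    return tuple(record[key] for key in TOP_LEVEL_KEYS)


human_result, input_record, model_responses = split_record(record_text)
print(input_record["prompt"]["text"])
print([response["modelIdentifier"] for response in model_responses])
```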

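Key-value pairs found under `humanEvaluationResult`

The following key-value pairs are found under humanEvaluationResult in the output of your model evaluation job.

For the key-value pairs associated with humanAnswers, see Key-value pairs found under humanAnswers.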

flowDefinitionArn

The ARN of the flow definition used to complete the model evaluation job.
Example: arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name

humanLoopName

A system-generated 40-character hexadecimal string.

inputContent

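This key-value pair describes the metric types and the instructions that you provided to workers in the worker portal.
- additionalDataS3Uri: The location in Amazon S3 where the instructions for workers are saved.
- instructions: The instructions that you provided to workers in the worker portal.
- evaluationMetrics: The name of each metric and its description. The metricType value identifies the tool provided to workers for evaluating the models' responses.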

modelResponseIdMap

This key-value pair identifies the full names of the selected models, and how worker choices map to those models in the humanAnswers key-value pairs.
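
The lookup described above might be sketched as follows. The map values are copied from the sample output in this topic; `resolve_model` is a hypothetical helper, and the sketch assumes that IDs are stored as strings, as they appear in that sample.

```python
# modelResponseIdMap as it appears in the sample output in this topic.
model_response_id_map = {
    "0": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612",
    "1": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352",
}


def resolve_model(response_id, id_map):
    """Return the full model name for a modelResponseId such as "0" or "1"."""
    # Convert to str so that an integer ID (0) finds the same entry as "0".
    return id_map[str(response_id)]


# A comparisonChoice entry shaped like the one in the sample output.
choice = {"metricName": "Fluency", "result": {"modelResponseId": "0"}}
print(resolve_model(choice["result"]["modelResponseId"], model_response_id_map))
```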

Key-value pairs found under `inputRecord`

The following entries describe the inputRecord key-value pairs.

prompt

The text of the prompt sent to the model.

category

An optional category that classifies the prompt. It is visible to workers in the worker portal during the model evaluation.
Example: "American cities"

referenceResponse

An optional field in the input JSON that specifies the ground truth you want workers to reference during the evaluation.

responses

An optional field in the input JSON that contains responses from other models.

Example JSON input record.


{
  "prompt": {
     "text": "Who invented the airplane?"
  },
  "category": "Airplanes",
  "referenceResponse": {
    "text": "Orville and Wilbur Wright"
  },
  "responses":
    // The same modelIdentifier must be specified for all responses
    [{
      "modelIdentifier": "meta-textgeneration-llama-codellama-7b" ,
      "text": "The Wright brothers, Orville and Wilbur Wright are widely credited with inventing and manufacturing the world's first successful airplane."
    }]
}
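
A small pre-flight check for an input record could look like the following sketch. `check_input_record` is a hypothetical helper, not part of any AWS SDK, and treating `prompt.text` as required is an assumption made for illustration. Note that the `//` comment in the sample record is explanatory only and would have to be removed before the record is parsed by a strict JSON parser.

```python
def check_input_record(record):
    """Return a list of problems found in one input record (empty if none)."""
    problems = []

    # Assumption for this sketch: every record needs a non-empty prompt text.
    if not record.get("prompt", {}).get("text"):
        problems.append("prompt.text is missing or empty")

    # category, referenceResponse, and responses are optional, but when
    # responses is present every entry must use the same modelIdentifier.
    identifiers = {r.get("modelIdentifier") for r in record.get("responses", [])}
    if len(identifiers) > 1:
        names = sorted(map(str, identifiers))
        problems.append(f"responses use more than one modelIdentifier: {names}")

    return problems


record = {
    "prompt": {"text": "Who invented the airplane?"},
    "category": "Airplanes",
    "referenceResponse": {"text": "Orville and Wilbur Wright"},
    "responses": [
        {
            "modelIdentifier": "meta-textgeneration-llama-codellama-7b",
            "text": "The Wright brothers, Orville and Wilbur Wright ...",
        }
    ],
}
print(check_input_record(record))
```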

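Key-value pairs found under `modelResponses`

An array of key-value pairs that contains the responses from the models, and which model provided each response.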

text

The model's response to the prompt.

modelIdentifier

The name of the model.

Key-value pairs found under `humanAnswers`

An array of key-value pairs that contains the workers' evaluations of the models' responses.

acceptanceTime

The time when the worker accepted the task in the worker portal.

submissionTime

The time when the worker submitted their response.

timeSpentInSeconds

The time, in seconds, that the worker spent completing the task.

workerId

The ID of the worker who completed the task.

workerMetadata

Metadata about the work team assigned to this model evaluation job.
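
The timing fields can be cross-checked against each other. The sketch below assumes that timestamps always use the UTC format seen in the sample output (a fractional-second part followed by a trailing `Z`); `parse_timestamp` and `elapsed_seconds` are hypothetical helper names.

```python
from datetime import datetime, timezone

# Format of acceptanceTime and submissionTime in the sample output.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(value):
    """Parse a timestamp such as 2024-06-07T22:31:57.066Z as UTC."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def elapsed_seconds(answer):
    """Seconds between a worker accepting a task and submitting it."""
    start = parse_timestamp(answer["acceptanceTime"])
    end = parse_timestamp(answer["submissionTime"])
    return (end - start).total_seconds()


# Timing fields of the first worker in the sample output.
answer = {
    "acceptanceTime": "2024-06-07T22:31:57.066Z",
    "submissionTime": "2024-06-07T22:32:19.640Z",
    "timeSpentInSeconds": 22.574,
}
print(elapsed_seconds(answer), answer["timeSpentInSeconds"])
```

For the sample values, the computed difference agrees with the recorded timeSpentInSeconds.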

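Format of the `answerContent` JSON array

The structure of the answer depends on the evaluation metrics selected when the model evaluation job was created. Each worker response, or answer, is recorded in a new JSON object.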

answerContent

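evaluationResults contains the worker's responses.
- When Choice buttons is selected, the results from each worker are in "evaluationResults": "comparisonChoice".
  
  metricName: The name of the metric.
  
  result: A JSON object that indicates which model the worker selected, using either 0 or 1. To see which value a model maps to, see modelResponseIdMap.
- When Likert scale, comparison is selected, the results from each worker are in "evaluationResults": "comparisonLikertScale".
  
  metricName: The name of the metric.
  
  leftModelResponseId: Indicates which modelResponseIdMap entry was shown on the left side of the worker portal.
  
  rightModelResponseId: Indicates which modelResponseIdMap entry was shown on the right side of the worker portal.
  
  result: A JSON object that indicates which model the worker selected, using either 0 or 1. To see which value a model maps to, see modelResponseIdMap.
- When Ordinal rank is selected, the results from each worker are in "evaluationResults": "comparisonRank".
  
  metricName: The name of the metric.
  
  result: An array of JSON objects. For each model (modelResponseIdMap), the worker provides a rank.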
```
"result": [{
	"modelResponseId": "0",
	"rank": 1
}, {
	"modelResponseId": "1",
	"rank": 1
}]
```
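- When Likert scale, evaluation of a single model response is selected, a worker's results are saved in "evaluationResults": "individualLikertScale". This is a JSON array that contains the scores for each metricName specified when the job was created.
  
  metricName: The name of the metric.
  
  modelResponseId: The model that was scored. To see which value a model maps to, see modelResponseIdMap.
  
  result: A key-value pair that indicates the Likert scale value selected by the worker.
- When Thumbs up/down is selected, a worker's results are saved as the JSON array "evaluationResults": "thumbsUpDown".
  
  metricName: The name of the metric.
  
  result: Either true or false, as it relates to the metricName. When a worker chooses thumbs up, "result" : true.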
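
To show how the per-model metric types might be aggregated across workers, here is a minimal Python sketch. The data is abbreviated from the sample output in this topic; `collect`, `likert_means`, and `thumbs_up_rate` are hypothetical names, and averaging is only one possible way to summarize the scores.

```python
from collections import defaultdict
from statistics import mean

# Two workers' answers, abbreviated from the sample output in this topic.
human_answers = [
    {"answerContent": {"evaluationResults": {
        "individualLikertScale": [
            {"metricName": "Correctness", "modelResponseId": "0", "result": 2},
            {"metricName": "Correctness", "modelResponseId": "1", "result": 3},
        ],
        "thumbsUpDown": [
            {"metricName": "Accuracy", "modelResponseId": "0", "result": True},
            {"metricName": "Accuracy", "modelResponseId": "1", "result": True},
        ],
    }}},
    {"answerContent": {"evaluationResults": {
        "individualLikertScale": [
            {"metricName": "Correctness", "modelResponseId": "0", "result": 3},
            {"metricName": "Correctness", "modelResponseId": "1", "result": 4},
        ],
        "thumbsUpDown": [
            {"metricName": "Accuracy", "modelResponseId": "0", "result": True},
            {"metricName": "Accuracy", "modelResponseId": "1", "result": False},
        ],
    }}},
]


def collect(answers, metric_type):
    """Group raw results by (metricName, modelResponseId) across workers."""
    grouped = defaultdict(list)
    for answer in answers:
        results = answer["answerContent"]["evaluationResults"]
        # A metric type that was not selected for the job is simply absent.
        for entry in results.get(metric_type, []):
            key = (entry["metricName"], entry["modelResponseId"])
            grouped[key].append(entry["result"])
    return dict(grouped)


# Mean Likert score and share of thumbs-up votes per metric and model.
likert_means = {k: mean(v) for k, v in collect(human_answers, "individualLikertScale").items()}
thumbs_up_rate = {k: sum(v) / len(v) for k, v in collect(human_answers, "thumbsUpDown").items()}
print(likert_means)
print(thumbs_up_rate)
```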

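Example output from a model evaluation job

The following JSON object is an example of model evaluation job output saved in Amazon S3. To learn more about each key-value pair, see the JSON output reference.

For clarity, this job contains the responses from only two workers. Some key-value pairs might also have been truncated for readability.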


{
	"humanEvaluationResult": {
		"flowDefinitionArn": "arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name",
        "humanAnswers": [
            {
                "acceptanceTime": "2024-06-07T22:31:57.066Z",
                "answerContent": {
                    "evaluationResults": {
                        "comparisonChoice": [
                            {
                                "metricName": "Fluency",
                                "result": {
                                    "modelResponseId": "0"
                                }
                            }
                        ],
                        "comparisonLikertScale": [
                            {
                                "leftModelResponseId": "0",
                                "metricName": "Coherence",
                                "result": 1,
                                "rightModelResponseId": "1"
                            }
                        ],
                        "comparisonRank": [
                            {
                                "metricName": "Toxicity",
                                "result": [
                                    {
                                        "modelResponseId": "0",
                                        "rank": 1
                                    },
                                    {
                                        "modelResponseId": "1",
                                        "rank": 1
                                    }
                                ]
                            }
                        ],
                        "individualLikertScale": [
                            {
                                "metricName": "Correctness",
                                "modelResponseId": "0",
                                "result": 2
                            },
                            {
                                "metricName": "Correctness",
                                "modelResponseId": "1",
                                "result": 3
                            },
                            {
                                "metricName": "Completeness",
                                "modelResponseId": "0",
                                "result": 1
                            },
                            {
                                "metricName": "Completeness",
                                "modelResponseId": "1",
                                "result": 4
                            }
                        ],
                        "thumbsUpDown": [
                            {
                                "metricName": "Accuracy",
                                "modelResponseId": "0",
                                "result": true
                            },
                            {
                                "metricName": "Accuracy",
                                "modelResponseId": "1",
                                "result": true
                            }
                        ]
                    }
                },
                "submissionTime": "2024-06-07T22:32:19.640Z",
                "timeSpentInSeconds": 22.574,
                "workerId": "ead1ba56c1278175",
                "workerMetadata": {
                    "identityData": {
                        "identityProviderType": "Cognito",
                        "issuer": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_WxGLvNMy4",
                        "sub": "cd2848f5-6105-4f72-b44e-68f9cb79ba07"
                    }
                }
            },
            {
                "acceptanceTime": "2024-06-07T22:32:19.721Z",
                "answerContent": {
                    "evaluationResults": {
                        "comparisonChoice": [
                            {
                                "metricName": "Fluency",
                                "result": {
                                    "modelResponseId": "1"
                                }
                            }
                        ],
                        "comparisonLikertScale": [
                            {
                                "leftModelResponseId": "0",
                                "metricName": "Coherence",
                                "result": 1,
                                "rightModelResponseId": "1"
                            }
                        ],
                        "comparisonRank": [
                            {
                                "metricName": "Toxicity",
                                "result": [
                                    {
                                        "modelResponseId": "0",
                                        "rank": 2
                                    },
                                    {
                                        "modelResponseId": "1",
                                        "rank": 1
                                    }
                                ]
                            }
                        ],
                        "individualLikertScale": [
                            {
                                "metricName": "Correctness",
                                "modelResponseId": "0",
                                "result": 3
                            },
                            {
                                "metricName": "Correctness",
                                "modelResponseId": "1",
                                "result": 4
                            },
                            {
                                "metricName": "Completeness",
                                "modelResponseId": "0",
                                "result": 1
                            },
                            {
                                "metricName": "Completeness",
                                "modelResponseId": "1",
                                "result": 5
                            }
                        ],
                        "thumbsUpDown": [
                            {
                                "metricName": "Accuracy",
                                "modelResponseId": "0",
                                "result": true
                            },
                            {
                                "metricName": "Accuracy",
                                "modelResponseId": "1",
                                "result": false
                            }
                        ]
                    }
                },
                "submissionTime": "2024-06-07T22:32:57.918Z",
                "timeSpentInSeconds": 38.197,
                "workerId": "bad258db224c3db6",
                "workerMetadata": {
                    "identityData": {
                        "identityProviderType": "Cognito",
                        "issuer": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_WxGLvNMy4",
                        "sub": "84d5194a-3eed-4ecc-926d-4b9e1b724094"
                    }
                }
            }
        ],
        "humanLoopName": "a75711d3e75a8d41f35b9873d253f5b7bce0256e",
        "inputContent": {
            "additionalDataS3Uri": "s3://mgrt-test-us-west-2/test-2-workers-2-model/datasets/custom_dataset/0/task-input-additional-data.json",
            "instructions": "worker instructions provided by the model evaluation job administrator",
            "evaluationMetrics": [
                {
                    "metricName": "Fluency",
                    "metricType": "ComparisonChoice",
                    "description": "Measures the linguistic quality of a generated text."
                },
                {
                    "metricName": "Coherence",
                    "metricType": "ComparisonLikertScale",
                    "description": "Measures the organization and structure of a generated text."
                },
                {
                    "metricName": "Toxicity",
                    "metricType": "ComparisonRank",
                    "description": "Measures the harmfulness of a generated text."
                },
                {
                    "metricName": "Accuracy",
                    "metricType": "ThumbsUpDown",
                    "description": "Indicates the accuracy of a generated text."
                },
                {
                    "metricName": "Correctness",
                    "metricType": "IndividualLikertScale",
                    "description": "Measures a generated answer's satisfaction in the context of the question."
                },
                {
                    "metricName": "Completeness",
                    "metricType": "IndividualLikertScale",
                    "description": "Measures a generated answer's inclusion of all relevant information."
                }
            ],
            "disableRandomization": "true"
        },
        "modelResponseIdMap": {
            "0": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612",
            "1": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352"
        }
    },
    "inputRecord": {
        "prompt": {
            "text": "What is high intensity interval training?"
        },
        "category": "Fitness",
        "referenceResponse": {
            "text": "High-Intensity Interval Training (HIIT)"
        }
    },
    "modelResponses": [
        {
            "text": "High Intensity Interval Training (HIIT) is a form of exercise that alternates between periods of high intensity work and low intensity recovery.HIIT is an excellent way to increase your fitness and improve your health, but it can be difficult to get started.In this article, we will",
            "modelIdentifier": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612"
        },
        {
            "text": "High intensity interval training is a form of exercise consisting of short bursts of maximum effort followed by periods of rest. The theory behind HIIT is that it can often be more effective at improving cardiovascular and metabolic health than longer, lower intensity workouts.The work intervals can range in length depending on the specific type of exercise, but are typically between 20 and 90 seconds. The recovery periods are generally longer, lasting between 1 and 5 minutes. This pattern is then repeated for multiple sets.\n\nSince the work intervals are high intensity, they require more effort from your body and therefore result in a greater calorie burn. The body also continues to burn calories at an increased rate after the workout due to an effect called excess post exercise oxygen consumption (EPOC), also know as the afterburn effect.\n\nHIIT is a versatile form of training that can be adapted to different fitness levels and can be performed using a variety of exercises including cycling, running, bodyweight movements, and even swimming. It can be done in as little as 20 minutes once or twice a week, making it an efficient option for busy individuals.\n\nWhat are the benefits of high intensity interval training",
            "modelIdentifier": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352"
        }
    ]
}
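
The comparison metrics in the sample above could be summarized along the following lines. This is a sketch under stated assumptions: the data is abbreviated from the sample, `choice_wins` and `average_ranks` are hypothetical helpers, and counting wins or averaging ranks is only one reasonable way to combine workers' answers.

```python
from collections import Counter, defaultdict

model_response_id_map = {
    "0": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612",
    "1": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352",
}

# comparisonChoice and comparisonRank entries of both workers in the sample.
human_answers = [
    {"answerContent": {"evaluationResults": {
        "comparisonChoice": [{"metricName": "Fluency", "result": {"modelResponseId": "0"}}],
        "comparisonRank": [{"metricName": "Toxicity", "result": [
            {"modelResponseId": "0", "rank": 1},
            {"modelResponseId": "1", "rank": 1},
        ]}],
    }}},
    {"answerContent": {"evaluationResults": {
        "comparisonChoice": [{"metricName": "Fluency", "result": {"modelResponseId": "1"}}],
        "comparisonRank": [{"metricName": "Toxicity", "result": [
            {"modelResponseId": "0", "rank": 2},
            {"modelResponseId": "1", "rank": 1},
        ]}],
    }}},
]


def choice_wins(answers, id_map):
    """Count how often each model was chosen, per comparisonChoice metric."""
    wins = Counter()
    for answer in answers:
        results = answer["answerContent"]["evaluationResults"]
        for entry in results.get("comparisonChoice", []):
            model = id_map[entry["result"]["modelResponseId"]]
            wins[(entry["metricName"], model)] += 1
    return wins


def average_ranks(answers, id_map):
    """Average the rank each model received, per comparisonRank metric."""
    ranks = defaultdict(list)
    for answer in answers:
        results = answer["answerContent"]["evaluationResults"]
        for entry in results.get("comparisonRank", []):
            for item in entry["result"]:
                model = id_map[item["modelResponseId"]]
                ranks[(entry["metricName"], model)].append(item["rank"])
    return {key: sum(values) / len(values) for key, values in ranks.items()}


wins = choice_wins(human_answers, model_response_id_map)
avg_ranks = average_ranks(human_answers, model_response_id_map)
print(wins)
print(avg_ranks)
```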
