
Understanding human evaluation job results

When you create a model evaluation job that uses human workers, you select one or more metric types. When members of the work team evaluate a response in the worker portal, their answers are saved in the `humanAnswers` JSON object. How those answers are stored depends on the metric type selected when the job was created.

The following sections explain these differences and provide examples.

JSON output reference

When a model evaluation job completes, the results are saved in Amazon S3 as a JSON file. The JSON object contains three top-level nodes: `humanEvaluationResult`, `inputRecord`, and `modelResponses`. The `humanEvaluationResult` key is a top-level node that contains the responses from the work team assigned to the model evaluation job. The `inputRecord` key is a top-level node that contains the prompts provided to the model when the model evaluation job was created. The `modelResponses` key is a top-level node that contains the models' responses to the prompts.
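As a minimal sketch of how these three nodes might be read, the following Python snippet parses one output record and checks that each top-level key is present. The inline record is an abbreviated stand-in for a file downloaded from Amazon S3, the identifier `model-a` is a placeholder, and the helper name `split_record` is hypothetical.

```python
import json

# Abbreviated stand-in for one output record; a real record is read from the
# JSON file that the job writes to Amazon S3. "model-a" is a placeholder.
RAW_RECORD = """
{
  "humanEvaluationResult": {"humanAnswers": []},
  "inputRecord": {"prompt": {"text": "Who invented the airplane?"}},
  "modelResponses": [
    {"modelIdentifier": "model-a", "text": "The Wright brothers."}
  ]
}
"""

TOP_LEVEL_KEYS = ("humanEvaluationResult", "inputRecord", "modelResponses")


def split_record(raw: str) -> tuple:
    """Parse a record and return its three top-level nodes (hypothetical helper)."""
    record = json.loads(raw)
    missing = [key for key in TOP_LEVEL_KEYS if key not in record]
    if missing:
        raise KeyError(f"missing top-level keys: {missing}")
    return tuple(record[key] for key in TOP_LEVEL_KEYS)


human_result, input_record, model_responses = split_record(RAW_RECORD)
print(input_record["prompt"]["text"])
print(len(model_responses))
```

Failing early on a missing top-level key keeps later lookups simple, because the rest of a script can assume all three nodes exist.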

The following table summarizes the key-value pairs found in the JSON output from a model evaluation job.

The sections that follow provide more detail about each key-value pair.

Parameter	Example	Description
`flowDefinitionArn`	`arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name`	The ARN of the human review workflow (flow definition) that created the human loop.
`humanAnswers`	A list of JSON objects specific to the selected evaluation metrics. To learn more, see Key-value pairs found under `humanAnswers`.	A list of JSON objects that contain the workers' responses.
`humanLoopName`	`system-generated-hash`	A system-generated 40-character hex string.
`inputRecord`	`"inputRecord": { "prompt": { "text": "Who invented the airplane?" }, "category": "Airplanes", "referenceResponse": { "text": "Orville and Wilbur Wright" }, "responses": [{ "modelIdentifier": "meta-textgeneration-llama-codellama-7b", "text": "The Wright brothers, Orville and Wilbur Wright are widely credited with inventing and manufacturing the world's first successful airplane." }] }`	A JSON object that contains a prompt entry from the input dataset.
`modelResponses`	`"modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/model-id", "text": "the-models-response-to-the-prompt" }]`	The individual responses from the models.
`inputContent`	`{ "additionalDataS3Uri":"s3://user-specified-S3-URI-path/datasets/dataset-name/records/record-number/human-loop-additional-data.json", "evaluationMetrics":[ { "description":"brief-name", "metricName":"metric-name", "metricType":"IndividualLikertScale" } ], "instructions":"example instructions" }`	The human loop input content required to start a human loop in your Amazon S3 bucket.
`modelResponseIdMap`	`{ "0": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612", "1": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352" }`	Describes how each model is represented in `answerContent`.

Key-value pairs found under `humanEvaluationResult`

The following key-value pairs are found under `humanEvaluationResult` in the output of your model evaluation job.

For the key-value pairs associated with `humanAnswers`, see Key-value pairs found under `humanAnswers`.

flowDefinitionArn

The ARN of the flow definition used to complete the model evaluation job.
Example: arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name

humanLoopName

A system-generated 40-character hex string.

inputContent

This key's value describes the metric types and the instructions you provided for workers in the worker portal.
- additionalDataS3Uri: The location in Amazon S3 where the instructions for workers are saved.
- instructions: The instructions you provided to workers in the worker portal.
- evaluationMetrics: The name of each metric and its description. The `metricType` key is the tool provided to workers to evaluate the models' responses.
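The sketch below shows one way to index `evaluationMetrics` by metric name. It also derives the key under which each metric's answers appear in `evaluationResults` by lower-casing the first letter of `metricType`. That naming rule is an assumption inferred from the example output in this document, not a documented guarantee, and `results_key` is a hypothetical helper.

```python
# Shape follows the inputContent example in this document; only three of the
# metrics are included to keep the sketch short.
input_content = {
    "instructions": "example instructions",
    "evaluationMetrics": [
        {
            "metricName": "Fluency",
            "metricType": "ComparisonChoice",
            "description": "Measures the linguistic quality of a generated text.",
        },
        {
            "metricName": "Correctness",
            "metricType": "IndividualLikertScale",
            "description": "Measures a generated answer's satisfaction in the context of the question.",
        },
        {
            "metricName": "Accuracy",
            "metricType": "ThumbsUpDown",
            "description": "Indicates the accuracy of a generated text.",
        },
    ],
}


def results_key(metric_type: str) -> str:
    """Lower-case the first letter, e.g. 'ThumbsUpDown' -> 'thumbsUpDown'.

    Assumption: inferred from the sample output, not a documented rule.
    """
    return metric_type[:1].lower() + metric_type[1:]


# Map each metric name to the evaluationResults key that should hold its answers.
metric_to_key = {
    metric["metricName"]: results_key(metric["metricType"])
    for metric in input_content["evaluationMetrics"]
}
print(metric_to_key)
```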

modelResponseIdMap

This key-value pair identifies the full names of the selected models and how the workers' choices map to those models in the `humanAnswers` key-value pairs.
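As an illustration, a worker's choice can be translated back to a model name with a plain dictionary lookup. The map and the `comparisonChoice` entry below are copied from the examples in this document; `resolve_model` is a hypothetical helper.

```python
# Values copied from the modelResponseIdMap example in this document.
model_response_id_map = {
    "0": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612",
    "1": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352",
}

# One comparisonChoice entry as it appears under humanAnswers.
choice = {"metricName": "Fluency", "result": {"modelResponseId": "0"}}


def resolve_model(id_map: dict, response_id: str) -> str:
    """Return the full model name for a modelResponseId (hypothetical helper)."""
    try:
        return id_map[response_id]
    except KeyError:
        raise ValueError(f"unknown modelResponseId: {response_id!r}") from None


chosen_model = resolve_model(model_response_id_map, choice["result"]["modelResponseId"])
print(f"{choice['metricName']}: {chosen_model}")
```

Note that the IDs are JSON strings (`"0"`, `"1"`), so a lookup with the integer `0` would fail.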

Key-value pairs found under `inputRecord`

The following entries describe the `inputRecord` key-value pairs.

prompt

The text of the prompt sent to the model.

category

An optional category that classifies the prompt. It is visible to workers in the worker portal during the model evaluation.
Example: "American cities"

referenceResponse

An optional field in the input JSON that specifies the ground truth you want workers to reference during the evaluation.

responses

An optional field in the input JSON that contains responses from other models.

An example input JSON record:


{
    "prompt": {
        "text": "Who invented the airplane?"
    },
    "category": "Airplanes",
    "referenceResponse": {
        "text": "Orville and Wilbur Wright"
    },
    "responses":
        // All inference must come from a single model
        [{
            "modelIdentifier": "meta-textgeneration-llama-codellama-7b" ,
            "text": "The Wright brothers, Orville and Wilbur Wright are widely credited with inventing and manufacturing the world's first successful airplane."
        }]

}
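The following sketch checks a record of this shape before it is used. It rests on two assumptions that this page only implies: that `prompt.text` is the one required field (the other fields are described as optional), and that the comment in the example means every entry in `responses` must share a single `modelIdentifier`. The function name `validate_input_record` is hypothetical. Standard JSON does not allow `//` comments, so the record is written here as a Python dictionary.

```python
def validate_input_record(record: dict) -> list:
    """Return a list of problems found in an input record (empty if none)."""
    problems = []

    # Assumption: prompt.text is required.
    prompt = record.get("prompt")
    if not isinstance(prompt, dict) or not isinstance(prompt.get("text"), str):
        problems.append("prompt.text must be a string")

    # Optional fields are only checked when present.
    category = record.get("category")
    if category is not None and not isinstance(category, str):
        problems.append("category must be a string")

    reference = record.get("referenceResponse")
    if reference is not None and not (
        isinstance(reference, dict) and isinstance(reference.get("text"), str)
    ):
        problems.append("referenceResponse.text must be a string")

    # Assumption: all supplied responses must come from a single model.
    identifiers = {resp.get("modelIdentifier") for resp in record.get("responses", [])}
    if len(identifiers) > 1:
        problems.append("responses must all share one modelIdentifier")

    return problems


example_record = {
    "prompt": {"text": "Who invented the airplane?"},
    "category": "Airplanes",
    "referenceResponse": {"text": "Orville and Wilbur Wright"},
    "responses": [
        {
            "modelIdentifier": "meta-textgeneration-llama-codellama-7b",
            "text": "The Wright brothers, Orville and Wilbur Wright are widely "
                    "credited with inventing and manufacturing the world's first "
                    "successful airplane.",
        }
    ],
}

print(validate_input_record(example_record))
```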

Key-value pairs found under `modelResponses`

An array of key-value pairs that contains the responses from the models and identifies which model produced each response.

text

The model's response to the prompt.

modelIdentifier

The name of the model.

Key-value pairs found under `humanAnswers`

An array of key-value pairs that contains the responses from the models and how the workers evaluated the models.

acceptanceTime

The time when the worker accepted the task in the worker portal.

submissionTime

The time when the worker submitted their response.

timeSpentInSeconds

How long the worker spent completing the task, in seconds.

workerId

The ID of the worker who completed the task.

workerMetadata

Metadata about which work team was assigned to this model evaluation job.
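In the example output in this document, `timeSpentInSeconds` equals the difference between `submissionTime` and `acceptanceTime`. The sketch below recomputes that difference with the standard library. Treat the relationship as an observation from the sample rather than a documented guarantee; the timestamp format string and the helper name `elapsed_seconds` are assumptions.

```python
from datetime import datetime

# Assumed format: UTC timestamps with millisecond precision and a literal "Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Fields copied from the first worker in the example output.
answer = {
    "acceptanceTime": "2024-06-07T22:31:57.066Z",
    "submissionTime": "2024-06-07T22:32:19.640Z",
    "timeSpentInSeconds": 22.574,
    "workerId": "ead1ba56c1278175",
}


def elapsed_seconds(human_answer: dict) -> float:
    """Seconds between accepting and submitting a task (hypothetical helper)."""
    accepted = datetime.strptime(human_answer["acceptanceTime"], TIMESTAMP_FORMAT)
    submitted = datetime.strptime(human_answer["submissionTime"], TIMESTAMP_FORMAT)
    return (submitted - accepted).total_seconds()


elapsed = elapsed_seconds(answer)
print(f"{answer['workerId']}: {elapsed:.3f} s")
```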

Format of the `answerContent` JSON array

The structure of an answer depends on the evaluation metrics selected when the model evaluation job was created. Each worker response or answer is recorded in a new JSON object.

answerContent

`evaluationResults` contains the workers' responses.
- When Choice buttons is selected, each worker's results are stored under "evaluationResults": "comparisonChoice".
  
  metricName: The name of the metric.
  
  result: A JSON object that indicates which model the worker selected, using either 0 or 1. To see which model each value maps to, see modelResponseIdMap.
- When Likert scale, comparison is selected, each worker's results are stored under "evaluationResults": "comparisonLikertScale".
  
  metricName: The name of the metric.
  
  leftModelResponseId: Indicates which modelResponseIdMap entry was shown on the left side of the worker portal.
  
  rightModelResponseId: Indicates which modelResponseIdMap entry was shown on the right side of the worker portal.
  
  result: A JSON object that indicates which model the worker selected, using either 0 or 1. To see which model each value maps to, see modelResponseIdMap.
- When Ordinal rank is selected, each worker's results are stored under "evaluationResults": "comparisonRank".
  
  metricName: The name of the metric.
  
  result: An array of JSON objects. For each model (modelResponseIdMap), the worker provides a `rank`.
```
"result": [{
	"modelResponseId": "0",
	"rank": 1
}, {
	"modelResponseId": "1",
	"rank": 1
}]
```
- When Likert scale, evaluation of a single model response is selected, the worker's results are stored under "evaluationResults": "individualLikertScale". This is a JSON array that contains the scores for each metricName specified when the job was created.
  
  metricName: The name of the metric.
  
  modelResponseId: The model that was scored. To see which model each value maps to, see modelResponseIdMap.
  
  result: A key-value pair that indicates the Likert scale value selected by the worker.
- When Thumbs up/down is selected, the worker's results are stored as a JSON array under "evaluationResults": "thumbsUpDown".
  
  metricName: The name of the metric.
  
  result: Either true or false, as it relates to the metricName. When a worker selects thumbs up, "result" : true.
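To show how the per-model formats can be combined across workers, the sketch below averages `individualLikertScale` scores and counts thumbs-up votes for each metric and model. The data is a trimmed copy of the two workers in this document's example output; the aggregation itself (a simple mean and a count) is an illustrative choice, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Trimmed copy of two workers' answers from the example output.
human_answers = [
    {"answerContent": {"evaluationResults": {
        "individualLikertScale": [
            {"metricName": "Correctness", "modelResponseId": "0", "result": 2},
            {"metricName": "Correctness", "modelResponseId": "1", "result": 3},
        ],
        "thumbsUpDown": [
            {"metricName": "Accuracy", "modelResponseId": "0", "result": True},
            {"metricName": "Accuracy", "modelResponseId": "1", "result": True},
        ],
    }}},
    {"answerContent": {"evaluationResults": {
        "individualLikertScale": [
            {"metricName": "Correctness", "modelResponseId": "0", "result": 3},
            {"metricName": "Correctness", "modelResponseId": "1", "result": 4},
        ],
        "thumbsUpDown": [
            {"metricName": "Accuracy", "modelResponseId": "0", "result": True},
            {"metricName": "Accuracy", "modelResponseId": "1", "result": False},
        ],
    }}},
]


def summarize(answers: list) -> tuple:
    """Return (mean Likert score, thumbs-up count) keyed by (metric, model ID)."""
    likert_scores = defaultdict(list)
    thumbs_up = defaultdict(int)
    for answer in answers:
        results = answer["answerContent"]["evaluationResults"]
        # A metric type that was not selected for the job is simply absent.
        for entry in results.get("individualLikertScale", []):
            key = (entry["metricName"], entry["modelResponseId"])
            likert_scores[key].append(entry["result"])
        for entry in results.get("thumbsUpDown", []):
            key = (entry["metricName"], entry["modelResponseId"])
            thumbs_up[key] += int(entry["result"])
    likert_means = {key: mean(scores) for key, scores in likert_scores.items()}
    return likert_means, dict(thumbs_up)


likert_means, thumbs_up_counts = summarize(human_answers)
print(likert_means)
print(thumbs_up_counts)
```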

Example output from a model evaluation job

The following JSON object is an example of model evaluation job output saved in Amazon S3. To learn more about each key-value pair, see JSON output reference.

For clarity, this job contains responses from only two workers. Some key-value pairs might also be truncated for readability.


{
	"humanEvaluationResult": {
		"flowDefinitionArn": "arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name",
        "humanAnswers": [
            {
                "acceptanceTime": "2024-06-07T22:31:57.066Z",
                "answerContent": {
                    "evaluationResults": {
                        "comparisonChoice": [
                            {
                                "metricName": "Fluency",
                                "result": {
                                    "modelResponseId": "0"
                                }
                            }
                        ],
                        "comparisonLikertScale": [
                            {
                                "leftModelResponseId": "0",
                                "metricName": "Coherence",
                                "result": 1,
                                "rightModelResponseId": "1"
                            }
                        ],
                        "comparisonRank": [
                            {
                                "metricName": "Toxicity",
                                "result": [
                                    {
                                        "modelResponseId": "0",
                                        "rank": 1
                                    },
                                    {
                                        "modelResponseId": "1",
                                        "rank": 1
                                    }
                                ]
                            }
                        ],
                        "individualLikertScale": [
                            {
                                "metricName": "Correctness",
                                "modelResponseId": "0",
                                "result": 2
                            },
                            {
                                "metricName": "Correctness",
                                "modelResponseId": "1",
                                "result": 3
                            },
                            {
                                "metricName": "Completeness",
                                "modelResponseId": "0",
                                "result": 1
                            },
                            {
                                "metricName": "Completeness",
                                "modelResponseId": "1",
                                "result": 4
                            }
                        ],
                        "thumbsUpDown": [
                            {
                                "metricName": "Accuracy",
                                "modelResponseId": "0",
                                "result": true
                            },
                            {
                                "metricName": "Accuracy",
                                "modelResponseId": "1",
                                "result": true
                            }
                        ]
                    }
                },
                "submissionTime": "2024-06-07T22:32:19.640Z",
                "timeSpentInSeconds": 22.574,
                "workerId": "ead1ba56c1278175",
                "workerMetadata": {
                    "identityData": {
                        "identityProviderType": "Cognito",
                        "issuer": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_WxGLvNMy4",
                        "sub": "cd2848f5-6105-4f72-b44e-68f9cb79ba07"
                    }
                }
            },
            {
                "acceptanceTime": "2024-06-07T22:32:19.721Z",
                "answerContent": {
                    "evaluationResults": {
                        "comparisonChoice": [
                            {
                                "metricName": "Fluency",
                                "result": {
                                    "modelResponseId": "1"
                                }
                            }
                        ],
                        "comparisonLikertScale": [
                            {
                                "leftModelResponseId": "0",
                                "metricName": "Coherence",
                                "result": 1,
                                "rightModelResponseId": "1"
                            }
                        ],
                        "comparisonRank": [
                            {
                                "metricName": "Toxicity",
                                "result": [
                                    {
                                        "modelResponseId": "0",
                                        "rank": 2
                                    },
                                    {
                                        "modelResponseId": "1",
                                        "rank": 1
                                    }
                                ]
                            }
                        ],
                        "individualLikertScale": [
                            {
                                "metricName": "Correctness",
                                "modelResponseId": "0",
                                "result": 3
                            },
                            {
                                "metricName": "Correctness",
                                "modelResponseId": "1",
                                "result": 4
                            },
                            {
                                "metricName": "Completeness",
                                "modelResponseId": "0",
                                "result": 1
                            },
                            {
                                "metricName": "Completeness",
                                "modelResponseId": "1",
                                "result": 5
                            }
                        ],
                        "thumbsUpDown": [
                            {
                                "metricName": "Accuracy",
                                "modelResponseId": "0",
                                "result": true
                            },
                            {
                                "metricName": "Accuracy",
                                "modelResponseId": "1",
                                "result": false
                            }
                        ]
                    }
                },
                "submissionTime": "2024-06-07T22:32:57.918Z",
                "timeSpentInSeconds": 38.197,
                "workerId": "bad258db224c3db6",
                "workerMetadata": {
                    "identityData": {
                        "identityProviderType": "Cognito",
                        "issuer": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_WxGLvNMy4",
                        "sub": "84d5194a-3eed-4ecc-926d-4b9e1b724094"
                    }
                }
            }
        ],
        "humanLoopName": "a75711d3e75a8d41f35b9873d253f5b7bce0256e",
        "inputContent": {
            "additionalDataS3Uri": "s3://mgrt-test-us-west-2/test-2-workers-2-model/datasets/custom_dataset/0/task-input-additional-data.json",
            "instructions": "worker instructions provided by the model evaluation job administrator",
            "evaluationMetrics": [
                {
                    "metricName": "Fluency",
                    "metricType": "ComparisonChoice",
                    "description": "Measures the linguistic quality of a generated text."
                },
                {
                    "metricName": "Coherence",
                    "metricType": "ComparisonLikertScale",
                    "description": "Measures the organization and structure of a generated text."
                },
                {
                    "metricName": "Toxicity",
                    "metricType": "ComparisonRank",
                    "description": "Measures the harmfulness of a generated text."
                },
                {
                    "metricName": "Accuracy",
                    "metricType": "ThumbsUpDown",
                    "description": "Indicates the accuracy of a generated text."
                },
                {
                    "metricName": "Correctness",
                    "metricType": "IndividualLikertScale",
                    "description": "Measures a generated answer's satisfaction in the context of the question."
                },
                {
                    "metricName": "Completeness",
                    "metricType": "IndividualLikertScale",
                    "description": "Measures a generated answer's inclusion of all relevant information."
                }
            ],
            "disableRandomization": "true"
        },
        "modelResponseIdMap": {
            "0": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612",
            "1": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352"
        }
    },
    "inputRecord": {
        "prompt": {
            "text": "What is high intensity interval training?"
        },
        "category": "Fitness",
        "referenceResponse": {
            "text": "High-Intensity Interval Training (HIIT)"
        }
    },
    "modelResponses": [
        {
            "text": "High Intensity Interval Training (HIIT) is a form of exercise that alternates between periods of high intensity work and low intensity recovery.HIIT is an excellent way to increase your fitness and improve your health, but it can be difficult to get started.In this article, we will",
            "modelIdentifier": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612"
        },
        {
            "text": "High intensity interval training is a form of exercise consisting of short bursts of maximum effort followed by periods of rest. The theory behind HIIT is that it can often be more effective at improving cardiovascular and metabolic health than longer, lower intensity workouts.The work intervals can range in length depending on the specific type of exercise, but are typically between 20 and 90 seconds. The recovery periods are generally longer, lasting between 1 and 5 minutes. This pattern is then repeated for multiple sets.\n\nSince the work intervals are high intensity, they require more effort from your body and therefore result in a greater calorie burn. The body also continues to burn calories at an increased rate after the workout due to an effect called excess post exercise oxygen consumption (EPOC), also know as the afterburn effect.\n\nHIIT is a versatile form of training that can be adapted to different fitness levels and can be performed using a variety of exercises including cycling, running, bodyweight movements, and even swimming. It can be done in as little as 20 minutes once or twice a week, making it an efficient option for busy individuals.\n\nWhat are the benefits of high intensity interval training",
            "modelIdentifier": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352"
        }
    ]
}
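As a rough sketch of how the comparison metrics in the example above could be summarized, the snippet below counts `comparisonChoice` votes and averages `comparisonRank` ranks per model, resolving each `modelResponseId` through `modelResponseIdMap`. The data is a trimmed copy of the example; `summarize_comparisons` is a hypothetical helper, and averaging ranks is one possible aggregation, not a prescribed one.

```python
from collections import Counter, defaultdict

# Trimmed copy of humanEvaluationResult from the example above: only the
# fields this sketch reads are kept.
human_evaluation_result = {
    "humanAnswers": [
        {"answerContent": {"evaluationResults": {
            "comparisonChoice": [
                {"metricName": "Fluency", "result": {"modelResponseId": "0"}},
            ],
            "comparisonRank": [
                {"metricName": "Toxicity", "result": [
                    {"modelResponseId": "0", "rank": 1},
                    {"modelResponseId": "1", "rank": 1},
                ]},
            ],
        }}},
        {"answerContent": {"evaluationResults": {
            "comparisonChoice": [
                {"metricName": "Fluency", "result": {"modelResponseId": "1"}},
            ],
            "comparisonRank": [
                {"metricName": "Toxicity", "result": [
                    {"modelResponseId": "0", "rank": 2},
                    {"modelResponseId": "1", "rank": 1},
                ]},
            ],
        }}},
    ],
    "modelResponseIdMap": {
        "0": "sm-margaret-meta-textgeneration-llama-2-7b-1711485008-0612",
        "1": "jumpstart-dft-hf-llm-mistral-7b-ins-20240327-043352",
    },
}


def summarize_comparisons(result: dict) -> tuple:
    """Return (choice votes, mean ranks), each keyed by (metric, model name)."""
    id_map = result["modelResponseIdMap"]
    votes = Counter()
    ranks = defaultdict(list)
    for answer in result["humanAnswers"]:
        evaluation = answer["answerContent"]["evaluationResults"]
        for entry in evaluation.get("comparisonChoice", []):
            model = id_map[entry["result"]["modelResponseId"]]
            votes[(entry["metricName"], model)] += 1
        for entry in evaluation.get("comparisonRank", []):
            for item in entry["result"]:
                model = id_map[item["modelResponseId"]]
                ranks[(entry["metricName"], model)].append(item["rank"])
    mean_ranks = {key: sum(values) / len(values) for key, values in ranks.items()}
    return dict(votes), mean_ranks


votes, mean_ranks = summarize_comparisons(human_evaluation_result)
for (metric, model), count in votes.items():
    print(f"{metric} votes for {model}: {count}")
for (metric, model), rank in mean_ranks.items():
    print(f"{metric} mean rank for {model}: {rank}")
```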
