Requirements for custom prompt datasets in model evaluation jobs that use human workers
To create a model evaluation job that uses human workers, you must specify a prompt dataset. The prompts are then used during inference with the model you select to evaluate.
You must create a custom prompt dataset for a model evaluation job that uses human workers. Custom prompt datasets must be stored in Amazon S3, use the JSON Lines format, and have the .jsonl file extension. Each line must be a valid JSON object. There can be up to 1000 prompts in your dataset per evaluation job.
A valid prompt entry must contain the prompt key. Both category and referenceResponse are optional. Use the category key to label your prompt with a specific category that you can use to filter the results when reviewing them in the model evaluation report card. Use the referenceResponse key to specify the ground truth response that your workers can reference during the evaluation.
In the worker UI, the values you specify for prompt and referenceResponse are visible to your human workers.
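The requirements above can be checked locally before the dataset is uploaded. The following is a minimal sketch, not an official validator: the function name validate_prompt_dataset is hypothetical, and treating blank lines as errors and requiring string values for each key are assumptions that go beyond what this section states.

```python
import json

MAX_PROMPTS = 1000  # limit stated in this section
OPTIONAL_KEYS = {"category", "referenceResponse"}


def validate_prompt_dataset(lines):
    """Return a list of (line_number, message) problems found in the dataset lines.

    Line number 0 is used for problems that apply to the dataset as a whole.
    """
    problems = []
    count = 0
    for number, raw in enumerate(lines, start=1):
        # Assumption: a blank line is rejected because it is not a JSON object.
        if not raw.strip():
            problems.append((number, "blank line"))
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(entry, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        count += 1
        # The prompt key is required; the non-empty string check is an assumption.
        prompt = entry.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            problems.append((number, "missing or empty 'prompt' key"))
        # Optional keys, when present, are assumed to hold strings.
        for key in sorted(OPTIONAL_KEYS & entry.keys()):
            if not isinstance(entry[key], str):
                problems.append((number, f"'{key}' is not a string"))
    if count > MAX_PROMPTS:
        problems.append((0, f"{count} prompts exceeds the limit of {MAX_PROMPTS}"))
    return problems
```

To check a local file, pass its lines to the function, for example validate_prompt_dataset(open("prompts.jsonl", encoding="utf-8").read().splitlines()), where prompts.jsonl is a placeholder file name. An empty list means no problems were found by these checks.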
For jobs created using the console, you must update the Cross Origin Resource Sharing (CORS) configuration on the S3 bucket. To learn more about the required CORS permissions, see Required Cross Origin Resource Sharing (CORS) permissions on S3 buckets.
The following is an example custom dataset that contains 6 inputs and uses the JSON line format.
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
{"prompt":"Provide the prompt you want the model to use during inference","category":"(Optional) Specify an optional category","referenceResponse":"(Optional) Specify a ground truth response."}
The following example is a single entry expanded for clarity. In the dataset file itself, each entry must stay on a single line.
{
  "prompt": "What is high intensity interval training?",
  "category": "Fitness",
  "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods."
}
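One way to produce a dataset in this format is to build each entry as a dictionary and serialize it with a standard JSON library, which keeps every entry on one line. The following is a minimal sketch using the example entry above. The helper name to_jsonl is hypothetical and is not part of any AWS SDK.

```python
import json

# The example entry from this section; category and referenceResponse are optional.
entries = [
    {
        "prompt": "What is high intensity interval training?",
        "category": "Fitness",
        "referenceResponse": (
            "High-Intensity Interval Training (HIIT) is a cardiovascular exercise "
            "approach that involves short, intense bursts of exercise followed by "
            "brief recovery or rest periods."
        ),
    },
]


def to_jsonl(entries):
    """Serialize each entry as one JSON object per line.

    json.dumps without an indent argument never emits a literal newline,
    so a newline inside a prompt is escaped and the entry stays on one line.
    """
    return "".join(json.dumps(entry) + "\n" for entry in entries)


jsonl_text = to_jsonl(entries)
```

The resulting text can then be written to a file with the .jsonl extension and uploaded to your S3 bucket.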