EvaluationDatasetMetricConfig
Defines the prompt dataset, the built-in or custom metric names, and the task type for an evaluation job.
Contents
- dataset
Specifies the prompt dataset.
Type: EvaluationDataset object
Required: Yes
- metricNames
The names of the metrics you want to use for your evaluation job. The valid values depend on the type of job:
For knowledge base evaluation jobs that evaluate retrieval only, valid values are "Builtin.ContextRelevance" and "Builtin.ContextCoverage".
For knowledge base evaluation jobs that evaluate retrieval with response generation, valid values are "Builtin.Correctness", "Builtin.Completeness", "Builtin.Helpfulness", "Builtin.LogicalCoherence", "Builtin.Faithfulness", "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".
For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity".
For model evaluation jobs that use an LLM as a judge, you can specify "Builtin.Correctness", "Builtin.Completeness", "Builtin.Faithfulness", "Builtin.Helpfulness", "Builtin.Coherence", "Builtin.Relevance", "Builtin.FollowingInstructions", and "Builtin.ProfessionalStyleAndTone". You can also specify the following responsible AI metrics, which are available only for model evaluation jobs that use an LLM as a judge: "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".
For human-based model evaluation jobs, each string in the list must match the name parameter specified in HumanEvaluationCustomMetric.
Type: Array of strings
Array Members: Minimum number of 1 item. Maximum number of 15 items.
Length Constraints: Minimum length of 1. Maximum length of 63.
Pattern:
^[0-9a-zA-Z-_.]+$
Required: Yes
- taskType
The type of task you want to evaluate for your evaluation job. This applies only to model evaluation jobs and is ignored for knowledge base evaluation jobs.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 63.
Pattern:
^[A-Za-z0-9]+$
Valid Values:
Summarization | Classification | QuestionAndAnswer | Generation | Custom
Required: Yes
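To make the rules above concrete, the following Python sketch checks an EvaluationDatasetMetricConfig-shaped dictionary against the documented constraints before a request is sent. It is not part of any AWS SDK: the job-kind labels (such as "model_automated") and the function validate_metric_config are hypothetical, and the dataset entry is a placeholder rather than a real EvaluationDataset object. The service performs its own validation, so treat this only as an illustrative client-side check.

```python
import re

# Built-in metric names allowed for each kind of job, copied from the
# metricNames description. The keys are hypothetical labels for this sketch.
BUILTIN_METRICS = {
    "kb_retrieval": {"Builtin.ContextRelevance", "Builtin.ContextCoverage"},
    "kb_retrieve_and_generate": {
        "Builtin.Correctness", "Builtin.Completeness", "Builtin.Helpfulness",
        "Builtin.LogicalCoherence", "Builtin.Faithfulness",
        "Builtin.Harmfulness", "Builtin.Stereotyping", "Builtin.Refusal",
    },
    "model_automated": {"Builtin.Accuracy", "Builtin.Robustness", "Builtin.Toxicity"},
    "model_llm_judge": {
        "Builtin.Correctness", "Builtin.Completeness", "Builtin.Faithfulness",
        "Builtin.Helpfulness", "Builtin.Coherence", "Builtin.Relevance",
        "Builtin.FollowingInstructions", "Builtin.ProfessionalStyleAndTone",
        # Responsible AI metrics
        "Builtin.Harmfulness", "Builtin.Stereotyping", "Builtin.Refusal",
    },
}

TASK_TYPES = {"Summarization", "Classification", "QuestionAndAnswer", "Generation", "Custom"}

# Same character set as the documented pattern ^[0-9a-zA-Z-_.]+$, with the
# hyphen moved to the end and escaped so it is clearly a literal.
METRIC_NAME_RE = re.compile(r"[0-9a-zA-Z_.\-]+")


def validate_metric_config(config, job_kind, human_metric_names=()):
    """Return a list of problems; an empty list means no checked rule failed.

    job_kind is one of the BUILTIN_METRICS keys, or "model_human", in which
    case human_metric_names supplies the HumanEvaluationCustomMetric names.
    """
    errors = []
    if "dataset" not in config:
        errors.append("dataset is required")

    names = config.get("metricNames", [])
    if not 1 <= len(names) <= 15:
        errors.append("metricNames must contain 1 to 15 items")
    if job_kind == "model_human":
        allowed = set(human_metric_names)
    else:
        allowed = BUILTIN_METRICS[job_kind]  # raises KeyError for unknown kinds
    for name in names:
        if not 1 <= len(name) <= 63 or not METRIC_NAME_RE.fullmatch(name):
            errors.append(f"invalid metric name format: {name!r}")
        elif name not in allowed:
            errors.append(f"{name!r} is not valid for {job_kind}")

    # The field description marks taskType as required with fixed valid
    # values, so this sketch checks it for every job kind, including
    # knowledge base jobs that ignore it.
    task = config.get("taskType")
    if task is None:
        errors.append("taskType is required")
    elif task not in TASK_TYPES:
        errors.append(f"invalid taskType: {task!r}")
    return errors


example = {
    "dataset": {"name": "my-prompt-dataset"},  # hypothetical placeholder
    "metricNames": ["Builtin.Accuracy", "Builtin.Toxicity"],
    "taskType": "Summarization",
}
print(validate_metric_config(example, "model_automated"))  # []
```

Keeping the allowed names in one lookup table makes the per-job-type rules easy to audit against the metricNames description; for human-based jobs, the allowed set comes from the caller's own custom metric names instead.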
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: