Text summarization for model evaluation in Amazon Bedrock

Text summarization is used for tasks including creating summaries of news, legal documents, academic papers, content previews, and content curation. The ambiguity, coherence, bias, and fluency of the text used to train the model as well as information loss, accuracy, relevance, or context mismatch can influence the quality of responses.

Important

For text summarization, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully.

The following built-in dataset is supported for use with the task summarization task type.

Gigaword: The Gigaword dataset consists of news article headlines. This dataset is used in text summarization tasks.

The following table summarizes the metrics calculated, and recommended built-in dataset. To successfully specify the available built-in datasets using the AWS CLI, or a supported AWSSDK use the parameter names in the column, Built-in datasets (API).

Available built-in datasets for text summarization in Amazon Bedrock
Task type	Metric	Built-in datasets (console)	Built-in datasets (API)	Computed metric
Text summarization	Accuracy	Gigaword	`Builtin.Gigaword`	BERTScore
	Toxicity	Gigaword	`Builtin.Gigaword`	Toxicity
	Robustness	Gigaword	`Builtin.Gigaword`	BERTScore and deltaBERTScore

To learn more about how the computed metric for each built-in dataset is calculated, see Review model evaluation job reports and metrics in Amazon Bedrock

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

General text generation

Question and answer