
Understand the results of an automatic evaluation job

When your automatic model evaluation job completes, the results are saved in Amazon S3. The following sections describe the files that are generated and how to interpret them.
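The following is a minimal sketch of one way to retrieve the result files using the AWS SDK for Python (Boto3). The bucket name and prefix shown are placeholders, not values produced by the evaluation job; substitute the S3 output location that you configured for your job.

import boto3

# Placeholder bucket and prefix; replace with your evaluation job's S3 output location.
s3 = boto3.client("s3")
bucket = "amzn-s3-demo-bucket"
prefix = "model-evaluation-output/"

# List the objects under the output prefix and download the result files locally.
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in response.get("Contents", []):
    key = obj["Key"]
    if key.endswith(".json") or key.endswith(".jsonl"):
        s3.download_file(bucket, key, key.split("/")[-1])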

Interpreting the output.json file's structure

The output.json file contains aggregate scores for your selected datasets and metrics.

The following is an example output.

{ "evaluations": [{ "evaluation_name": "factual_knowledge", "dataset_name": "trex", ## The structure of the prompt template changes based on the foundation model selected "prompt_template": "<s>[INST] <<SYS>>Answer the question at the end in as few words as possible. Do not repeat the question. Do not answer in complete sentences.<</SYS> Question: $feature [/INST]", "dataset_scores": [{ "name": "factual_knowledge", "value": 0.2966666666666667 }], "category_scores": [{ "name": "Author", "scores": [{ "name": "factual_knowledge", "value": 0.4117647058823529 }] }, .... { "name": "Capitals", "scores": [{ "name": "factual_knowledge", "value": 0.2857142857142857 }] } ] }] }

Interpreting the instance-wise results file's structure

One evaluation_name_dataset_name.jsonl file contains the instance-wise results for each JSON Lines request. If your JSON Lines input data contained 300 requests, this output file contains 300 responses. Each record in the output file contains the request made to your model followed by the score for that evaluation. An example instance-wise output follows.
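The following is an illustrative sketch of a single instance-wise record, assuming a factual knowledge evaluation on the trex dataset. The field names (model_input, model_output, scores) and the values shown are assumptions for illustration only; inspect your own output file to confirm the exact keys that your evaluation produces.

{"model_input": "Question: Who is the author of The Art of War?", "model_output": "Sun Tzu", "scores": [{"name": "factual_knowledge", "value": 1}]}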

Interpreting the report

An Evaluation Report contains the results of your foundation model evaluation job. The content of the evaluation report depends on the kind of task you used to evaluate your model. Each report contains the following sections:

  1. The overall scores for each successful evaluation under the evaluation task. For example, for an evaluation with one dataset, if you evaluated your model on a classification task for Accuracy and Semantic Robustness, a table summarizing the results for Accuracy and Accuracy Semantic Robustness appears at the top of your report. Other evaluations with other datasets may be structured differently.

  2. The configuration for your evaluation job, including the model name and type, which evaluation methods were used, and which datasets your model was evaluated against.

  3. A Detailed Evaluation Results section that summarizes the evaluation algorithm, provides information about and links to any built-in datasets, explains how scores are calculated, and includes tables showing sample data with their associated scores.

  4. A Failed Evaluations section that contains a list of evaluations that did not complete. If no evaluations failed, this section of the report is omitted.