Reports generated by Amazon SageMaker Autopilot

In addition to the data exploration notebook, Autopilot generates various reports for the best model candidate of each experiment.

  • An explainability report provides insights into how the model makes forecasts.

  • A performance report provides a quantitative assessment of the model's forecasting capabilities.

  • A backtest results report is generated after testing the model's performance on historical data.

Explainability report

The Autopilot explainability report helps you better understand how the attributes in your datasets impact forecasts for specific time series (item and dimension combinations) and time points. Autopilot uses a metric called Impact scores to quantify the relative impact of each attribute and to determine whether it increases or decreases forecast values.

For example, consider a forecasting scenario where the target is sales and there are two related attributes: price and color. Autopilot may find that the item’s color has a high impact on sales for certain items, but a negligible effect for other items. It may also find that a promotion in the summer has a high impact on sales, but a promotion in the winter has little effect.

The explainability report is generated only when:

  • The time series dataset includes additional feature columns or is associated with a holiday calendar.

  • The base models CNN-QR and DeepAR+ are included in the final ensemble.

Interpret Impact scores

Impact scores measure the relative impact that attributes have on forecast values. For example, if the price attribute has an Impact score that is twice as large as that of the store location attribute, you can conclude that the price of an item has twice as much impact on forecast values as the store location.

Impact scores also provide information on whether attributes increase or decrease forecast values.

The Impact scores range from -1 to 1, where the sign denotes the direction of the impact. A score of 0 indicates no impact, while scores close to 1 or -1 indicate a significant impact.

It is important to note that Impact scores measure the relative impact of attributes, not the absolute impact. Therefore, Impact scores cannot be used to determine whether particular attributes improve model accuracy. If an attribute has a low Impact score, that does not necessarily mean that it has a low impact on forecast values; it means that it has a lower impact on forecast values than other attributes used by the predictor.
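To make the relative reading concrete, here is a minimal sketch using made-up scores. The attribute names and values are hypothetical and are not produced by Autopilot; the helper functions are illustrative only.

```python
# Hypothetical Impact scores for one time series (illustrative values only).
impact_scores = {"price": -0.8, "store_location": 0.4, "holiday": 0.0}


def describe_impact(score: float) -> str:
    """Map the sign of an Impact score to the direction of its effect."""
    if score > 0:
        return "increases forecast values"
    if score < 0:
        return "decreases forecast values"
    return "has no impact"


def relative_impact(scores: dict, a: str, b: str) -> float:
    """Ratio of the magnitudes of two Impact scores (a relative to b)."""
    if scores[b] == 0:
        raise ValueError(f"{b!r} has no impact, so the ratio is undefined")
    return abs(scores[a]) / abs(scores[b])


for name, score in impact_scores.items():
    print(f"{name}: {score:+.1f} -> {describe_impact(score)}")

# How many times larger the impact of price is than that of store_location.
print(relative_impact(impact_scores, "price", "store_location"))
```

The ratio only compares attributes with one another. It says nothing about whether either attribute improves model accuracy.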

Find the explainability report

You can find the Amazon S3 prefix to the explainability artifacts generated for the best candidate in the response to DescribeAutoMLJobV2 at BestCandidate.CandidateProperties.CandidateArtifactLocations.Explainability.
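As a rough sketch, the prefix can be read from the response with the AWS SDK for Python (boto3). The helper names, the job name, and the sample response below are assumptions made for illustration; only the response path comes from the text above. Because the explainability report is not always generated, the helper returns None when the key is missing.

```python
from typing import Optional


def artifact_prefix(response: dict, kind: str) -> Optional[str]:
    """Return the S3 prefix for one artifact kind, or None if it is absent.

    `kind` is a key of CandidateArtifactLocations, such as "Explainability".
    """
    locations = (
        response.get("BestCandidate", {})
        .get("CandidateProperties", {})
        .get("CandidateArtifactLocations", {})
    )
    return locations.get(kind)


def describe_job(job_name: str) -> dict:
    """Call DescribeAutoMLJobV2 through boto3 (requires AWS credentials)."""
    import boto3  # imported here so artifact_prefix works without boto3

    client = boto3.client("sagemaker")
    return client.describe_auto_ml_job_v2(AutoMLJobName=job_name)


# Hand-written stand-in for a real response; the S3 path is made up.
sample_response = {
    "BestCandidate": {
        "CandidateProperties": {
            "CandidateArtifactLocations": {
                "Explainability": "s3://example-bucket/example-job/explainability/",
            }
        }
    }
}

print(artifact_prefix(sample_response, "Explainability"))
```

With credentials configured, `artifact_prefix(describe_job("my-forecast-job"), "Explainability")` would perform the same lookup against a real job, where `my-forecast-job` is a placeholder name.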

Model performance report

The Autopilot model quality report (also referred to as the performance report) provides insights and quality information for the best model candidate (best predictor) generated by an AutoML job. This includes information about the job details, objective function, and accuracy metrics (wQL, MAPE, WAPE, RMSE, MASE).

You can find the Amazon S3 prefix to the model quality report artifacts generated for the best candidate in the response to DescribeAutoMLJobV2 at BestCandidate.CandidateProperties.CandidateArtifactLocations.ModelInsights.
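Once you have the prefix, one way to see which files it contains is to list the objects under it. The sketch below assumes the value is an `s3://bucket/prefix` URI; the function names and the example URI are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_artifact_keys(uri: str) -> List[str]:
    """List every object key under the prefix (requires AWS credentials)."""
    import boto3  # imported here so split_s3_uri works without boto3

    bucket, prefix = split_s3_uri(uri)
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


# Made-up URI used only to show the parsing step.
print(split_s3_uri("s3://example-bucket/example-job/model-insights/"))
```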

Backtest results report

Backtest results provide insights into the performance of a time-series forecasting model by evaluating its predictive accuracy and reliability. They help analysts and data scientists assess the model's performance on historical data and understand its potential performance on future, unseen data.

Autopilot uses backtesting to tune parameters and produce accuracy metrics. During backtesting, Autopilot automatically splits your time-series data into two sets: a training set and a testing set. The training set is used to train a model, which is then used to generate forecasts for the data points in the testing set. Autopilot evaluates the model's accuracy by comparing these forecasted values with the observed values in the testing set.
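The idea can be illustrated with a small, self-contained sketch. This is not Autopilot's implementation: the split size, the naive "repeat the last value" model, and the sample data are all assumptions chosen for brevity, and WAPE is computed with its common definition (sum of absolute errors divided by sum of absolute observed values).

```python
from typing import List, Sequence, Tuple


def backtest_split(series: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Hold out the last `horizon` points as the testing set."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return list(series[:-horizon]), list(series[-horizon:])


def naive_forecast(train: Sequence[float], horizon: int) -> List[float]:
    """Stand-in model: repeat the last observed training value."""
    return [train[-1]] * horizon


def wape(observed: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error over the testing set."""
    total = sum(abs(o) for o in observed)
    if total == 0:
        raise ValueError("WAPE is undefined when all observed values are zero")
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / total


# Made-up demand history for a single item.
demand = [10, 12, 11, 13, 12, 14, 15, 13]

train, test = backtest_split(demand, horizon=2)
forecast = naive_forecast(train, horizon=len(test))

print("training set:", train)
print("testing set: ", test)
print("forecast:    ", forecast)
print("WAPE:        ", round(wape(test, forecast), 4))
```

Keeping the testing set at the end of the series preserves time order, so the model is never evaluated on points that precede its training data.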

You can find the Amazon S3 prefix to the backtest results artifacts generated for the best candidate in the response to DescribeAutoMLJobV2 at BestCandidate.CandidateProperties.CandidateArtifactLocations.BacktestResults.