Reports generated by Amazon SageMaker Autopilot
In addition to the data exploration notebook, Autopilot generates various reports for the best model candidate of each experiment.
- An explainability report provides insights into how the model makes forecasts.
- A performance report provides a quantitative assessment of the model's forecasting capabilities.
- A backtest results report is generated after testing the model's performance on historical data.
Explainability report
The Autopilot explainability report helps you better understand how the attributes in your datasets impact forecasts for specific time series (item and dimension combinations) and time points. Autopilot uses a metric called Impact scores to quantify the relative impact of each attribute and determine whether it increases or decreases forecast values.
For example, consider a forecasting scenario where the target is sales and there are two related attributes: price and color. Autopilot may find that the item's color has a high impact on sales for certain items, but a negligible effect for other items. It may also find that a promotion in the summer has a high impact on sales, but a promotion in the winter has little effect.
The explainability report is generated only when:
- The time series dataset includes additional feature columns or is associated with a holiday calendar.
- The base models CNN-QR and DeepAR+ are included in the final ensemble.
Interpret Impact scores
Impact scores measure the relative impact attributes have on forecast values. For example, if the price attribute has an Impact score that is twice as large as the store location attribute, you can conclude that the price of an item has twice as much impact on forecast values as the store location.
Impact scores also provide information on whether attributes increase or decrease forecast values.
The Impact scores range from -1 to 1, where the sign denotes the direction of the impact. A score of 0 indicates no impact, while scores close to 1 or -1 indicate a significant impact.
It is important to note that Impact scores measure the relative impact of attributes, not the absolute impact. Therefore, Impact scores cannot be used to determine whether particular attributes improve model accuracy. If an attribute has a low Impact score, that does not necessarily mean that it has a low impact on forecast values; it means that it has a lower impact on forecast values than other attributes used by the predictor.
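The interpretation rules above (sign gives direction, magnitude gives relative strength) can be sketched as a small helper. This is purely illustrative; the function and the sample scores are not part of the Autopilot API.

```python
def describe_impact(attribute: str, score: float) -> str:
    """Summarize a single Impact score: the sign denotes the direction of
    the impact, and the magnitude its relative (not absolute) strength,
    where 0 means no impact and values near -1 or 1 mean significant impact.
    Illustrative helper only; not part of the Autopilot API."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("Impact scores range from -1 to 1")
    if score > 0:
        direction = "increases"
    elif score < 0:
        direction = "decreases"
    else:
        direction = "does not affect"
    return f"{attribute} {direction} forecast values (relative impact {abs(score):.2f})"

# Hypothetical scores: price has twice the relative impact of store location.
print(describe_impact("price", -0.8))          # price decreases forecast values (relative impact 0.80)
print(describe_impact("store location", 0.4))  # store location increases forecast values (relative impact 0.40)
```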
Find the explainability report
You can find the Amazon S3 prefix to the explainability artifacts generated for the best candidate in the response to DescribeAutoMLJobV2 at BestCandidate.CandidateProperties.CandidateArtifactLocations.Explainability.
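Extracting that field from a DescribeAutoMLJobV2 response (for example, one returned by the boto3 SageMaker client's describe_auto_ml_job_v2 call) might look like the following sketch. The bucket name and response dictionary are hypothetical and heavily trimmed; real responses carry many more fields.

```python
def explainability_location(response: dict) -> str:
    """Navigate a DescribeAutoMLJobV2 response dictionary to the S3 prefix
    of the best candidate's explainability artifacts."""
    return (response["BestCandidate"]["CandidateProperties"]
                    ["CandidateArtifactLocations"]["Explainability"])

# Trimmed-down, hypothetical example response.
response = {
    "BestCandidate": {
        "CandidateProperties": {
            "CandidateArtifactLocations": {
                "Explainability": "s3://amzn-s3-demo-bucket/output/explainability/",
            }
        }
    }
}
print(explainability_location(response))  # s3://amzn-s3-demo-bucket/output/explainability/
```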
Model performance report
The Autopilot model quality report (also referred to as the performance report) provides insights and quality information for the best model candidate (the best predictor) generated by an AutoML job. This includes information about the job details, the objective function, and accuracy metrics (wQL, MAPE, WAPE, RMSE, MASE).
You can find the Amazon S3 prefix to the model quality report artifacts generated for the best candidate in the response to DescribeAutoMLJobV2 at BestCandidate.CandidateProperties.CandidateArtifactLocations.ModelInsights.
Backtest results report
Backtest results provide insights into the performance of a time-series forecasting model by evaluating its predictive accuracy and reliability. They help analysts and data scientists assess the model's performance on historical data and understand its potential performance on future, unseen data.
Autopilot uses backtesting to tune parameters and produce accuracy metrics. During backtesting, Autopilot automatically splits your time-series data into two sets, a training set and a testing set. The training set is used to train a model which is then used to generate forecasts for data points in the testing set. Autopilot uses this testing dataset to evaluate the model's accuracy by comparing forecasted values with observed values in the testing set.
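The train/test split described above can be sketched as follows. Autopilot performs this split automatically; the series, the horizon, and the naive "last value" forecast below are hypothetical stand-ins for illustration only.

```python
def backtest_split(series, horizon):
    """Split a time series into a training window and a held-out testing
    window of length `horizon`, mirroring the backtest split above."""
    if horizon <= 0 or horizon >= len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]

series = [112, 118, 132, 129, 121, 135, 148, 148]
train, test = backtest_split(series, horizon=2)
print(train)  # [112, 118, 132, 129, 121, 135]
print(test)   # [148, 148]

# Evaluate a naive "last value" forecast against the held-out window,
# comparing forecasted values with observed values as described above.
naive_forecast = [train[-1]] * len(test)
wape = sum(abs(a - f) for a, f in zip(test, naive_forecast)) / sum(test)
print(round(wape, 4))  # 0.0878
```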
You can find the Amazon S3 prefix to the backtest results artifacts generated for the best candidate in the response to DescribeAutoMLJobV2 at BestCandidate.CandidateProperties.CandidateArtifactLocations.BacktestResults.
.