AutoGluon-Tabular hyperparameters - Amazon SageMaker AI

AutoGluon-Tabular hyperparameters

The following table contains the subset of hyperparameters that are required or most commonly used for the Amazon SageMaker AI AutoGluon-Tabular algorithm. You set these hyperparameters before training to control how the algorithm estimates model parameters from your data. The SageMaker AI AutoGluon-Tabular algorithm is an implementation of the open-source AutoGluon-Tabular package.

Note

The default hyperparameters are based on example datasets in the AutoGluon-Tabular sample notebooks.

By default, the SageMaker AI AutoGluon-Tabular algorithm automatically chooses an evaluation metric based on the problem type, which it detects from the number of distinct labels in your data. For regression problems, the evaluation metric is root mean squared error. For binary classification problems, it is the area under the receiver operating characteristic curve (AUC). For multiclass classification problems, it is accuracy. You can use the eval_metric hyperparameter to change the default evaluation metric. The following table describes the AutoGluon-Tabular hyperparameters, including their valid values and default values.
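
The mapping from problem type to default metric can be sketched in Python as follows. This is a simplified, hypothetical illustration and not the algorithm's actual detection logic: the function names, the rule that numeric labels with fractional values or more than max_classes distinct values indicate regression, and the max_classes threshold of 20 are all assumptions made for this example. Only the three metric names come from this page.

```python
from typing import Hashable, Sequence

# Default metric per problem type, as listed in the eval_metric description.
DEFAULT_METRICS = {
    "regression": "root_mean_squared_error",
    "binary": "roc_auc",
    "multiclass": "accuracy",
}


def infer_problem_type(labels: Sequence[Hashable], max_classes: int = 20) -> str:
    """Hypothetical, simplified stand-in for the algorithm's problem-type detection."""
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("labels must contain at least two distinct values")
    # Assumption: labels count as numeric if every value is an int or float (bools excluded).
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if numeric:
        has_fraction = any(float(v) != int(v) for v in distinct)
        # Assumption: fractional values or many distinct numbers suggest regression.
        if has_fraction or len(distinct) > max_classes:
            return "regression"
    # Otherwise the number of distinct labels decides between binary and multiclass.
    return "binary" if len(distinct) == 2 else "multiclass"


def default_eval_metric(labels: Sequence[Hashable]) -> str:
    """Metric chosen when eval_metric is "auto" (under the assumptions above)."""
    return DEFAULT_METRICS[infer_problem_type(labels)]


print(default_eval_metric([0, 1, 1, 0]))            # roc_auc
print(default_eval_metric(["cat", "dog", "bird"]))  # accuracy
print(default_eval_metric([1.5, 2.25, 3.0]))        # root_mean_squared_error
```

The real algorithm may weigh label types and counts differently, so treat the threshold as a placeholder and set eval_metric explicitly when the choice matters.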

Parameter Name Description
eval_metric

The evaluation metric for validation data. If eval_metric is set to the default value "auto", the algorithm automatically chooses an evaluation metric based on the problem type:

  • "root_mean_squared_error" for regression

  • "roc_auc" for binary classification

  • "accuracy" for multiclass classification

Valid values: string. See the AutoGluon documentation for the supported metric names.

Default value: "auto".

presets

Preset configuration for various arguments in fit(). Choose one of the following:

  • "best_quality": high predictive accuracy, slower inference times and higher disk usage

  • "high_quality": high predictive accuracy and fast inference

  • "good_quality": good predictive accuracy and very fast inference

  • "medium_quality": medium predictive accuracy with very fast inference and training

  • "optimize_for_deployment": delete unused models and remove training artifacts

  • "interpretable": fits only interpretable rule-based models from the imodels package

For more details, see AutoGluon Predictors.

Valid values: string, any of the following: ("best_quality", "high_quality", "good_quality", "medium_quality", "optimize_for_deployment", or "interpretable").

Default value: "medium_quality".
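
Because presets is a plain string, a misspelled value is easy to miss until a training job fails. A minimal client-side check against the documented list might look like the following sketch; validate_presets is a hypothetical helper, not part of the SageMaker AI or AutoGluon APIs.

```python
# Preset names documented in the presets entry above.
VALID_PRESETS = frozenset({
    "best_quality",
    "high_quality",
    "good_quality",
    "medium_quality",
    "optimize_for_deployment",
    "interpretable",
})


def validate_presets(value: str = "medium_quality") -> str:
    """Return value unchanged if it is a documented preset, else raise ValueError."""
    if value not in VALID_PRESETS:
        raise ValueError(
            f"unknown preset {value!r}; expected one of {sorted(VALID_PRESETS)}"
        )
    return value


print(validate_presets("best_quality"))  # best_quality
```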

auto_stack

Whether AutoGluon should automatically use bagging and multi-layer stack ensembling to boost predictive accuracy. Set auto_stack to "True" if you are willing to accept longer training times in exchange for higher predictive accuracy. Setting auto_stack to "True" automatically sets the num_bag_folds and num_stack_levels arguments based on dataset properties.

Valid values: string, "True" or "False".

Default value: "False".
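
Boolean-like hyperparameters such as auto_stack take the strings "True" and "False" rather than Python booleans. The sketch below shows one strict way to convert them on the Python side. It assumes an exact, case-sensitive match because the table lists exactly these spellings; whether the algorithm also accepts other spellings (for example "true") is not documented here. parse_bool_hyperparameter is a hypothetical helper.

```python
def parse_bool_hyperparameter(name: str, value: str) -> bool:
    """Convert a "True"/"False" hyperparameter string to a Python bool.

    Assumption: only the exact strings "True" and "False" are accepted.
    """
    if value == "True":
        return True
    if value == "False":
        return False
    raise ValueError(f'{name} must be "True" or "False", got {value!r}')


# A naive bool() conversion is wrong: any non-empty string is truthy.
print(bool("False"))                                  # True
print(parse_bool_hyperparameter("auto_stack", "False"))  # False
```

The same conversion applies to refit_full, set_best_to_refit_full, and save_space, which use the same "True"/"False" convention.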

num_bag_folds

Number of folds used for bagging of models. When num_bag_folds is equal to k, training time increases by roughly a factor of k. Set num_bag_folds to 0 to deactivate bagging. Bagging is disabled by default, but we recommend values between 5 and 10 to maximize predictive performance. Increasing num_bag_folds produces models with lower bias that are more prone to overfitting. A value of 1 is invalid and raises a ValueError. Values greater than 10 may produce diminishing returns and can even harm overall results due to overfitting. To further improve predictions, increase num_bag_sets rather than num_bag_folds.

Valid values: string, any integer from "0" to "10", inclusive, except "1".

Default value: "0".
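
The constraints in the num_bag_folds entry (an integer from 0 to 10, with 1 raising a ValueError) and its rough cost model (about k times the training time for k folds) can be expressed as a short sketch. Both function names are hypothetical, and the time factor is only the back-of-the-envelope estimate stated above, not a measurement.

```python
def validate_num_bag_folds(value: str = "0") -> int:
    """Parse the num_bag_folds string and apply the documented constraints."""
    folds = int(value)  # int() itself raises ValueError for non-integer strings
    if not 0 <= folds <= 10:
        raise ValueError(f"num_bag_folds must be between 0 and 10, got {folds}")
    if folds == 1:
        raise ValueError("num_bag_folds=1 is invalid; use 0 to disable bagging, or 2 or more")
    return folds


def approx_bagging_time_factor(folds: int) -> int:
    """Rough training-time multiplier: about k for k folds, 1 when bagging is off."""
    return folds if folds > 0 else 1


folds = validate_num_bag_folds("5")
print(approx_bagging_time_factor(folds))  # 5
```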

num_bag_sets

Number of repeats of k-fold bagging to perform (must be greater than or equal to 1). The total number of models trained during bagging is num_bag_folds * num_bag_sets. This parameter defaults to 1 if time_limit is not specified. This parameter is disabled if num_bag_folds is not specified. Values greater than 1 result in superior predictive performance, especially on smaller problems and with stacking enabled.

Valid values: integer, range: [1, 20].

Default value: 1.
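
The model count from the num_bag_sets entry is a simple product, sketched below. num_bagged_models is a hypothetical helper; it only restates the documented formula and range, and with num_bag_folds set to 0 (bagging off) the product is 0, consistent with num_bag_sets having no effect in that case.

```python
def num_bagged_models(num_bag_folds: int, num_bag_sets: int = 1) -> int:
    """Total models trained during bagging: num_bag_folds * num_bag_sets."""
    if not 1 <= num_bag_sets <= 20:
        raise ValueError(f"num_bag_sets must be in [1, 20], got {num_bag_sets}")
    # With num_bag_folds=0 bagging is off and the product is 0.
    return num_bag_folds * num_bag_sets


print(num_bagged_models(5, 2))  # 10
```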

num_stack_levels

Number of stacking levels to use in the stack ensemble. Training time increases by roughly a factor of num_stack_levels + 1. Set this parameter to 0 to deactivate stack ensembling. Stack ensembling is deactivated by default, but we recommend values between 1 and 3 to maximize predictive performance. To prevent overfitting and a ValueError, num_bag_folds must be greater than or equal to 2 when stack ensembling is enabled.

Valid values: float, range: [0, 3].

Default value: 0.
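
The stacking constraint and cost estimate above can be checked before submitting a job, as in this sketch. check_stacking is a hypothetical helper; it assumes num_stack_levels is a whole number (the table lists the type as float), and it applies the num_bag_folds requirement only when stacking is enabled, since the default configuration (0 levels, 0 folds) is valid.

```python
def check_stacking(num_stack_levels: int, num_bag_folds: int) -> int:
    """Validate a stacking configuration and return the rough training-time factor.

    The factor is num_stack_levels + 1, as described in the table.
    """
    if not 0 <= num_stack_levels <= 3:
        raise ValueError(f"num_stack_levels must be in [0, 3], got {num_stack_levels}")
    if num_stack_levels > 0 and num_bag_folds < 2:
        raise ValueError("stack ensembling requires num_bag_folds >= 2")
    return num_stack_levels + 1


print(check_stacking(2, 5))  # 3
```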

refit_full

Whether or not to retrain all models on all of the data (training and validation) after the normal training procedure. For more details, see AutoGluon Predictors.

Valid values: string, "True" or "False".

Default value: "False".

set_best_to_refit_full

Whether to change the default model that the predictor uses for prediction. If set_best_to_refit_full is "True", the default model becomes the refit version (produced by refit_full) of the model that exhibited the highest validation score. Only valid if refit_full is set to "True".

Valid values: string, "True" or "False".

Default value: "False".
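
Because set_best_to_refit_full is only valid together with refit_full, a client-side guard can reject the inconsistent combination early. This is a hypothetical sketch: it assumes "set" means refit_full is "True", and this page does not say whether the algorithm raises an error or silently ignores the setting otherwise.

```python
from typing import Mapping


def check_refit_options(hyperparameters: Mapping[str, str]) -> None:
    """Hypothetical guard: set_best_to_refit_full="True" needs refit_full="True"."""
    # Missing keys fall back to the documented default, "False".
    refit_full = hyperparameters.get("refit_full", "False")
    set_best = hyperparameters.get("set_best_to_refit_full", "False")
    if set_best == "True" and refit_full != "True":
        raise ValueError('set_best_to_refit_full="True" requires refit_full="True"')


check_refit_options({"refit_full": "True", "set_best_to_refit_full": "True"})  # passes
```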

save_space

Whether to reduce the memory and disk size of the predictor by deleting auxiliary model files that aren’t needed for prediction on new data. This has no impact on inference accuracy. We recommend setting save_space to "True" if your only goal is to use the trained model for prediction. Certain advanced functionality may no longer be available if save_space is set to "True". Refer to the predictor.save_space() documentation for more details.

Valid values: string, "True" or "False".

Default value: "False".

verbosity

The verbosity of print messages. Verbosity levels range from 0 to 4, with higher levels producing more detailed print statements. A verbosity of 0 suppresses warnings.

Valid values: integer, any of the following: (0, 1, 2, 3, or 4).

Default value: 2.
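
For reference, the defaults in this table can be collected into one dictionary. The sketch writes every value as a string because the SageMaker CreateTrainingJob API accepts HyperParameters as a map of string keys to string values; how you pass the dictionary to a training job (for example, through the SageMaker Python SDK) is outside this sketch. build_hyperparameters is a hypothetical helper, and the dictionary covers only the subset of hyperparameters listed on this page.

```python
from typing import Dict, Mapping, Optional, Union

# Default values from the table above, written as strings.
AUTOGLUON_TABULAR_DEFAULTS: Dict[str, str] = {
    "eval_metric": "auto",
    "presets": "medium_quality",
    "auto_stack": "False",
    "num_bag_folds": "0",
    "num_bag_sets": "1",
    "num_stack_levels": "0",
    "refit_full": "False",
    "set_best_to_refit_full": "False",
    "save_space": "False",
    "verbosity": "2",
}


def build_hyperparameters(
    overrides: Optional[Mapping[str, Union[str, int, float, bool]]] = None,
) -> Dict[str, str]:
    """Merge overrides onto the defaults and convert every value to a string."""
    merged = {**AUTOGLUON_TABULAR_DEFAULTS, **(overrides or {})}
    # str(True) gives "True", which matches the documented boolean strings.
    return {key: str(value) for key, value in merged.items()}


hp = build_hyperparameters({"presets": "best_quality", "auto_stack": True, "verbosity": 3})
print(hp["auto_stack"], hp["verbosity"])  # True 3
```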