LightGBM hyperparameters - Amazon SageMaker AI

LightGBM hyperparameters

The following table contains the subset of hyperparameters that are required or most commonly used for the Amazon SageMaker AI LightGBM algorithm. Users set these parameters to facilitate the estimation of model parameters from data. The SageMaker AI LightGBM algorithm is an implementation of the open-source LightGBM package.

Note

The default hyperparameters are based on example datasets in the LightGBM sample notebooks.

By default, the SageMaker AI LightGBM algorithm automatically chooses an evaluation metric and objective function based on the type of problem. For classification, the LightGBM algorithm detects whether the problem is binary or multiclass from the number of labels in your data. For regression problems, the evaluation metric is root mean squared error and the objective function is L2 loss. For binary classification problems, the evaluation metric and objective function are both binary cross entropy. For multiclass classification problems, the evaluation metric is multiclass cross entropy and the objective function is softmax. You can use the metric hyperparameter to change the default evaluation metric. Refer to the following table for more information on LightGBM hyperparameters, including descriptions, valid values, and default values.

Parameter Name Description
num_boost_round

The maximum number of boosting iterations. Note: Internally, LightGBM constructs num_class * num_boost_round trees for multi-class classification problems.

Valid values: integer, range: Positive integer.

Default value: 100.
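
The tree-count note above can be checked with simple arithmetic. The following is a minimal sketch; total_trees is a hypothetical helper, not part of LightGBM or SageMaker AI, and it assumes one tree per iteration for regression and binary classification. The result is an upper bound, because early stopping can end training sooner.

```python
def total_trees(num_boost_round: int, num_class: int = 1) -> int:
    """Upper bound on the number of trees built: num_class * num_boost_round."""
    if num_boost_round < 1:
        raise ValueError("num_boost_round must be a positive integer")
    if num_class < 1:
        raise ValueError("num_class must be a positive integer")
    return num_class * num_boost_round


# Default num_boost_round (100) on a hypothetical 5-class problem.
print(total_trees(100, num_class=5))  # 500
```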

early_stopping_rounds

Training stops if one metric on one validation dataset does not improve in the last early_stopping_rounds rounds. If early_stopping_rounds is less than or equal to zero, this hyperparameter is ignored.

Valid values: integer.

Default value: 10.
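
The stopping rule described above can be sketched in plain Python. This is an illustration only, assuming a single metric tracked on a single validation set; stopping_iteration and its arguments are hypothetical names, and the actual LightGBM implementation may differ in detail.

```python
def stopping_iteration(scores, early_stopping_rounds, lower_is_better=True):
    """Return the 1-based iteration at which training would stop.

    `scores` holds one validation metric value per boosting iteration.
    If the rule never triggers, the length of `scores` is returned.
    """
    if early_stopping_rounds <= 0:
        return len(scores)  # the hyperparameter is ignored

    best = None
    rounds_without_improvement = 0
    for iteration, score in enumerate(scores, start=1):
        if best is None:
            improved = True
        elif lower_is_better:
            improved = score < best
        else:
            improved = score > best

        if improved:
            best = score
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= early_stopping_rounds:
                return iteration
    return len(scores)


# Made-up validation losses: the best value is reached at iteration 3.
losses = [0.90, 0.80, 0.70, 0.71, 0.72, 0.73, 0.69]
print(stopping_iteration(losses, early_stopping_rounds=3))  # 6
print(stopping_iteration(losses, early_stopping_rounds=0))  # 7
```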

metric

The evaluation metric for validation data. If metric is set to the default "auto" value, then the algorithm automatically chooses an evaluation metric based on the type of problem:

  • rmse for regression

  • binary_logloss for binary classification

  • multi_logloss for multi-class classification

Valid values: string, any of the following: ("auto", "rmse", "l1", "l2", "huber", "fair", "binary_logloss", "binary_error", "auc", "average_precision", "multi_logloss", "multi_error", "auc_mu", or "cross_entropy").

Default value: "auto".
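
The "auto" behavior above amounts to a lookup by problem type. The sketch below restates it; the task labels ("regression", "binary", "multiclass") and the resolve_metric helper are hypothetical names chosen for this example, not SageMaker AI identifiers.

```python
# Hypothetical task labels used only in this sketch.
AUTO_METRIC_BY_TASK = {
    "regression": "rmse",
    "binary": "binary_logloss",
    "multiclass": "multi_logloss",
}

# Valid values copied from the table entry for `metric`.
VALID_METRICS = {
    "auto", "rmse", "l1", "l2", "huber", "fair", "binary_logloss",
    "binary_error", "auc", "average_precision", "multi_logloss",
    "multi_error", "auc_mu", "cross_entropy",
}


def resolve_metric(metric: str, task: str) -> str:
    """Return the metric to use, expanding "auto" by task type."""
    if metric not in VALID_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    if metric != "auto":
        return metric
    if task not in AUTO_METRIC_BY_TASK:
        raise ValueError(f"unknown task: {task!r}")
    return AUTO_METRIC_BY_TASK[task]


print(resolve_metric("auto", "multiclass"))  # multi_logloss
print(resolve_metric("auc", "binary"))       # auc
```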

learning_rate

The shrinkage rate applied to the contribution of each new tree. Smaller values make each boosting iteration change the model less, which usually requires more iterations.

Valid values: float, range: (0.0, 1.0).

Default value: 0.1.

num_leaves

The maximum number of leaves in one tree.

Valid values: integer, range: (1, 131072).

Default value: 64.

feature_fraction

The fraction of features randomly selected on each iteration (tree). Must be less than 1.0.

Valid values: float, range: (0.0, 1.0).

Default value: 0.9.

bagging_fraction

Similar to feature_fraction, but bagging_fraction randomly selects part of the data (rows rather than features) without resampling.

Valid values: float, range: (0.0, 1.0].

Default value: 0.9.

bagging_freq

The frequency at which to perform bagging. Every bagging_freq iterations, LightGBM randomly selects a percentage of the data to use for the next bagging_freq iterations. This percentage is determined by the bagging_fraction hyperparameter. If bagging_freq is zero, then bagging is deactivated.

Valid values: integer, range: Non-negative integer.

Default value: 1.
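
How bagging_freq and bagging_fraction interact can be sketched as a schedule. The helper bagging_plan below is hypothetical. It assumes 0-based iteration numbering, truncation when computing the bag size, and that a bagging_fraction of 1.0 leaves bagging effectively off (as noted in the open-source LightGBM documentation); none of these details are stated in the table.

```python
def bagging_plan(num_rows, num_iterations, bagging_fraction, bagging_freq):
    """Return (iterations at which rows are re-sampled, rows per bag)."""
    if not 0.0 < bagging_fraction <= 1.0:
        raise ValueError("bagging_fraction must be in (0.0, 1.0]")
    if bagging_freq < 0:
        raise ValueError("bagging_freq must be a non-negative integer")

    # Bagging is off when bagging_freq is 0 or when all rows are kept.
    if bagging_freq == 0 or bagging_fraction == 1.0:
        return [], num_rows

    rows_per_bag = int(num_rows * bagging_fraction)  # truncation is an assumption
    resample_at = [i for i in range(num_iterations) if i % bagging_freq == 0]
    return resample_at, rows_per_bag


# A new 90% sample is drawn every 3 iterations and reused in between.
print(bagging_plan(1000, 10, bagging_fraction=0.9, bagging_freq=3))
# ([0, 3, 6, 9], 900)
```

With the default bagging_freq of 1, the sketch draws a new sample on every iteration.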

max_depth

The maximum depth for a tree model. This is used to deal with overfitting when the amount of data is small. If max_depth is less than or equal to zero, this means there is no limit for maximum depth.

Valid values: integer.

Default value: 6.
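
Because each split is binary, a tree of depth d can have at most 2^d leaves, so max_depth and num_leaves constrain each other. The sketch below illustrates this with hypothetical helper names; note that the defaults in this table (max_depth of 6, num_leaves of 64) sit exactly at that bound.

```python
def max_leaves_for_depth(max_depth):
    """Most leaves a binary tree of this depth can hold, or None if unlimited."""
    if max_depth <= 0:
        return None  # no depth limit
    return 2 ** max_depth


def effective_leaf_cap(num_leaves, max_depth):
    """The tighter of the num_leaves limit and the limit implied by max_depth."""
    depth_cap = max_leaves_for_depth(max_depth)
    return num_leaves if depth_cap is None else min(num_leaves, depth_cap)


print(effective_leaf_cap(num_leaves=64, max_depth=6))   # 64
print(effective_leaf_cap(num_leaves=64, max_depth=4))   # 16
print(effective_leaf_cap(num_leaves=64, max_depth=-1))  # 64
```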

min_data_in_leaf

The minimum number of data points (samples) in one leaf. Can be used to deal with overfitting.

Valid values: integer, range: Non-negative integer.

Default value: 3.

max_delta_step

Used to limit the max output of tree leaves. If max_delta_step is less than or equal to 0, then there is no constraint. The final max output of leaves is learning_rate * max_delta_step.

Valid values: float.

Default value: 0.0.
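
The relationship learning_rate * max_delta_step stated above can be written as a small helper. This is only a restatement of that formula; max_leaf_output is a hypothetical name.

```python
def max_leaf_output(learning_rate, max_delta_step):
    """Largest leaf output allowed, or None when there is no constraint."""
    if max_delta_step <= 0:
        return None  # no constraint
    return learning_rate * max_delta_step


print(max_leaf_output(learning_rate=0.1, max_delta_step=0.0))  # None
print(max_leaf_output(learning_rate=0.1, max_delta_step=5.0))  # 0.5
```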

lambda_l1

L1 regularization.

Valid values: float, range: Non-negative float.

Default value: 0.0.

lambda_l2

L2 regularization.

Valid values: float, range: Non-negative float.

Default value: 0.0.

boosting

The boosting type.

Valid values: string, any of the following: ("gbdt", "rf", "dart", or "goss").

Default value: "gbdt".

min_gain_to_split

The minimum gain to perform a split. Can be used to speed up training.

Valid values: float, range: Non-negative float.

Default value: 0.0.

scale_pos_weight

The weight of the labels with positive class. Used only for binary classification tasks. scale_pos_weight cannot be used if is_unbalance is set to "True".

Valid values: float, range: Positive float.

Default value: 1.0.

tree_learner

Tree learner type.

Valid values: string, any of the following: ("serial", "feature", "data", or "voting").

Default value: "serial".

feature_fraction_bynode

Selects a subset of random features on each tree node. For example, if feature_fraction_bynode is 0.8, then 80% of features are selected. Can be used to deal with overfitting.

Valid values: float, range: (0.0, 1.0].

Default value: 1.0.

is_unbalance

Set to "True" if training data is unbalanced. Used only for binary classification tasks. is_unbalance cannot be used with scale_pos_weight.

Valid values: string, either: ("True" or "False").

Default value: "False".
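
Because is_unbalance and scale_pos_weight are mutually exclusive, it can help to check the pair before starting a training job. The sketch below uses hypothetical helper names. It treats any scale_pos_weight other than the default 1.0 as "in use", which is an assumption, and the negative-to-positive ratio is a common heuristic for choosing scale_pos_weight rather than something this table prescribes.

```python
def check_imbalance_settings(is_unbalance="False", scale_pos_weight=1.0):
    """Raise ValueError if the two class-imbalance settings conflict."""
    if is_unbalance not in ("True", "False"):
        raise ValueError('is_unbalance must be "True" or "False"')
    if scale_pos_weight <= 0:
        raise ValueError("scale_pos_weight must be a positive float")
    # Assumption: a non-default scale_pos_weight counts as being used.
    if is_unbalance == "True" and scale_pos_weight != 1.0:
        raise ValueError("set either is_unbalance or scale_pos_weight, not both")


def negative_to_positive_ratio(labels):
    """Heuristic weight for the positive class in a 0/1 labeled dataset."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive labels found")
    return negatives / positives


labels = [0] * 90 + [1] * 10  # made-up binary labels
weight = negative_to_positive_ratio(labels)
check_imbalance_settings(is_unbalance="False", scale_pos_weight=weight)
print(weight)  # 9.0
```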

max_bin

The maximum number of bins used to bucket feature values. A small number of bins may reduce training accuracy, but may improve generalization. Can be used to deal with overfitting.

Valid values: integer, range: (1, ∞).

Default value: 255.

tweedie_variance_power

Controls the variance of the Tweedie distribution. Set this closer to 2.0 to shift toward a gamma distribution. Set this closer to 1.0 to shift toward a Poisson distribution. Used only for regression tasks.

Valid values: float, range: [1.0, 2.0).

Default value: 1.5.
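
For a Tweedie distribution with mean mu and variance power p, the variance is proportional to mu ** p, which is why values near 1.0 behave like Poisson (variance proportional to the mean) and values near 2.0 behave like gamma (variance proportional to the squared mean). The sketch below illustrates this; the dispersion argument is an assumed extra parameter that does not appear in the table, and tweedie_variance is a hypothetical helper.

```python
def tweedie_variance(mean, variance_power=1.5, dispersion=1.0):
    """Variance implied by the Tweedie relation: dispersion * mean ** power."""
    if not 1.0 <= variance_power < 2.0:
        raise ValueError("tweedie_variance_power must be in [1.0, 2.0)")
    if mean <= 0:
        raise ValueError("mean must be positive")
    return dispersion * mean ** variance_power


# The same mean gives a larger variance as the power moves toward 2.0.
for power in (1.0, 1.5, 1.9):
    print(power, round(tweedie_variance(4.0, power), 4))
```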

num_threads

The number of parallel threads used to run LightGBM. A value of 0 uses the default number of threads in OpenMP.

Valid values: integer, range: Non-negative integer.

Default value: 0.

verbosity

The verbosity of print messages. If the verbosity is less than 0, then print messages only show fatal errors. If verbosity is set to 0, then print messages include errors and warnings. If verbosity is 1, then print messages show more information. A verbosity greater than 1 shows the most information in print messages and can be used for debugging.

Valid values: integer.

Default value: 1.
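
The defaults listed in this table can be collected into a dictionary and selectively overridden. The following is a minimal sketch; DEFAULT_HYPERPARAMETERS and with_overrides are hypothetical names, and how the resulting dictionary is passed to a training job is outside the scope of this example. Rejecting names that are not in the table is a choice made here to catch typos. The table covers only a subset of hyperparameters, so the algorithm itself may accept others.

```python
# Default values copied from the table above.
DEFAULT_HYPERPARAMETERS = {
    "num_boost_round": 100,
    "early_stopping_rounds": 10,
    "metric": "auto",
    "learning_rate": 0.1,
    "num_leaves": 64,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.9,
    "bagging_freq": 1,
    "max_depth": 6,
    "min_data_in_leaf": 3,
    "max_delta_step": 0.0,
    "lambda_l1": 0.0,
    "lambda_l2": 0.0,
    "boosting": "gbdt",
    "min_gain_to_split": 0.0,
    "scale_pos_weight": 1.0,
    "tree_learner": "serial",
    "feature_fraction_bynode": 1.0,
    "is_unbalance": "False",
    "max_bin": 255,
    "tweedie_variance_power": 1.5,
    "num_threads": 0,
    "verbosity": 1,
}


def with_overrides(overrides):
    """Return a copy of the defaults with the given values replaced."""
    unknown = sorted(set(overrides) - set(DEFAULT_HYPERPARAMETERS))
    if unknown:
        raise KeyError(f"not in the hyperparameter table: {unknown}")
    return {**DEFAULT_HYPERPARAMETERS, **overrides}


params = with_overrides({"learning_rate": 0.05, "num_boost_round": 500})
print(params["learning_rate"], params["num_boost_round"], params["num_leaves"])
# 0.05 500 64
```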