LightGBM hyperparameters
The following table contains the subset of hyperparameters that are required or most
commonly used for the Amazon SageMaker AI LightGBM algorithm. Users set these parameters to
facilitate the estimation of model parameters from data. The SageMaker AI LightGBM algorithm is
an implementation of the open-source LightGBM package.
Note
The default hyperparameters are based on example datasets in the LightGBM sample notebooks.
By default, the SageMaker AI LightGBM algorithm automatically chooses an evaluation metric and
objective function based on the type of classification problem. The LightGBM algorithm
detects the type of classification problem based on the number of labels in your data.
For regression problems, the evaluation metric is root mean squared error and the
objective function is L2 loss. For binary classification problems, the evaluation metric
and objective function are both binary cross entropy. For multiclass classification
problems, the evaluation metric is multiclass cross entropy and the objective function
is softmax. You can use the metric hyperparameter to change the default
evaluation metric. Refer to the following table for more information on LightGBM
hyperparameters, including descriptions, valid values, and default values.
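
The default selection described above can be sketched as a small lookup. This is a minimal illustration, not the SageMaker AI implementation: the helper name `resolve_metric_and_objective` and the `problem_type` argument are hypothetical, and the string identifiers are the open-source LightGBM names assumed to correspond to the metrics and objectives named in the note.

```python
# Hypothetical sketch of the default metric/objective choice described in the note.
# The identifiers are open-source LightGBM names assumed to match the prose:
#   root mean squared error  -> "rmse",           L2 loss              -> "regression"
#   binary cross entropy     -> "binary_logloss", binary objective     -> "binary"
#   multiclass cross entropy -> "multi_logloss",  softmax objective    -> "multiclass"
DEFAULTS = {
    "regression": {"metric": "rmse", "objective": "regression"},
    "binary": {"metric": "binary_logloss", "objective": "binary"},
    "multiclass": {"metric": "multi_logloss", "objective": "multiclass"},
}


def resolve_metric_and_objective(problem_type, metric=None):
    """Return the default metric and objective, optionally overriding the metric."""
    if problem_type not in DEFAULTS:
        raise ValueError(f"unknown problem type: {problem_type!r}")
    chosen = dict(DEFAULTS[problem_type])  # copy so the defaults stay untouched
    if metric is not None:
        # Mirrors the role of the metric hyperparameter: only the metric changes.
        chosen["metric"] = metric
    return chosen


if __name__ == "__main__":
    print(resolve_metric_and_objective("binary"))
    print(resolve_metric_and_objective("regression", metric="l1"))
```

Overriding the metric leaves the objective unchanged, which matches the note's statement that the metric hyperparameter changes only the default evaluation metric.
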
Parameter Name | Description |
---|---|
num_boost_round | The maximum number of boosting iterations. Note: Internally, LightGBM constructs Valid values: integer, range: Positive integer. Default value: |
early_stopping_rounds | The training will stop if one metric of one validation data point does not improve in the last Valid values: integer. Default value: |
metric | The evaluation metric for validation data. If Valid values: string, any of the following: ( Default value: |
learning_rate | The rate at which the model weights are updated after working through each batch of training examples. Valid values: float, range: ( Default value: |
num_leaves | The maximum number of leaves in one tree. Valid values: integer, range: ( Default value: |
feature_fraction | A subset of features to be selected on each iteration (tree). Must be less than 1.0. Valid values: float, range: ( Default value: |
bagging_fraction | A subset of features similar to Valid values: float, range: ( Default value: |
bagging_freq | The frequency to perform bagging. At every Valid values: integer, range: Non-negative integer. Default value: |
max_depth | The maximum depth for a tree model. This is used to deal with overfitting when the amount of data is small. If Valid values: integer. Default value: |
min_data_in_leaf | The minimum amount of data in one leaf. Can be used to deal with overfitting. Valid values: integer, range: Non-negative integer. Default value: |
max_delta_step | Used to limit the max output of tree leaves. If Valid values: float. Default value: |
lambda_l1 | L1 regularization. Valid values: float, range: Non-negative float. Default value: |
lambda_l2 | L2 regularization. Valid values: float, range: Non-negative float. Default value: |
boosting | Boosting type. Valid values: string, any of the following: ( Default value: |
min_gain_to_split | The minimum gain to perform a split. Can be used to speed up training. Valid values: float, range: Non-negative float. Default value: |
scale_pos_weight | The weight of the labels with positive class. Used only for binary classification tasks. Valid values: float, range: Positive float. Default value: |
tree_learner | Tree learner type. Valid values: string, any of the following: ( Default value: |
feature_fraction_bynode | Selects a subset of random features on each tree node. For example, if Valid values: float, range: ( Default value: |
is_unbalance | Set to Valid values: string, either: ( Default value: |
max_bin | The maximum number of bins used to bucket feature values. A small number of bins may reduce training accuracy but may improve generalization. Can be used to deal with overfitting. Valid values: integer, range: (1, ∞). Default value: |
tweedie_variance_power | Controls the variance of the Tweedie distribution. Set this closer to Valid values: float, range: [ Default value: |
num_threads | Number of parallel threads used to run LightGBM. Value 0 means default number of threads in OpenMP. Valid values: integer, range: Non-negative integer. Default value: |
verbosity | The verbosity of print messages. If the Valid values: integer. Default value: |
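
The range constraints that are stated in full in the table can be checked before a training job is launched. The following is a minimal sketch, not part of SageMaker AI: the helper `validate_hyperparameters` is hypothetical, it covers only the hyperparameters whose valid ranges appear complete in the table, and the sample values are illustrative rather than defaults.

```python
# Hypothetical pre-flight check for the ranges stated in full in the table above.
# Hyperparameters whose ranges are not fully stated there are skipped, not validated.

def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


CHECKS = {
    "num_boost_round": (lambda v: _is_int(v) and v > 0, "positive integer"),
    "bagging_freq": (lambda v: _is_int(v) and v >= 0, "non-negative integer"),
    "min_data_in_leaf": (lambda v: _is_int(v) and v >= 0, "non-negative integer"),
    "num_threads": (lambda v: _is_int(v) and v >= 0, "non-negative integer"),
    "max_bin": (lambda v: _is_int(v) and v > 1, "integer greater than 1"),
    "lambda_l1": (lambda v: _is_number(v) and v >= 0, "non-negative float"),
    "lambda_l2": (lambda v: _is_number(v) and v >= 0, "non-negative float"),
    "min_gain_to_split": (lambda v: _is_number(v) and v >= 0, "non-negative float"),
    "scale_pos_weight": (lambda v: _is_number(v) and v > 0, "positive float"),
}


def validate_hyperparameters(hyperparameters):
    """Return a list of human-readable range violations (empty if none found)."""
    errors = []
    for name, value in hyperparameters.items():
        check = CHECKS.get(name)
        if check is None:
            continue  # range not covered by this sketch
        predicate, expected = check
        if not predicate(value):
            errors.append(f"{name}={value!r}: expected a {expected}")
    return errors


if __name__ == "__main__":
    # Illustrative values only; these are not the documented defaults.
    sample = {"num_boost_round": 200, "lambda_l1": -0.5, "max_bin": 1, "metric": "auc"}
    for problem in validate_hyperparameters(sample):
        print(problem)
```

Returning a list of violations rather than raising on the first one lets a caller report every out-of-range value in a single pass.
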