CatBoost hyperparameters
The following table contains the subset of hyperparameters that are required or most
commonly used for the Amazon SageMaker AI CatBoost algorithm. Users set these parameters to facilitate
the estimation of model parameters from data. The SageMaker AI CatBoost algorithm is an implementation
of the open-source CatBoost package.
Note
The default hyperparameters are based on example datasets in the CatBoost sample notebooks.
By default, the SageMaker AI CatBoost algorithm automatically chooses an evaluation metric and
loss function based on the type of problem. The CatBoost algorithm
detects the type of classification problem based on the number of labels in your data.
For regression problems, the evaluation metric and loss function are both root mean
squared error. For binary classification problems, the evaluation metric is Area Under
the Curve (AUC) and the loss function is log loss. For multiclass classification
problems, the evaluation metric and loss function are both multiclass cross entropy. You can
use the eval_metric hyperparameter to change the default evaluation metric.
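
The default selection described in this note can be sketched as a small lookup. This is only an illustration, not the SageMaker AI implementation: the function name `default_objective` and its arguments are hypothetical, and the strings `RMSE`, `Logloss`, `AUC`, and `MultiClass` are the open-source CatBoost names assumed to correspond to the metrics named above.

```python
def default_objective(task: str, num_labels: int = 0) -> dict:
    """Return the assumed default loss function and evaluation metric.

    task       -- "regression" or "classification" (hypothetical argument)
    num_labels -- number of distinct labels; only used for classification
    """
    if task == "regression":
        # Regression: RMSE for both the loss and the evaluation metric.
        return {"loss_function": "RMSE", "eval_metric": "RMSE"}
    if task == "classification":
        if num_labels < 2:
            raise ValueError("classification needs at least two labels")
        if num_labels == 2:
            # Binary classification: log loss, evaluated with AUC.
            return {"loss_function": "Logloss", "eval_metric": "AUC"}
        # Multiclass: multiclass cross entropy for both.
        return {"loss_function": "MultiClass", "eval_metric": "MultiClass"}
    raise ValueError(f"unknown task: {task!r}")


if __name__ == "__main__":
    for task, n in [("regression", 0), ("classification", 2), ("classification", 5)]:
        print(task, n, default_objective(task, n))
```

Setting eval_metric explicitly would replace only the `eval_metric` entry of such a mapping; the loss function would stay as chosen.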
Refer to the following table for more information on CatBoost hyperparameters, including
descriptions, valid values, and default values.
Parameter Name | Description |
---|---|
iterations | The maximum number of trees that can be built. Valid values: integer, range: Positive integer. Default value: |
early_stopping_rounds | Training stops if one metric of one validation data point does not improve in the last early_stopping_rounds rounds. Valid values: integer. Default value: |
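eval_metric | The evaluation metric for validation data. If this hyperparameter is left at its default, the algorithm automatically chooses an evaluation metric based on the type of problem (regression, binary classification, or multiclass classification). Valid values: string, refer to the CatBoost documentation. Default value: |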
learning_rate | The rate at which the model weights are updated after working through each batch of training examples. Valid values: float, range: ( Default value: |
depth | Depth of the tree. Valid values: integer, range: ( Default value: |
l2_leaf_reg | Coefficient for the L2 regularization term of the cost function. Valid values: integer, range: Positive integer. Default value: |
random_strength | The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model. Valid values: float, range: Positive floating point number. Default value: |
max_leaves | The maximum number of leaves in the resulting tree. Can only be used with the Lossguide growing policy (see grow_policy). Valid values: integer, range: [ Default value: |
rsm | Random subspace method. The percentage of features to use at each split selection, when features are selected over again at random. Valid values: float, range: ( Default value: |
sampling_frequency | Frequency to sample weights and objects when building trees. Valid values: string, either: ( Default value: |
min_data_in_leaf | The minimum number of training samples in a leaf. CatBoost does not search for new splits in leaves with a sample count less than the specified value. Can only be used with the Lossguide and Depthwise growing policies (see grow_policy). Valid values: integer, range: ( Default value: |
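bagging_temperature | Defines the settings of the Bayesian bootstrap. Use the Bayesian bootstrap to assign random weights to objects. If bagging_temperature is set to 1.0, the weights are sampled from an exponential distribution. If it is set to 0.0, all weights are 1.0. Valid values: float, range: Non-negative float. Default value: |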
boosting_type | The boosting scheme. "Auto" means that the boosting scheme is selected automatically. Valid values: string, any of the following: ( Default value: |
scale_pos_weight | The weight for the positive class in binary classification. The value is used as a multiplier for the weights of objects from the positive class. Valid values: float, range: Positive float. Default value: |
max_bin | The number of splits for numerical features. Valid values: string, either: ( Default value: |
grow_policy | The tree growing policy. Defines how to perform greedy tree construction. Valid values: string, any of the following: ( Default value: |
random_seed | The random seed used for training. Valid values: integer, range: Non-negative integer. Default value: |
thread_count | The number of threads to use during training. If thread_count is -1, the number of threads equals the number of processor cores. Valid values: integer, either: ( Default value: |
verbose | The verbosity of print messages, with higher levels corresponding to more detailed print statements. Valid values: integer, range: Positive integer. Default value: |
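
As a rough illustration of how the valid-value constraints in the table might be checked before launching a training job, the following sketch validates a hyperparameter dictionary. The helper `check_hyperparameters`, the `RULES` mapping, and every value in `hyperparameters` are hypothetical; the values are illustrative and are not the documented defaults. Only constraints stated in full in the table (positive integer, positive float, non-negative integer, non-negative float) are encoded.

```python
from numbers import Integral, Real

# Illustrative values only; these are NOT the documented defaults.
hyperparameters = {
    "iterations": 500,
    "early_stopping_rounds": 5,
    "eval_metric": "AUC",
    "l2_leaf_reg": 3,
    "random_strength": 1.0,
    "bagging_temperature": 1.0,
    "random_seed": 42,
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Integral) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, Real) and not isinstance(value, bool)


# name -> (predicate, human-readable constraint), taken from the table above.
RULES = {
    "iterations": (lambda v: _is_int(v) and v > 0, "must be a positive integer"),
    "l2_leaf_reg": (lambda v: _is_int(v) and v > 0, "must be a positive integer"),
    "random_strength": (lambda v: _is_number(v) and v > 0, "must be a positive number"),
    "scale_pos_weight": (lambda v: _is_number(v) and v > 0, "must be a positive number"),
    "bagging_temperature": (lambda v: _is_number(v) and v >= 0, "must be a non-negative number"),
    "random_seed": (lambda v: _is_int(v) and v >= 0, "must be a non-negative integer"),
}


def check_hyperparameters(hp):
    """Return a list of constraint violations; an empty list means none were found."""
    problems = []
    for name, (is_valid, message) in RULES.items():
        if name in hp and not is_valid(hp[name]):
            problems.append(f"{name} {message}, got {hp[name]!r}")
    return problems


if __name__ == "__main__":
    print(check_hyperparameters(hyperparameters))
    print(check_hyperparameters({"iterations": 0, "bagging_temperature": -0.5}))
```

Hyperparameters that are absent from the dictionary are skipped, on the assumption that the algorithm falls back to its own defaults for them.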