

# XGBoost hyperparameters
<a name="xgboost_hyperparameters"></a>

The following table contains the subset of hyperparameters that are required or most commonly used for the Amazon SageMaker AI XGBoost algorithm. These are parameters that users set to facilitate the estimation of model parameters from data. The required hyperparameters that must be set are listed first, in alphabetical order. The optional hyperparameters that can be set are listed next, also in alphabetical order. The SageMaker AI XGBoost algorithm is an implementation of the open-source DMLC XGBoost package. For details about the full set of hyperparameters that can be configured for this version of XGBoost, see [XGBoost Parameters](https://xgboost.readthedocs.io/en/release_1.2.0/).
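The following is a minimal sketch of assembling hyperparameters for a training job according to the table below. It assumes the rules stated in this table: `num_round` is always required, and `num_class` is required only when `objective` is `multi:softmax` or `multi:softprob`. It also converts every value to a string, because the SageMaker `CreateTrainingJob` API accepts hyperparameters as string key/value pairs. The helper `build_hyperparameters` is hypothetical and is not part of any AWS SDK.

```python
from typing import Any, Dict

# Objectives for which the table marks num_class as required.
MULTICLASS_OBJECTIVES = {"multi:softmax", "multi:softprob"}


def build_hyperparameters(params: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: check required hyperparameters, then stringify values.

    SageMaker passes hyperparameters to the training container as strings,
    so every value is converted with str().
    """
    if "num_round" not in params:
        raise ValueError("num_round is required")
    # Fall back to the documented default objective when none is given.
    objective = params.get("objective", "reg:squarederror")
    if objective in MULTICLASS_OBJECTIVES and "num_class" not in params:
        raise ValueError(f"num_class is required when objective is {objective}")
    return {name: str(value) for name, value in params.items()}


hyperparameters = build_hyperparameters(
    {
        "objective": "multi:softprob",
        "num_class": 3,
        "num_round": 100,
        "eta": 0.2,
        "max_depth": 5,
    }
)
print(hyperparameters)
```

With the SageMaker Python SDK, a dictionary of this shape can typically be passed as keyword arguments to an estimator's `set_hyperparameters` method; check the SDK documentation for your version before relying on that.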


| Parameter Name | Description | 
| --- | --- | 
| num_class |  The number of classes. **Required** if `objective` is set to *multi:softmax* or *multi:softprob*. Valid values: Integer.  | 
| num_round |  The number of rounds to run the training. **Required** Valid values: Integer.  | 
| alpha |  L1 regularization term on weights. Increasing this value makes models more conservative. **Optional** Valid values: Float. Default value: 0  | 
| base_score |  The initial prediction score of all instances, global bias. **Optional** Valid values: Float. Default value: 0.5  | 
| booster |  Which booster to use. The `gbtree` and `dart` values use a tree-based model, while `gblinear` uses a linear function. **Optional** Valid values: String. One of `"gbtree"`, `"gblinear"`, or `"dart"`. Default value: `"gbtree"`  | 
| colsample_bylevel |  Subsample ratio of columns for each split, in each level. **Optional** Valid values: Float. Range: [0,1]. Default value: 1  | 
| colsample_bynode |  Subsample ratio of columns from each node. **Optional** Valid values: Float. Range: (0,1]. Default value: 1  | 
| colsample_bytree |  Subsample ratio of columns when constructing each tree. **Optional** Valid values: Float. Range: [0,1]. Default value: 1  | 
| csv_weights |  When this flag is enabled, XGBoost differentiates the importance of instances for CSV input by taking the second column (the column after labels) in training data as the instance weights. **Optional** Valid values: 0 or 1. Default value: 0  | 
| deterministic_histogram |  When this flag is enabled, XGBoost builds histograms on the GPU deterministically. Used only if `tree_method` is set to `gpu_hist`. For a full list of valid inputs, refer to [XGBoost Parameters](https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst). **Optional** Valid values: String. Range: `"true"` or `"false"`. Default value: `"true"`  | 
| early_stopping_rounds |  The model trains until the validation score stops improving. The validation error must decrease at least once every `early_stopping_rounds` rounds for training to continue. SageMaker AI hosting uses the best model for inference. **Optional** Valid values: Integer. Default value: -  | 
| eta |  Step size shrinkage used in updates to prevent overfitting. After each boosting step, you can directly get the weights of new features. The `eta` parameter actually shrinks the feature weights to make the boosting process more conservative. **Optional** Valid values: Float. Range: [0,1]. Default value: 0.3  | 
| eval_metric |  Evaluation metrics for validation data. A default metric is assigned according to the objective; for details, see the [AWS documentation website](http://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html). For a list of valid inputs, see [XGBoost Learning Task Parameters](https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst#learning-task-parameters). **Optional** Valid values: String. Default value: Depends on `objective`.  | 
| gamma |  Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm. **Optional** Valid values: Float. Range: [0,∞). Default value: 0  | 
| grow_policy |  Controls the way that new nodes are added to the tree. Currently supported only if `tree_method` is set to `hist`. **Optional** Valid values: String. Either `"depthwise"` or `"lossguide"`. Default value: `"depthwise"`  | 
| interaction_constraints |  Specifies groups of variables that are allowed to interact. **Optional** Valid values: Nested list of integers. Each integer represents a feature, and each nested list contains features that are allowed to interact, for example, [[1,2], [3,4,5]]. Default value: None  | 
| lambda |  L2 regularization term on weights. Increasing this value makes models more conservative. **Optional** Valid values: Float. Default value: 1  | 
| lambda_bias |  L2 regularization term on bias. **Optional** Valid values: Float. Range: [0.0, 1.0]. Default value: 0  | 
| max_bin |  Maximum number of discrete bins to bucket continuous features. Used only if `tree_method` is set to `hist`. **Optional** Valid values: Integer. Default value: 256  | 
| max_delta_step |  Maximum delta step allowed for each tree's weight estimation. When a positive integer is used, it helps make the update more conservative. It is usually not needed, but it can help in logistic regression, especially when classes are extremely imbalanced. Setting it to a value from 1 to 10 might help control the update. **Optional** Valid values: Integer. Range: [0,∞). Default value: 0  | 
| max_depth |  Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. 0 indicates no limit. A limit is required when `grow_policy` is set to `depthwise`. **Optional** Valid values: Integer. Range: [0,∞). Default value: 6  | 
| max_leaves |  Maximum number of nodes to be added. Relevant only if `grow_policy` is set to `lossguide`. **Optional** Valid values: Integer. Default value: 0  | 
| min_child_weight |  Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than `min_child_weight`, the building process gives up further partitioning. In linear regression models, this simply corresponds to the minimum number of instances needed in each node. The larger the value, the more conservative the algorithm. **Optional** Valid values: Float. Range: [0,∞). Default value: 1  | 
| monotone_constraints |  Specifies monotonicity constraints on any feature. **Optional** Valid values: Tuple of integers. Valid integers: -1 (decreasing constraint), 0 (no constraint), 1 (increasing constraint). For example, (0, 1) means no constraint on the first predictor and an increasing constraint on the second; (-1, 1) means a decreasing constraint on the first predictor and an increasing constraint on the second. Default value: (0, 0)  | 
| normalize_type |  Type of normalization algorithm. **Optional** Valid values: Either *tree* or *forest*. Default value: *tree*  | 
| nthread |  Number of parallel threads used to run *xgboost*. **Optional** Valid values: Integer. Default value: Maximum number of threads.  | 
| objective |  Specifies the learning task and the corresponding learning objective. Examples: `reg:logistic`, `multi:softmax`, `reg:squarederror`. For a full list of valid inputs, refer to [XGBoost Learning Task Parameters](https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst#learning-task-parameters). **Optional** Valid values: String. Default value: `"reg:squarederror"`  | 
| one_drop |  When this flag is enabled, at least one tree is always dropped during the dropout. **Optional** Valid values: 0 or 1. Default value: 0  | 
| process_type |  The type of boosting process to run. **Optional** Valid values: String. Either `"default"` or `"update"`. Default value: `"default"`  | 
| rate_drop |  The dropout rate that specifies the fraction of previous trees to drop during the dropout. **Optional** Valid values: Float. Range: [0.0, 1.0]. Default value: 0.0  | 
| refresh_leaf |  This is a parameter of the `refresh` updater plug-in. When set to `true` (1), tree leaves and tree node stats are updated. When set to `false` (0), only tree node stats are updated. **Optional** Valid values: 0 or 1. Default value: 1  | 
| sample_type |  Type of sampling algorithm. **Optional** Valid values: Either `uniform` or `weighted`. Default value: `uniform`  | 
| scale_pos_weight |  Controls the balance of positive and negative weights. This is useful for imbalanced classes. A typical value to consider: `sum(negative cases)` / `sum(positive cases)`. **Optional** Valid values: Float. Default value: 1  | 
| seed |  Random number seed. **Optional** Valid values: Integer. Default value: 0  | 
| single_precision_histogram |  When this flag is enabled, XGBoost uses single precision to build histograms instead of double precision. Used only if `tree_method` is set to `hist` or `gpu_hist`. For a full list of valid inputs, refer to [XGBoost Parameters](https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst). **Optional** Valid values: String. Range: `"true"` or `"false"`. Default value: `"false"`  | 
| sketch_eps |  Used only for the approximate greedy algorithm (`tree_method` set to `approx`). This translates into O(1 / `sketch_eps`) bins. Compared to selecting the number of bins directly, this comes with a theoretical guarantee on sketch accuracy. **Optional** Valid values: Float. Range: [0, 1]. Default value: 0.03  | 
| skip_drop |  Probability of skipping the dropout procedure during a boosting iteration. **Optional** Valid values: Float. Range: [0.0, 1.0]. Default value: 0.0  | 
| subsample |  Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the data instances to grow trees. This helps prevent overfitting. **Optional** Valid values: Float. Range: [0,1]. Default value: 1  | 
| tree_method |  The tree construction algorithm used in XGBoost. **Optional** Valid values: One of `auto`, `exact`, `approx`, `hist`, or `gpu_hist`. Default value: `auto`  | 
| tweedie_variance_power |  Parameter that controls the variance of the Tweedie distribution. **Optional** Valid values: Float. Range: (1, 2). Default value: 1.5  | 
| updater |  A comma-separated string that defines the sequence of tree updaters to run. This provides a modular way to construct and to modify the trees. For a full list of valid inputs, refer to [XGBoost Parameters](https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst). **Optional** Valid values: Comma-separated string. Default value: `grow_colmaker,prune`  | 
| use_dask_gpu_training |  Set `use_dask_gpu_training` to `"true"` if you want to run distributed GPU training with Dask. Dask GPU training is only supported for versions 1.5-1 and later. Do not set this value to `"true"` for versions preceding 1.5-1. For more information, see [Distributed GPU training](xgboost.md#Instance-XGBoost-distributed-training-gpu). **Optional** Valid values: String. Range: `"true"` or `"false"`. Default value: `"false"`  | 
| verbosity | Verbosity of printing messages. **Optional** Valid values: 0 (silent), 1 (warning), 2 (info), 3 (debug). Default value: 1  | 
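The `scale_pos_weight` entry suggests `sum(negative cases)` / `sum(positive cases)` as a typical starting value. The following minimal sketch computes that ratio from a list of binary labels, assuming labels are encoded as 0 (negative) and 1 (positive). The function name is illustrative only.

```python
from typing import Sequence


def scale_pos_weight(labels: Sequence[int]) -> float:
    """Return sum(negative cases) / sum(positive cases) for 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        # The ratio is undefined without any positive cases.
        raise ValueError("labels contain no positive cases")
    return negatives / positives


labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(scale_pos_weight(labels))  # 8 negatives / 2 positives -> 4.0
```

Treat the result as a starting point for tuning rather than a fixed rule.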
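The `early_stopping_rounds` entry says that training continues only while the validation error keeps decreasing at least once every `early_stopping_rounds` rounds, and that the best model is used for inference. The following sketch simulates that rule on a precomputed list of per-round validation errors. It assumes a lower-is-better metric and strict improvement; it is an illustration of the stopping rule, not the XGBoost or SageMaker implementation.

```python
from typing import Optional, Sequence, Tuple


def early_stopping(
    validation_errors: Sequence[float], early_stopping_rounds: int
) -> Tuple[int, Optional[int]]:
    """Return (best_round, stopped_at_round) for per-round validation errors.

    Assumes lower error is better. Training stops once the error has not
    improved for early_stopping_rounds consecutive rounds; stopped_at_round
    is None if training runs through every round.
    """
    best_round = 0
    best_error = float("inf")
    for round_index, error in enumerate(validation_errors):
        if error < best_error:
            # Strict improvement: remember this round as the best so far.
            best_error = error
            best_round = round_index
        elif round_index - best_round >= early_stopping_rounds:
            # No improvement within the allowed window, so stop here.
            return best_round, round_index
    return best_round, None


errors = [0.40, 0.31, 0.27, 0.26, 0.27, 0.28, 0.26, 0.29, 0.30]
print(early_stopping(errors, early_stopping_rounds=3))  # (3, 6)
```

For metrics where higher is better (for example, AUC), the comparison would be reversed.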
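The `eta` entry describes step-size shrinkage: each new tree's contribution is scaled down before it is added to the model, which makes boosting more conservative. The following toy sketch shows that effect for a single instance under squared-error loss, starting from `base_score`. It assumes an idealized "tree" that predicts the current residual exactly; real trees only approximate the residual, so this is a simplification for illustration.

```python
from typing import List


def boost_constant_model(
    target: float, base_score: float, eta: float, num_round: int
) -> List[float]:
    """Toy boosting loop for one instance under squared-error loss.

    Each round, an idealized tree outputs the current residual, and eta
    scales that output before it is added to the running prediction.
    Returns the prediction after each round, starting with base_score.
    """
    prediction = base_score
    history = [prediction]
    for _ in range(num_round):
        residual = target - prediction  # negative gradient of 0.5 * (target - prediction) ** 2
        tree_output = residual  # idealized tree that fits the residual perfectly
        prediction += eta * tree_output  # shrinkage: take only a fraction eta of the step
        history.append(prediction)
    return history


full_step = boost_constant_model(target=2.0, base_score=0.5, eta=1.0, num_round=3)
shrunk = boost_constant_model(target=2.0, base_score=0.5, eta=0.3, num_round=3)
print(full_step)  # reaches the target after the first round
print(shrunk)     # approaches the target gradually
```

In this toy setting the residual shrinks by a factor of (1 - `eta`) each round, so after n rounds the prediction is `target - (target - base_score) * (1 - eta) ** n`. A smaller `eta` therefore usually needs a larger `num_round` to fit the data to the same degree.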