K-Means Hyperparameters
In the CreateTrainingJob
request, you specify the training algorithm that you want to use. You can also specify
algorithm-specific hyperparameters as string-to-string maps. The following table lists
the hyperparameters for the k-means training algorithm provided by Amazon SageMaker AI. For more
information about how k-means clustering works, see How K-Means Clustering Works.
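
As a minimal sketch of the string-to-string map form described above, the following Python snippet converts typed values into strings before they are placed in the `HyperParameters` field of a `CreateTrainingJob` request. The helper name `to_hyperparameter_map` and the sample values (such as a `feature_dim` of 784) are hypothetical and chosen only for illustration. The snippet does not call any AWS API.

```python
def to_hyperparameter_map(params):
    """Convert hyperparameter values to a string-to-string map.

    Hypothetical helper for illustration; it only checks the two
    hyperparameters that the table marks as required.
    """
    required = ("feature_dim", "k")
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"missing required hyperparameters: {missing}")
    for name in required:
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    # Every value is sent as a string, whatever its original type.
    return {name: str(value) for name, value in params.items()}


# Sample values are assumptions, not recommendations.
hyperparameters = to_hyperparameter_map(
    {"feature_dim": 784, "k": 10, "epochs": 1, "mini_batch_size": 5000}
)
print(hyperparameters)
```

The resulting dictionary has the shape expected for a string-to-string map: both keys and values are strings.
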
Parameter Name | Description |
---|---|
feature_dim | The number of features in the input data. Required. Valid values: Positive integer. |
k | The number of required clusters. Required. Valid values: Positive integer. |
epochs | The number of passes over the training data. Optional. Valid values: Positive integer. Default value: 1. |
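eval_metrics | A JSON list of metric types used to report a score for the model. Allowed values are `msd` for mean square deviation and `ssd` for sum of square distance. If test data is provided, the score is reported for each requested metric. Optional. Valid values: Either `["msd"]`, `["ssd"]`, or `["msd","ssd"]`. Default value: `["msd"]`. |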
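extra_center_factor | The algorithm creates K centers = `k` * `extra_center_factor` as it runs and reduces the number of centers from K to `k` when finalizing the model. Optional. Valid values: Either a positive integer or `auto`. Default value: `auto`. |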
half_life_time_size | Used to determine the weight given to an observation when computing a cluster mean. This weight decays exponentially as more points are observed. When a point is first observed, it is assigned a weight of 1 when computing the cluster mean. The decay constant for the exponential decay function is chosen so that after observing `half_life_time_size` points, its weight is 1/2. If set to 0, there is no decay. Optional. Valid values: Non-negative integer. Default value: 0. |
init_method | Method by which the algorithm chooses the initial cluster centers. The standard k-means approach chooses them at random. The alternative k-means++ method chooses the first cluster center at random, then spreads out the remaining initial centers by selecting each one with a probability proportional to the squared distance of the remaining data points from the existing centers. Optional. Valid values: Either `random` or `kmeans++`. Default value: `random`. |
local_lloyd_init_method | The initialization method for Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Optional. Valid values: Either `random` or `kmeans++`. Default value: `kmeans++`. |
local_lloyd_max_iter | The maximum number of iterations for Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Optional. Valid values: Positive integer. Default value: 300. |
local_lloyd_num_trials | The number of times Lloyd's expectation-maximization (EM) procedure is run when building the final model containing `k` centers. The run with the least loss is kept. Optional. Valid values: Either a positive integer or `auto`. Default value: `auto`. |
local_lloyd_tol | The tolerance for change in loss for early stopping of Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Optional. Valid values: Float in the range [0, 1]. Default value: 0.0001. |
mini_batch_size | The number of observations per mini-batch for the data iterator. Optional. Valid values: Positive integer. Default value: 5000. |
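
The exponential weighting described for `half_life_time_size` can be illustrated with a short Python sketch. It assumes, as the parameter name suggests, that an observation's weight falls to one half after `half_life_time_size` further points have been observed, and that a value of 0 disables decay. The function name `observation_weight` is hypothetical, and this is an illustration of the decay formula only, not the algorithm's actual implementation.

```python
import math


def observation_weight(points_since_observed, half_life_time_size):
    """Weight of an observation after more points have been seen.

    Assumption: the weight starts at 1 and halves every
    half_life_time_size points; 0 is treated as "no decay".
    """
    if half_life_time_size == 0:
        return 1.0
    # Decay constant chosen so that exp(-c * half_life_time_size) == 1/2.
    decay_constant = math.log(2) / half_life_time_size
    return math.exp(-decay_constant * points_since_observed)


# Weights at 0, 1, and 2 half-lives for a hypothetical half-life of 100.
for n in (0, 100, 200):
    print(n, round(observation_weight(n, 100), 4))
```

Under these assumptions, a smaller `half_life_time_size` makes the cluster means track recent points more closely, while 0 weights all points equally.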