Tune an NTM Model
Automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric from the metrics that the algorithm computes. Automatic model tuning searches the chosen hyperparameters to find the combination of values that results in the model that optimizes the objective metric.
Amazon SageMaker AI NTM is an unsupervised learning algorithm that learns latent representations of large collections of discrete data, such as a corpus of documents. Latent representations use inferred variables that are not directly measured to model the observations in a dataset. Automatic model tuning on NTM helps you find the model that minimizes loss over the training or validation data. Training loss measures how well the model fits the training data. Validation loss measures how well the model generalizes to data that it is not trained on. Low training loss indicates that a model is a good fit to the training data. Low validation loss indicates that a model has not overfit the training data and so should be able to model documents on which it has not been trained. Usually, it's preferable for both losses to be small. However, minimizing training loss too much might result in overfitting and increase validation loss, which would reduce the generality of the model.
For more information about model tuning, see Automatic model tuning with SageMaker AI.
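A tuning job needs a configured estimator to run against. The following sketch, using the SageMaker Python SDK, sets up an NTM estimator that the tuning examples in the sections below build on; the role ARN, bucket, instance type, and the fixed num_topics and feature_dim values are placeholder assumptions, not prescribed settings.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Resolve the built-in NTM container image for the current AWS Region.
container = image_uris.retrieve("ntm", session.boto_region_name)

# Placeholder role ARN, bucket, and instance type; substitute your own.
ntm = Estimator(
    image_uri=container,
    role="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    instance_count=1,
    instance_type="ml.c5.xlarge",
    output_path="s3://amzn-s3-demo-bucket/ntm-output",
    sagemaker_session=session,
)

# Hyperparameters held fixed across all tuning jobs; num_topics and
# feature_dim are required by NTM, and these values are example choices.
ntm.set_hyperparameters(num_topics=20, feature_dim=5000)
```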
Metrics Computed by the NTM Algorithm
The NTM algorithm reports a single metric that is computed during training: validation:total_loss. The total loss is the sum of the reconstruction loss and the Kullback-Leibler divergence. When tuning hyperparameter values, choose this metric as the objective.
Metric Name | Description | Optimization Direction
---|---|---
validation:total_loss | Total loss on the validation set | Minimize
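With the SageMaker Python SDK, you select this objective when constructing the tuner. A minimal sketch, continuing from the estimator above; the single learning_rate range and the job counts are placeholder assumptions, and the next section shows the full set of recommended ranges:

```python
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=ntm,  # the NTM estimator from the earlier sketch
    objective_metric_name="validation:total_loss",  # the only metric NTM reports
    objective_type="Minimize",  # lower total loss is better
    hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-4, 0.1)},
    max_jobs=20,          # assumed tuning budget; adjust to your needs
    max_parallel_jobs=2,
)
```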
Tunable NTM Hyperparameters
You can tune the following hyperparameters for the NTM algorithm. Usually, setting a low mini_batch_size and a small learning_rate results in lower validation losses, although training might take longer. Low validation losses don't necessarily produce more coherent topics as interpreted by humans. The effect of other hyperparameters on training and validation loss can vary from dataset to dataset. To see which values are compatible, see NTM Hyperparameters.
Parameter Name | Parameter Type | Recommended Ranges
---|---|---
encoder_layers_activation | CategoricalParameterRanges | ['sigmoid', 'tanh', 'relu']
learning_rate | ContinuousParameterRanges | MinValue: 1e-4, MaxValue: 0.1
mini_batch_size | IntegerParameterRanges | MinValue: 16, MaxValue: 2048
optimizer | CategoricalParameterRanges | ['sgd', 'adam', 'adadelta']
rescale_gradient | ContinuousParameterRanges | MinValue: 0.1, MaxValue: 1.0
weight_decay | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 1.0
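Expressed with the SageMaker Python SDK, the recommended ranges in the table map to a dictionary of parameter objects. The following sketch constructs the tuner from the previous section with the full set of ranges and launches the tuning job; the S3 URIs are placeholder paths:

```python
from sagemaker.tuner import (
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
    IntegerParameter,
)

# Ranges mirroring the recommended values in the table above.
hyperparameter_ranges = {
    "encoder_layers_activation": CategoricalParameter(["sigmoid", "tanh", "relu"]),
    "learning_rate": ContinuousParameter(1e-4, 0.1),
    "mini_batch_size": IntegerParameter(16, 2048),
    "optimizer": CategoricalParameter(["sgd", "adam", "adadelta"]),
    "rescale_gradient": ContinuousParameter(0.1, 1.0),
    "weight_decay": ContinuousParameter(0.0, 1.0),
}

tuner = HyperparameterTuner(
    estimator=ntm,  # the NTM estimator from the first sketch
    objective_metric_name="validation:total_loss",
    objective_type="Minimize",
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,          # assumed tuning budget
    max_parallel_jobs=2,
)

# Launch the tuning job. The channel URIs are placeholders for data in an
# NTM-supported format; the validation channel is needed for
# validation:total_loss to be computed.
tuner.fit({
    "train": "s3://amzn-s3-demo-bucket/ntm/train",
    "validation": "s3://amzn-s3-demo-bucket/ntm/validation",
})
```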