IP Insights Hyperparameters
In the CreateTrainingJob
request, you specify the training algorithm. You can also specify algorithm-specific
hyperparameters as string-to-string maps, as in the sketch below. The table that follows the
example lists the hyperparameters for the Amazon SageMaker IP Insights algorithm.
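The following is a minimal sketch, using boto3, of how these hyperparameters can be passed as a string-to-string map in a CreateTrainingJob request. The job name, image URI, role ARN, S3 locations, instance settings, and hyperparameter values shown are placeholders for illustration, not recommendations.

```python
# Minimal sketch of a CreateTrainingJob request for IP Insights made with boto3.
# The image URI, role ARN, S3 paths, and hyperparameter values are placeholders.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

sm.create_training_job(
    TrainingJobName="ipinsights-example",
    AlgorithmSpecification={
        # Region-specific IP Insights algorithm image URI (placeholder).
        "TrainingImage": "<ipinsights-image-uri>",
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    # Hyperparameters are passed as a string-to-string map; all values are strings.
    HyperParameters={
        "num_entity_vectors": "20000",
        "vector_dim": "128",
        "epochs": "10",
        "learning_rate": "0.001",
        "mini_batch_size": "10000",
    },
    InputDataConfig=[
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://<bucket>/ipinsights/train/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
            "ContentType": "text/csv",
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://<bucket>/ipinsights/output/"},
    ResourceConfig={
        "InstanceType": "ml.p3.2xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```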
Parameter Name | Description |
---|---|
num_entity_vectors | The number of entity vector representations (entity embedding vectors) to train. Each entity in the training set is randomly assigned to one of these vectors using a hash function. Because of hash collisions, multiple entities can be assigned to the same vector, so a single vector can end up representing more than one entity. This generally has a negligible effect on model performance, as long as the collision rate is not too severe. To keep the collision rate low, set this value as high as possible. However, the model size, and therefore the memory requirement for both training and inference, scales linearly with this hyperparameter. We recommend that you set this value to twice the number of unique entity identifiers (a worked example follows this table). Required. Valid values: 1 ≤ positive integer ≤ 250,000,000 |
vector_dim | The size of the embedding vectors that represent entities and IP addresses. The larger the value, the more information that can be encoded using these representations. In practice, model size scales linearly with this parameter and limits how large the dimension can be. In addition, using vector representations that are too large can cause the model to overfit, especially for small training datasets. Overfitting occurs when a model memorizes the training data instead of learning generalizable patterns, so it performs poorly during inference. The recommended value is 128. Required. Valid values: 4 ≤ positive integer ≤ 4096 |
batch_metrics_publish_interval | The interval (every X batches) at which the Apache MXNet Speedometer function prints the training speed of the network (samples/second). Optional. Valid values: positive integer ≥ 1. Default value: 1,000 |
epochs | The number of passes over the training data. The optimal value depends on your data size and learning rate. Typical values range from 5 to 100. Optional. Valid values: positive integer ≥ 1. Default value: 10 |
learning_rate | The learning rate for the optimizer. IP Insights uses a gradient-descent-based Adam optimizer. The learning rate controls the step size used to update model parameters at each iteration. Too large a learning rate can cause the model to diverge because training is likely to overshoot a minimum. Too small a learning rate slows convergence. Typical values range from 1e-4 to 1e-1. Optional. Valid values: 1e-6 ≤ float ≤ 10.0. Default value: 0.001 |
mini_batch_size | The number of examples in each mini batch. The training procedure processes data in mini batches. The optimal value depends on the number of unique account identifiers in the dataset. In general, the larger the mini_batch_size, the more unique accounts each batch is likely to contain, which improves the quality of shuffled negative sampling. Optional. Valid values: 1 ≤ positive integer ≤ 500,000. Default value: 10,000 |
num_ip_encoder_layers | The number of fully connected layers used to encode the IP address embedding. The larger the number of layers, the greater the model's capacity to capture patterns among IP addresses. However, using a large number of layers increases the chance of overfitting. Optional. Valid values: 0 ≤ positive integer ≤ 100. Default value: 1 |
random_negative_sampling_rate | The number of random negative samples, R, to generate per input example. The training procedure relies on negative samples to prevent the vector representations of the model from collapsing to a single point. Random negative sampling generates R random IP addresses for each input account in the mini batch. The sum of the random_negative_sampling_rate (R) and the shuffled_negative_sampling_rate (S) must be in the interval 1 ≤ R + S ≤ 500. Optional. Valid values: 0 ≤ positive integer ≤ 500. Default value: 1 |
shuffled_negative_sampling_rate | The number of shuffled negative samples, S, to generate per input example. In some cases, it helps to use more realistic negative samples that are randomly picked from the training data itself. This kind of negative sampling is achieved by shuffling the data within a mini batch. Shuffled negative sampling generates S negative IP addresses by shuffling the IP address and account pairings within a mini batch. The sum of the random_negative_sampling_rate (R) and the shuffled_negative_sampling_rate (S) must be in the interval 1 ≤ R + S ≤ 500. Optional. Valid values: 0 ≤ positive integer ≤ 500. Default value: 1 |
weight_decay | The weight decay coefficient. This parameter adds an L2 regularization factor that is required to prevent the model from overfitting the training data. Optional. Valid values: 0.0 ≤ float ≤ 10.0. Default value: 0.00001 |
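As a worked example of the num_entity_vectors recommendation above, the following sketch counts the unique entity identifiers in the training data and doubles that count, capping it at the documented maximum. The file name train.csv and the <entity>,<ip_address> column layout are assumptions for illustration.

```python
# Minimal sketch: derive a num_entity_vectors value from the training data.
# Assumes a headerless CSV of <entity_id>,<ip_address> records named train.csv.
import csv

unique_entities = set()
with open("train.csv", newline="") as f:
    for entity_id, _ip_address in csv.reader(f):
        unique_entities.add(entity_id)

# Recommendation from the table: roughly twice the number of unique entity
# identifiers, capped at the maximum allowed value of 250,000,000.
num_entity_vectors = min(2 * len(unique_entities), 250_000_000)
print(f"unique entities: {len(unique_entities)}")
print(f"suggested num_entity_vectors: {num_entity_vectors}")
```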