DeepAR Hyperparameters - Amazon SageMaker

The following table lists the hyperparameters that you can set when training with the Amazon SageMaker DeepAR forecasting algorithm.
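The Python below is a minimal sketch of how the settings in the table fit together. It checks that the four required hyperparameters are present, fills in the listed default for each optional one, and converts every value to a string, because SageMaker passes hyperparameters to a training job as strings. The helper build_hyperparameters is hypothetical and is not part of any SageMaker SDK.

```python
# Hypothetical helper: not part of the SageMaker SDK.
REQUIRED = ("context_length", "epochs", "prediction_length", "time_freq")

# Default values copied from the table, stored as strings.
DEFAULTS = {
    "cardinality": "auto",
    "dropout_rate": "0.1",
    "embedding_dimension": "10",
    "learning_rate": "1e-3",
    "likelihood": "student-T",
    "mini_batch_size": "128",
    "num_cells": "40",
    "num_dynamic_feat": "auto",
    "num_eval_samples": "100",
    "num_layers": "2",
    "test_quantiles": "[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]",
}

# early_stopping_patience is optional but has no default value.
KNOWN = set(REQUIRED) | set(DEFAULTS) | {"early_stopping_patience"}


def build_hyperparameters(**settings):
    """Merge user settings with the table defaults and stringify every value."""
    unknown = sorted(set(settings) - KNOWN)
    if unknown:
        raise ValueError(f"unknown hyperparameters: {unknown}")
    missing = [name for name in REQUIRED if name not in settings]
    if missing:
        raise ValueError(f"missing required hyperparameters: {missing}")
    hyperparameters = dict(DEFAULTS)
    hyperparameters.update({name: str(value) for name, value in settings.items()})
    return hyperparameters


hp = build_hyperparameters(
    context_length=72, epochs=100, prediction_length=72, time_freq="H",
    early_stopping_patience=10,
)
print(hp["likelihood"], hp["context_length"])  # student-T 72
```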

Parameter Name Description
context_length

The number of time points that the model sees before making a prediction. The value for this parameter should be about the same as the prediction_length. The model also receives lagged inputs from the target, so context_length can be much smaller than typical seasonalities. For example, a daily time series can have yearly seasonality. Because the model automatically includes a lag of one year, the context length can be shorter than a year. The lag values that the model picks depend on the frequency of the time series. For example, the lag values for daily frequency are the previous week, 2 weeks, 3 weeks, 4 weeks, and 1 year.

Required

Valid values: Positive integer
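The effect of lagged inputs can be sketched as follows. The daily lags named above are expressed in days, assuming one year is approximated as 365 days. The names DAILY_LAGS, lagged_values, and required_history are hypothetical, and the idea that the model needs context_length plus the largest lag steps of history is an assumption for illustration, not a statement about SageMaker internals.

```python
# Hypothetical lag table for daily data, in days (assumes 1 year ~ 365 days).
DAILY_LAGS = [7, 14, 21, 28, 365]


def lagged_values(series, t, lags):
    """Target values at t - lag for each lag: the extra inputs for time step t."""
    return [series[t - lag] for lag in lags]


def required_history(context_length, lags):
    """Assumed history length so that every step in the context window
    can look back by the largest lag."""
    return context_length + max(lags)


# A 30-day context window still "sees" last year's value through the 365-day lag.
series = list(range(500))  # toy daily series: value == day index
print(lagged_values(series, 400, DAILY_LAGS))  # [393, 386, 379, 372, 35]
print(required_history(30, DAILY_LAGS))        # 395
```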

epochs

The maximum number of passes over the training data. The optimal value depends on your data size and learning rate. See also early_stopping_patience. Typical values range from 10 to 1000.

Required

Valid values: Positive integer

prediction_length

The number of time steps that the model is trained to predict, also called the forecast horizon. The trained model always generates forecasts of this length and can't generate longer ones. The prediction_length is fixed when the model is trained and can't be changed later.

Required

Valid values: Positive integer
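To illustrate how context_length and prediction_length relate during training, the sketch below cuts one training window out of a series: the first context_length points are the conditioning range, and the next prediction_length points are what the model learns to forecast. The function split_window is hypothetical and ignores details such as lags and covariates.

```python
def split_window(series, start, context_length, prediction_length):
    """Split series[start:] into (context, horizon) slices.

    Hypothetical illustration: the horizon always has exactly
    prediction_length points, mirroring the fixed forecast length.
    """
    end = start + context_length + prediction_length
    if start < 0 or end > len(series):
        raise ValueError("window does not fit inside the series")
    context = series[start:start + context_length]
    horizon = series[start + context_length:end]
    return context, horizon


context, horizon = split_window(list(range(10)), start=2, context_length=3, prediction_length=2)
print(context, horizon)  # [2, 3, 4] [5, 6]
```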

time_freq

The granularity of the time series in the dataset. DeepAR uses time_freq to select appropriate date features and lags. The model supports the following basic frequencies, as well as multiples of them. For example, 5min specifies a frequency of 5 minutes.

  • M: monthly

  • W: weekly

  • D: daily

  • H: hourly

  • min: every minute

Required

Valid values: An integer followed by M, W, D, H, or min. For example, 5min.
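A quick way to check a time_freq string before starting a training job is a regular expression over the valid values. This is a minimal sketch: parse_time_freq is a hypothetical helper, and it assumes the integer multiple may be omitted (for example, D), since the basic frequencies are listed without one.

```python
import re

# Optional multiple followed by one of the basic frequencies.
_TIME_FREQ = re.compile(r"(\d*)(M|W|D|H|min)")


def parse_time_freq(freq):
    """Return (multiple, unit), e.g. "5min" -> (5, "min"); raise on invalid input."""
    match = _TIME_FREQ.fullmatch(freq)
    if match is None:
        raise ValueError(f"unsupported time_freq: {freq!r}")
    multiple = int(match.group(1)) if match.group(1) else 1
    if multiple < 1:
        raise ValueError(f"time_freq multiple must be positive: {freq!r}")
    return multiple, match.group(2)


print(parse_time_freq("5min"))  # (5, 'min')
print(parse_time_freq("D"))     # (1, 'D')
```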

cardinality

When using the categorical features (cat), cardinality is an array specifying the number of categories (groups) per categorical feature. Set this to auto to infer the cardinality from the data. The auto mode also works when no categorical features are used in the dataset. This is the recommended setting for the parameter.

Set cardinality to ignore to force DeepAR not to use categorical features, even if they are present in the data.

To perform additional data validation, it is possible to explicitly set this parameter to the actual value. For example, if two categorical features are provided where the first has 2 and the other has 3 possible values, set this to [2, 3].

For more information on how to use categorical features, see the data section on the main DeepAR documentation page.

Optional

Valid values: auto, ignore, array of positive integers, or empty string

Default value: auto
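The recommended auto setting infers cardinality from the data. The sketch below shows one way such an inference could work on records in the DeepAR JSON Lines format, where each record may carry a cat array: it counts the distinct values seen at each position. The helper infer_cardinality is hypothetical, and counting distinct values is an assumption about how inference works, not a description of DeepAR's implementation. With the two-feature example above, it yields [2, 3].

```python
import json


def infer_cardinality(jsonl_lines):
    """Count distinct values per categorical feature across JSON Lines records."""
    distinct = None
    for line in jsonl_lines:
        cats = json.loads(line).get("cat", [])
        if distinct is None:
            distinct = [set() for _ in cats]
        elif len(cats) != len(distinct):
            raise ValueError("records have different numbers of categorical features")
        for seen, value in zip(distinct, cats):
            seen.add(value)
    return [len(seen) for seen in distinct or []]


records = [
    '{"start": "2020-01-01 00:00:00", "target": [1.0, 2.0], "cat": [0, 0]}',
    '{"start": "2020-01-01 00:00:00", "target": [3.0, 4.0], "cat": [1, 2]}',
    '{"start": "2020-01-01 00:00:00", "target": [5.0, 6.0], "cat": [0, 1]}',
]
print(infer_cardinality(records))  # [2, 3]
```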

dropout_rate

The dropout rate to use during training. The model uses zoneout regularization: for each iteration, a random subset of hidden neurons is not updated. Typical values are less than 0.2.

Optional

Valid values: float

Default value: 0.1
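Zoneout differs from standard dropout in that a dropped unit keeps its previous hidden value instead of being set to zero. The per-step rule can be sketched as follows. zoneout_update is a hypothetical helper operating on plain Python lists, and treating dropout_rate as the per-unit probability of not updating is an assumption based on the description above.

```python
import random


def zoneout_update(h_prev, h_new, rate, rng):
    """With probability `rate`, a hidden unit keeps its previous value
    (it is not updated); otherwise it takes the newly computed value."""
    return [prev if rng.random() < rate else new for prev, new in zip(h_prev, h_new)]


rng = random.Random(0)
h_prev = [0.0, 0.0, 0.0, 0.0]
h_new = [1.0, 1.0, 1.0, 1.0]
print(zoneout_update(h_prev, h_new, rate=0.1, rng=rng))
```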

early_stopping_patience

If this parameter is set, training stops when no progress is made within the specified number of epochs. The model that has the lowest loss is returned as the final model.

Optional

Valid values: integer
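The stopping rule can be sketched as a loop over per-epoch losses: training ends once early_stopping_patience epochs pass without a new lowest loss, and the epoch with the lowest loss is the one whose model is kept. The function early_stopping and its precomputed loss list are hypothetical; a real training run computes each loss as it goes, up to the epochs limit.

```python
def early_stopping(epoch_losses, patience):
    """Return (best_epoch, last_epoch_run) for a sequence of per-epoch losses."""
    best_loss, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(epoch_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop and keep the best model.
            return best_epoch, epoch
    return best_epoch, len(epoch_losses) - 1


losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.9]  # epochs 0..6
print(early_stopping(losses, patience=3))    # (2, 5)
```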

embedding_dimension

The size of the embedding vector learned per categorical feature (the same value is used for all categorical features).

The DeepAR model can learn group-level time series patterns when a categorical grouping feature is provided. To do this, the model learns an embedding vector of size embedding_dimension for each group, capturing the common properties of all time series in the group. A larger embedding_dimension allows the model to capture more complex patterns. However, because increasing the embedding_dimension increases the number of parameters in the model, more training data is required to accurately learn these parameters. Typical values for this parameter are between 10 and 100.

Optional

Valid values: positive integer

Default value: 10
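Because each group gets its own vector of embedding_dimension values, the number of embedding parameters grows with both the total number of groups and embedding_dimension. A minimal sketch of that count, using a hypothetical helper and ignoring all non-embedding parameters of the network:

```python
def embedding_parameter_count(cardinality, embedding_dimension):
    """One vector of size embedding_dimension per group, summed over all
    categorical features (the same dimension is used for every feature)."""
    return sum(cardinality) * embedding_dimension


print(embedding_parameter_count([2, 3], 10))   # 50
print(embedding_parameter_count([2, 3], 100))  # 500
```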

learning_rate

The learning rate used in training. Typical values range from 1e-4 to 1e-1.

Optional

Valid values: float

Default value: 1e-3

likelihood

The model generates a probabilistic forecast, and can provide quantiles of the distribution and return samples. Depending on your data, select an appropriate likelihood (noise model) that is used for uncertainty estimates. The following likelihoods can be selected:

  • gaussian: Use for real-valued data.

  • beta: Use for real-valued targets between 0 and 1 inclusive.

  • negative-binomial: Use for count data (non-negative integers).

  • student-T: An alternative for real-valued data that works well for bursty data.

  • deterministic-L1: A loss function that does not estimate uncertainty and only learns a point forecast.

Optional

Valid values: One of gaussian, beta, negative-binomial, student-T, or deterministic-L1.

Default value: student-T
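The guidance above can be turned into a rough heuristic. The sketch below is hypothetical and only inspects the observed target values: non-negative integers map to negative-binomial, values within [0, 1] to beta, and anything else to the default student-T. It cannot detect burstiness, so choosing between gaussian and student-T for real-valued data still needs judgment, and checking for counts first is an arbitrary choice for targets such as [0, 1, 0] that fit both of the first two rules.

```python
def suggest_likelihood(values):
    """Rough heuristic mapping target values to a DeepAR likelihood name."""
    if not values:
        raise ValueError("need at least one target value")
    if all(v >= 0 and float(v).is_integer() for v in values):
        return "negative-binomial"  # count data
    if all(0 <= v <= 1 for v in values):
        return "beta"  # real values between 0 and 1 inclusive
    return "student-T"  # default for general real-valued data


print(suggest_likelihood([0, 3, 5]))          # negative-binomial
print(suggest_likelihood([0.2, 0.5, 1.0]))    # beta
print(suggest_likelihood([-1.5, 2.0, 7.25]))  # student-T
```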

mini_batch_size

The size of mini-batches used during training. Typical values range from 32 to 512.

Optional

Valid values: positive integer

Default value: 128

num_cells

The number of cells to use in each hidden layer of the RNN. Typical values range from 30 to 100.

Optional

Valid values: positive integer

Default value: 40

num_dynamic_feat

The number of dynamic_feat provided in the data. Set this to auto to infer the number of dynamic features from the data. The auto mode also works when no dynamic features are used in the dataset. This is the recommended setting for the parameter.

To force DeepAR not to use dynamic features, even if they are present in the data, set num_dynamic_feat to ignore.

To perform additional data validation, it is possible to explicitly set this parameter to the actual integer value. For example, if two dynamic features are provided, set this to 2.

Optional

Valid values: auto, ignore, positive integer, or empty string

Default value: auto
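In the DeepAR JSON Lines format, dynamic features appear as a dynamic_feat field holding one array per feature. The sketch below infers their number in the spirit of the auto setting and fails on inconsistent records. infer_num_dynamic_feat is a hypothetical helper, not DeepAR's own validation logic.

```python
import json


def infer_num_dynamic_feat(jsonl_lines):
    """Number of dynamic features per record; 0 if no record has any."""
    counts = {len(json.loads(line).get("dynamic_feat", [])) for line in jsonl_lines}
    if len(counts) > 1:
        raise ValueError(f"inconsistent numbers of dynamic features: {sorted(counts)}")
    return counts.pop() if counts else 0


records = [
    '{"start": "2020-01-01 00:00:00", "target": [1.0, 2.0], "dynamic_feat": [[0, 1], [5.0, 6.0]]}',
    '{"start": "2020-01-01 00:00:00", "target": [3.0, 4.0], "dynamic_feat": [[1, 0], [7.0, 8.0]]}',
]
print(infer_num_dynamic_feat(records))  # 2
```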

num_eval_samples

The number of samples used per time series when calculating test accuracy metrics. This parameter has no influence on training or on the final model; in particular, the model can be queried with a different number of samples. It only affects the accuracy scores reported on the test channel after training. Smaller values result in faster evaluation, but the evaluation scores are then typically worse and more uncertain. When evaluating higher quantiles, for example 0.95, it may be important to increase the number of evaluation samples.

Optional

Valid values: integer

Default value: 100
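Quantile scores are computed from sampled forecast paths, so each quantile has to be estimated from a finite set of samples. The sketch below uses linear interpolation between sorted samples; this estimator is an assumption, since DeepAR's exact method is not described here, and empirical_quantile is a hypothetical helper. With num_eval_samples at its default of 100, only about 5 samples lie above the 0.95 quantile, which is why such tail estimates are noisy when few samples are drawn.

```python
import math


def empirical_quantile(samples, q):
    """Estimate the q-quantile of forecast samples by linear interpolation."""
    if not samples or not 0.0 <= q <= 1.0:
        raise ValueError("need at least one sample and 0 <= q <= 1")
    xs = sorted(samples)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


print(empirical_quantile([4.0, 1.0, 3.0, 2.0], 0.5))  # 2.5
```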

num_layers

The number of hidden layers in the RNN. Typical values range from 1 to 4.

Optional

Valid values: positive integer

Default value: 2

test_quantiles

Quantiles for which to calculate quantile loss on the test channel.

Optional

Valid values: array of floats

Default value: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
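For each value in test_quantiles, the test-channel score compares the forecast quantile with the actual values. The sketch below assumes a weighted quantile loss of the form 2 * sum(tau * max(y - q, 0) + (1 - tau) * max(q - y, 0)) / sum(|y|); treat this definition as an assumption and check it against the metrics section of the DeepAR documentation. The function name weighted_quantile_loss and the sample numbers are hypothetical.

```python
def weighted_quantile_loss(actuals, forecast_quantiles, tau):
    """Assumed definition of the weighted quantile loss at level tau."""
    if len(actuals) != len(forecast_quantiles):
        raise ValueError("actuals and forecasts must have the same length")
    # Under-forecasts are penalized by tau, over-forecasts by (1 - tau).
    penalty = sum(
        tau * max(y - q, 0.0) + (1.0 - tau) * max(q - y, 0.0)
        for y, q in zip(actuals, forecast_quantiles)
    )
    return 2.0 * penalty / sum(abs(y) for y in actuals)


actuals = [10.0, 12.0, 8.0]
p90 = [12.0, 15.0, 9.0]  # hypothetical 0.9-quantile forecasts
print(round(weighted_quantile_loss(actuals, p90, 0.9), 4))  # 0.04
```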