Forecasting Time Series - Model Zoo

Note
This documentation is intended for advanced users and may not be comprehensive.
For a stable public API, refer to TimeSeriesPredictor.

This page lists the time series forecasting models available in AutoGluon. The hyperparameters available for each model are listed under "Other Parameters".
This list is useful if you want to override the default hyperparameters (Manually configuring models) or define custom hyperparameter search spaces (Hyperparameter tuning), as described in the In-depth Tutorial.
For example, the following code will train a TimeSeriesPredictor with DeepAR and ETS models with default hyperparameters (and a weighted ensemble on top of them):

from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "DeepAR": {},
        "ETS": {},
    },
)
Note that we don't include the Model suffix when specifying the model name in hyperparameters (e.g., the class DeepARModel corresponds to the name "DeepAR" in the hyperparameters dictionary).
Also note that some of the models' hyperparameters have names and default values that are different from the original libraries.
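To tune hyperparameters instead of fixing them, replace the values in the hyperparameters dictionary with search spaces and enable tuning in fit. Below is a minimal sketch, assuming the search-space helpers are importable as autogluon.common.space and that hyperparameter_tune_kwargs accepts the "auto" preset; both details may differ across AutoGluon versions:

from autogluon.common import space
from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "DeepAR": {
            # Search over the number of training epochs and the learning rate
            "epochs": space.Int(10, 100),
            "learning_rate": space.Real(1e-4, 1e-2, log=True),
        },
    },
    hyperparameter_tune_kwargs="auto",  # enable hyperparameter tuning
)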
Default models

Model | Description
---|---
NaiveModel | Baseline model that sets the forecast equal to the last observed value.
SeasonalNaiveModel | Baseline model that sets the forecast equal to the last observed value from the same season.
ARIMAModel | Autoregressive Integrated Moving Average (ARIMA) model.
ETSModel | Exponential smoothing with trend and seasonality.
ThetaModel | The Theta forecasting model of Assimakopoulos and Nikolopoulos (2000).
AutoGluonTabularModel | Predict future time series values using autogluon.tabular.TabularPredictor.
DeepARModel | DeepAR model from GluonTS based on the PyTorch backend.
SimpleFeedForwardModel | SimpleFeedForward model from GluonTS based on the PyTorch backend.
NaiveModel

class autogluon.timeseries.models.NaiveModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

Baseline model that sets the forecast equal to the last observed value.

Quantiles are obtained by assuming that the residuals follow a zero-mean normal distribution, whose scale is estimated from the empirical distribution of the residuals, as described in https://otexts.com/fpp3/prediction-intervals.html.
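To make this concrete, here is a minimal NumPy/SciPy sketch of the approach: the point forecast repeats the last observation, and quantiles come from a normal distribution whose scale is estimated from the one-step residuals. This illustrates the method described above, not AutoGluon's internal implementation:

import numpy as np
from scipy.stats import norm

def naive_forecast_with_quantiles(y, prediction_length, quantile_levels):
    # Point forecast: repeat the last observed value
    point_forecast = np.full(prediction_length, y[-1])
    # One-step-ahead residuals of the naive forecast: y[t] - y[t-1]
    residuals = np.diff(y)
    sigma = residuals.std()
    # Per https://otexts.com/fpp3/prediction-intervals.html, the standard
    # deviation of the naive forecast at horizon h scales with sqrt(h)
    horizons = np.arange(1, prediction_length + 1)
    scale = sigma * np.sqrt(horizons)
    quantiles = {q: point_forecast + norm.ppf(q) * scale for q in quantile_levels}
    return point_forecast, quantiles

y = np.array([10.0, 12.0, 11.0, 13.0, 12.5])
point, quantiles = naive_forecast_with_quantiles(y, 3, [0.1, 0.5, 0.9])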
SeasonalNaiveModel

class autogluon.timeseries.models.SeasonalNaiveModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

Baseline model that sets the forecast equal to the last observed value from the same season.

Quantiles are obtained by assuming that the residuals follow a zero-mean normal distribution, whose scale is estimated from the empirical distribution of the residuals, as described in https://otexts.com/fpp3/prediction-intervals.html.

Other Parameters
- seasonal_period: int or None, default = None
  Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, the model falls back to the Naive forecast. Seasonality will also be disabled if the length of the time series is < seasonal_period.
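As an illustration of the forecast rule (again a sketch, not AutoGluon's internal implementation), each future step receives the most recent observed value from the same phase of the seasonal cycle:

import numpy as np

def seasonal_naive_forecast(y, prediction_length, seasonal_period):
    m = seasonal_period
    horizons = np.arange(1, prediction_length + 1)
    # k = number of complete seasons to look back for each horizon step
    k = np.ceil(horizons / m).astype(int)
    # Index of the last observation from the same season
    return y[len(y) + horizons - k * m - 1]

y = np.array([1.0, 2.0, 3.0, 1.1, 2.1, 3.1])  # seasonal_period = 3
print(seasonal_naive_forecast(y, prediction_length=4, seasonal_period=3))
# -> [1.1 2.1 3.1 1.1]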
ARIMAModel

class autogluon.timeseries.models.ARIMAModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

Autoregressive Integrated Moving Average (ARIMA) model.

Based on statsmodels.tsa.statespace.sarimax.SARIMAX.

Our implementation contains several improvements over the Statsmodels version, such as multi-CPU training and reduced disk usage when saving models.

Other Parameters
- order: Tuple[int, int, int], default = (1, 1, 1)
  The (p, d, q) order of the model for the number of AR parameters, differences, and MA parameters to use.
- seasonal_order: Tuple[int, int, int], default = (0, 0, 0)
  The (P, D, Q) parameters of the seasonal ARIMA model. Setting to (0, 0, 0) disables seasonality.
- seasonal_period: int or None, default = None
  Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, seasonality will be disabled.
- trend: {"n", "c", "t", "ct"}, default = "c"
  Parameter controlling the trend polynomial. Allowed values are "n" (no trend), "c" (constant), "t" (linear) and "ct" (constant plus linear).
- enforce_stationarity: bool, default = True
  Whether to transform the AR parameters to enforce stationarity in the autoregressive component of the model. If ARIMA crashes during fitting with an LU decomposition error, you can either set enforce_stationarity to False or increase the differencing parameter d in order.
- enforce_invertibility: bool, default = True
  Whether to transform the MA parameters to enforce invertibility in the moving average component of the model.
- maxiter: int, default = 50
  Number of iterations during optimization.
- n_jobs: int or float, default = 0.5
  Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
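For example, to fit a seasonal ARIMA on daily data with a weekly cycle, the documented hyperparameters can be combined as follows (the specific order values are illustrative):

from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor(prediction_length=14).fit(
    train_data,
    hyperparameters={
        "ARIMA": {
            "order": (2, 1, 1),           # (p, d, q)
            "seasonal_order": (1, 1, 1),  # (P, D, Q); non-zero values enable seasonality
            "seasonal_period": 7,         # weekly cycle for daily data
        },
    },
)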
ETSModel

class autogluon.timeseries.models.ETSModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

Exponential smoothing with trend and seasonality.

Based on statsmodels.tsa.exponential_smoothing.ets.ETSModel.

Our implementation contains several improvements over the Statsmodels version, such as multi-CPU training and reduced disk usage when saving models.

Other Parameters
- error: {"add", "mul"}, default = "add"
  Error model. Allowed values are "add" (additive) and "mul" (multiplicative). Note that "mul" is only applicable to time series with positive values.
- trend: {"add", "mul", None}, default = "add"
  Trend component model. Allowed values are "add" (additive), "mul" (multiplicative) and None (disabled). Note that "mul" is only applicable to time series with positive values.
- damped_trend: bool, default = False
  Whether or not the included trend component is damped.
- seasonal: {"add", "mul", None}, default = "add"
  Seasonal component model. Allowed values are "add" (additive), "mul" (multiplicative) and None (disabled). Note that "mul" is only applicable to time series with positive values.
- seasonal_period: int or None, default = None
  Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, seasonality will be disabled. Seasonality will also be disabled if the length of the time series is < 2 * seasonal_period.
- maxiter: int, default = 1000
  Number of iterations during optimization.
- n_jobs: int or float, default = 0.5
  Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
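For example, the following configuration disables the trend component and uses multiplicative seasonality (valid only for time series with positive values) with an annual cycle on monthly data:

from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "ETS": {
            "trend": None,          # disable the trend component
            "seasonal": "mul",      # multiplicative seasonality, positive data only
            "seasonal_period": 12,  # annual cycle for monthly data
        },
    },
)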
ThetaModel

class autogluon.timeseries.models.ThetaModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

The Theta forecasting model of Assimakopoulos and Nikolopoulos (2000).

Based on statsmodels.tsa.forecasting.theta.ThetaModel.

Our implementation contains several improvements over the Statsmodels version, such as multi-CPU training and reduced disk usage when saving models.

Other Parameters
- deseasonalize: bool, default = True
  Whether to deseasonalize the data. If True and use_test is True, the data is only deseasonalized if the null hypothesis of no seasonality is rejected.
- seasonal_period: int or None, default = None
  Number of time steps in a complete seasonal cycle for seasonal models. For example, 7 for daily data with a weekly cycle or 12 for monthly data with an annual cycle. When set to None, seasonal_period will be inferred from the frequency of the training data. Can also be specified manually by providing an integer > 1. If seasonal_period (inferred or provided) is equal to 1, seasonality will be disabled. Seasonality will also be disabled if the length of the time series is < 2 * seasonal_period.
- use_test: bool, default = True
  Whether to use a statistical test to determine if seasonality is present.
- method: {"auto", "additive", "multiplicative"}, default = "auto"
  The model used for the seasonal decomposition. "auto" uses multiplicative if the time series is non-negative and all estimated seasonal components are positive. If either of these conditions is violated, an additive decomposition is used.
- difference: bool, default = False
  Whether to difference the data before testing for seasonality.
- n_jobs: int or float, default = 0.5
  Number of CPU cores used to fit the models in parallel. When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used. When set to a positive integer, that many cores are used. When set to -1, all CPU cores are used.
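For example, to always deseasonalize with a fixed seasonal period instead of relying on the statistical test:

from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "Theta": {
            "deseasonalize": True,
            "use_test": False,      # skip the statistical seasonality test
            "seasonal_period": 12,
        },
    },
)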
References
- Assimakopoulos, Vassilis, and Konstantinos Nikolopoulos. "The theta model: a decomposition approach to forecasting." International Journal of Forecasting 16.4 (2000): 521-530.
AutoGluonTabularModel

class autogluon.timeseries.models.AutoGluonTabularModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

Predict future time series values using autogluon.tabular.TabularPredictor.

The forecasting task is converted to a tabular problem using the following features:
- lag features (observed time series values) based on the freq of the data
- time features (e.g., day of the week) based on the timestamp of the measurement
- static features of each item (if available)

Quantiles are obtained by assuming that the residuals follow a zero-mean normal distribution, whose scale is estimated from the empirical distribution of the residuals.

Other Parameters
- max_train_size: int, default = 1_000_000
  Maximum number of rows in the training and validation sets. If the number of rows in train or validation data exceeds max_train_size, then max_train_size many rows are subsampled from the dataframe.
- tabular_hyperparameters: Dict[Dict[str, Any]], optional
  Hyperparameters dictionary passed to TabularPredictor.fit. Contains the names of models that should be fit. Defaults to {"XGB": {}, "CAT": {}, "GBM": {}}.
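For example, to restrict the underlying TabularPredictor to LightGBM and CatBoost models:

from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "AutoGluonTabular": {
            # Only fit LightGBM and CatBoost inside the TabularPredictor
            "tabular_hyperparameters": {"GBM": {}, "CAT": {}},
        },
    },
)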
DeepARModel

class autogluon.timeseries.models.DeepARModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

DeepAR model from GluonTS based on the PyTorch backend.

The model consists of an LSTM encoder and a decoder that outputs the distribution of the next target value. Close to the model described in [Salinas2020].

Based on gluonts.torch.model.deepar.DeepAREstimator.

Other Parameters
- context_length: int, optional
  Number of steps to unroll the RNN for before computing predictions (default: None, in which case context_length = prediction_length)
- disable_static_features: bool, default = False
  If True, static features won't be used by the model even if they are present in the dataset. If False, static features will be used if they are present.
- disable_known_covariates: bool, default = False
  If True, known covariates won't be used by the model even if they are present in the dataset. If False, known covariates will be used if they are present.
- num_layers: int, default = 2
  Number of RNN layers
- hidden_size: int, default = 40
  Number of RNN cells for each layer
- dropout_rate: float, default = 0.1
  Dropout regularization parameter
- embedding_dimension: int, optional
  Dimension of the embeddings for categorical features (if None, defaults to [min(50, (cat+1)//2) for cat in cardinality])
- distr_output: gluonts.torch.distributions.DistributionOutput, default = StudentTOutput()
  Distribution used to evaluate observations and sample predictions
- scaling: bool, default = True
  Whether to automatically scale the target values
- epochs: int, default = 100
  Number of epochs the model will be trained for
- batch_size: int, default = 64
  Size of batches used during training
- num_batches_per_epoch: int, default = 50
  Number of batches processed every epoch
- learning_rate: float, default = 1e-3
  Learning rate used during training
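For example, a non-default likelihood can be passed via distr_output. The import below assumes a recent GluonTS version where NegativeBinomialOutput lives in gluonts.torch.distributions; verify the path against your installed version:

from gluonts.torch.distributions import NegativeBinomialOutput
from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "DeepAR": {
            "epochs": 20,
            "context_length": 48,
            "distr_output": NegativeBinomialOutput(),  # suited to non-negative count data
        },
    },
)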
References
- [Salinas2020] Salinas, David, et al. "DeepAR: Probabilistic forecasting with autoregressive recurrent networks." International Journal of Forecasting. 2020.
SimpleFeedForwardModel

class autogluon.timeseries.models.SimpleFeedForwardModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

SimpleFeedForward model from GluonTS based on the PyTorch backend.

The model consists of a multilayer perceptron (MLP) that predicts the distribution of all the target values in the forecast horizon.

Based on gluonts.torch.model.simple_feedforward.SimpleFeedForwardEstimator. See GluonTS documentation for additional hyperparameters.

Other Parameters
- context_length: int, optional
  Number of time units that condition the predictions (default: None, in which case context_length = prediction_length)
- hidden_dimensions: List[int], default = [20, 20]
  Size of hidden layers in the feedforward network
- distr_output: gluonts.torch.distributions.DistributionOutput, default = NormalOutput()
  Distribution to fit.
- batch_normalization: bool, default = False
  Whether to use batch normalization
- mean_scaling: bool, default = True
  Scale the network input by the data mean and the network output by its inverse
- epochs: int, default = 100
  Number of epochs the model will be trained for
- batch_size: int, default = 64
  Size of batches used during training
- num_batches_per_epoch: int, default = 50
  Number of batches processed every epoch
- learning_rate: float, default = 1e-3
  Learning rate used during training
MXNet Models

Using the models listed below requires installing Apache MXNet v1.9. This can be done as follows:

python -m pip install mxnet~=1.9

If you want to use a GPU, install the version of MXNet that matches your CUDA version. See the MXNet documentation for more info.
If a GPU is available and a CUDA-enabled version of MXNet is installed, all the MXNet models will be trained on the GPU. Otherwise, the models will be trained on CPU.
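Once MXNet is installed, the MXNet-based models can be requested like any other model. Following the naming rule from the top of this page (class name minus the Model suffix), the keys would be, e.g., "DeepARMXNet" and "SimpleFeedForwardMXNet"; treat the exact keys as assumptions to check against your AutoGluon version:

from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "DeepARMXNet": {},             # MXNet backend DeepAR
        "SimpleFeedForwardMXNet": {},  # MXNet backend feedforward model
    },
)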
Model | Description
---|---
DeepARMXNetModel | DeepAR model from GluonTS based on the MXNet backend.
MQCNNMXNetModel | MQCNN model from GluonTS.
MQRNNMXNetModel | MQRNN model from GluonTS.
SimpleFeedForwardMXNetModel | SimpleFeedForward model from GluonTS based on the MXNet backend.
TemporalFusionTransformerMXNetModel | TemporalFusionTransformer model from GluonTS.
TransformerMXNetModel | Autoregressive transformer forecasting model from GluonTS.
DeepARMXNetModel

class autogluon.timeseries.models.gluonts.mx.DeepARMXNetModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

DeepAR model from GluonTS based on the MXNet backend.

The model consists of an RNN encoder (LSTM or GRU) and a decoder that outputs the distribution of the next target value. Close to the model described in [Salinas2020].

Based on gluonts.mx.model.deepar.DeepAREstimator. See GluonTS documentation for additional hyperparameters.

Other Parameters
- context_length: int, optional
  Number of steps to unroll the RNN for before computing predictions (default: None, in which case context_length = prediction_length)
- disable_static_features: bool, default = False
  If True, static features won't be used by the model even if they are present in the dataset. If False, static features will be used if they are present.
- disable_known_covariates: bool, default = False
  If True, known covariates won't be used by the model even if they are present in the dataset. If False, known covariates will be used if they are present.
- num_layers: int, default = 2
  Number of RNN layers
- num_cells: int, default = 40
  Number of RNN cells for each layer
- cell_type: str, default = "lstm"
  Type of recurrent cells to use (available: "lstm" or "gru")
- dropoutcell_type: str, default = "ZoneoutCell"
  Type of dropout cells to use (available: "ZoneoutCell", "RNNZoneoutCell", "VariationalDropoutCell" or "VariationalZoneoutCell")
- dropout_rate: float, default = 0.1
  Dropout regularization parameter
- embedding_dimension: int, optional
  Dimension of the embeddings for categorical features (if None, defaults to [min(50, (cat+1)//2) for cat in cardinality])
- distr_output: gluonts.mx.DistributionOutput, default = StudentTOutput()
  Distribution used to evaluate observations and sample predictions
- scaling: bool, default = True
  Whether to automatically scale the target values
- epochs: int, default = 100
  Number of epochs the model will be trained for
- batch_size: int, default = 64
  Size of batches used during training
- num_batches_per_epoch: int, default = 50
  Number of batches processed every epoch
- learning_rate: float, default = 1e-3
  Learning rate used during training

References
- [Salinas2020] Salinas, David, et al. "DeepAR: Probabilistic forecasting with autoregressive recurrent networks." International Journal of Forecasting. 2020.
MQCNNMXNetModel

class autogluon.timeseries.models.gluonts.mx.MQCNNMXNetModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

MQCNN model from GluonTS.

The model consists of a CNN encoder and a decoder that directly predicts the quantiles of the future target values' distribution. As described in [Wen2017].

Based on gluonts.mx.model.seq2seq.MQCNNEstimator. See GluonTS documentation for additional hyperparameters.

Other Parameters
- context_length: int, optional
  Number of steps to unroll the RNN for before computing predictions (default: None, in which case context_length = prediction_length)
- disable_static_features: bool, default = False
  If True, static features won't be used by the model even if they are present in the dataset. If False, static features will be used if they are present.
- disable_known_covariates: bool, default = False
  If True, known covariates won't be used by the model even if they are present in the dataset. If False, known covariates will be used if they are present.
- embedding_dimension: int, optional
  Dimension of the embeddings for categorical features. (default: [min(50, (cat+1)//2) for cat in cardinality])
- add_time_feature: bool, default = True
  Adds a set of time features.
- add_age_feature: bool, default = False
  Adds an age feature. The age feature starts with a small value at the start of the time series and grows over time.
- decoder_mlp_dim_seq: List[int], default = [30]
  The dimensionalities of the Multi Layer Perceptron layers of the decoder.
- channels_seq: List[int], default = [30, 30, 30]
  The number of channels (i.e., filters or convolutions) for each layer of the HierarchicalCausalConv1DEncoder. More channels usually correspond to better performance and larger network size.
- dilation_seq: List[int], default = [1, 3, 5]
  The dilation of the convolutions in each layer of the HierarchicalCausalConv1DEncoder. Greater numbers correspond to a greater receptive field of the network, which is usually better with longer context_length. (Same length as channels_seq)
- kernel_size_seq: List[int], default = [7, 3, 3]
  The kernel sizes (i.e., window size) of the convolutions in each layer of the HierarchicalCausalConv1DEncoder. (Same length as channels_seq)
- use_residual: bool, default = True
  Whether the hierarchical encoder should additionally pass the unaltered past target to the decoder.
- quantiles: List[float], default = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
  The list of quantiles that will be optimized for, and predicted by, the model. Optimizing for more quantiles than are of direct interest to you can result in improved performance due to a regularizing effect.
- distr_output: gluonts.mx.DistributionOutput, optional
  DistributionOutput to use. Only one of quantiles and distr_output can be set.
- scaling: bool, optional
  Whether to automatically scale the target values. (default: False if quantile_output is used, True otherwise)
- epochs: int, default = 100
  Number of epochs the model will be trained for
- batch_size: int, default = 64
  Size of batches used during training
- num_batches_per_epoch: int, default = 50
  Number of batches processed every epoch
- learning_rate: float, default = 1e-3
  Learning rate used during training
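For example, to optimize MQCNN for just the median and an 80% prediction interval (the "MQCNNMXNet" key follows the naming rule from the top of this page):

from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "MQCNNMXNet": {
            "quantiles": [0.1, 0.5, 0.9],  # median plus an 80% interval
        },
    },
)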
References
- [Wen2017] Wen, Ruofeng, et al. "A multi-horizon quantile recurrent forecaster." arXiv preprint arXiv:1711.11053 (2017).
MQRNNMXNetModel

class autogluon.timeseries.models.gluonts.mx.MQRNNMXNetModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

MQRNN model from GluonTS.

The model consists of an RNN encoder and a decoder that directly predicts the quantiles of the future target values' distribution. As described in [Wen2017].

Based on gluonts.mx.model.seq2seq.MQRNNEstimator. See GluonTS documentation for additional hyperparameters.

Other Parameters
- context_length: int, optional
  Number of steps to unroll the RNN for before computing predictions (default: None, in which case context_length = prediction_length)
- embedding_dimension: int, optional
  Dimension of the embeddings for categorical features. (default: [min(50, (cat+1)//2) for cat in cardinality])
- decoder_mlp_dim_seq: List[int], default = [30]
  The dimensionalities of the Multi Layer Perceptron layers of the decoder.
- quantiles: List[float], default = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
  The list of quantiles that will be optimized for, and predicted by, the model. Optimizing for more quantiles than are of direct interest to you can result in improved performance due to a regularizing effect.
- distr_output: gluonts.mx.DistributionOutput, optional
  DistributionOutput to use. Only one of quantiles and distr_output can be set.
- scaling: bool, optional
  Whether to automatically scale the target values. (default: False if quantile_output is used, True otherwise)
- epochs: int, default = 100
  Number of epochs the model will be trained for
- batch_size: int, default = 64
  Size of batches used during training
- num_batches_per_epoch: int, default = 50
  Number of batches processed every epoch
- learning_rate: float, default = 1e-3
  Learning rate used during training

References
- [Wen2017] Wen, Ruofeng, et al. "A multi-horizon quantile recurrent forecaster." arXiv preprint arXiv:1711.11053 (2017).
SimpleFeedForwardMXNetModel

class autogluon.timeseries.models.gluonts.mx.SimpleFeedForwardMXNetModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

SimpleFeedForward model from GluonTS based on the MXNet backend.

The model consists of a multilayer perceptron (MLP) that predicts the distribution of the next target value.

Based on gluonts.mx.model.simple_feedforward.SimpleFeedForwardEstimator. See GluonTS documentation for additional hyperparameters.

Note that AutoGluon uses the hyperparameters hidden_dim and num_layers instead of num_hidden_dimensions used in GluonTS. This is done to ensure compatibility with Ray Tune (see the example after the parameter list).

Other Parameters
- context_length: int, optional
  Number of time units that condition the predictions (default: None, in which case context_length = prediction_length)
- hidden_dim: int, default = 40
  Number of hidden units in each layer of the MLP
- num_layers: int, default = 2
  Number of hidden layers in the MLP
- distr_output: gluonts.mx.DistributionOutput, default = StudentTOutput()
  Distribution to fit
- batch_normalization: bool, default = False
  Whether to use batch normalization
- mean_scaling: bool, default = True
  Scale the network input by the data mean and the network output by its inverse
- epochs: int, default = 100
  Number of epochs the model will be trained for
- batch_size: int, default = 64
  Size of batches used during training
- num_batches_per_epoch: int, default = 50
  Number of batches processed every epoch
- learning_rate: float, default = 1e-3
  Learning rate used during training
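As an illustration of the renamed hyperparameters, the following AutoGluon configuration corresponds to num_hidden_dimensions=[64, 64] in GluonTS:

from autogluon.timeseries import TimeSeriesPredictor

predictor = TimeSeriesPredictor().fit(
    train_data,
    hyperparameters={
        "SimpleFeedForwardMXNet": {
            "hidden_dim": 64,  # units in each hidden layer
            "num_layers": 2,   # number of hidden layers
        },
    },
)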
TemporalFusionTransformerMXNetModel

class autogluon.timeseries.models.gluonts.mx.TemporalFusionTransformerMXNetModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

TemporalFusionTransformer model from GluonTS.

The model combines an LSTM encoder and a transformer decoder, and directly predicts the quantiles of future target values. As described in [Lim2021].

Based on gluonts.mx.model.tft.TemporalFusionTransformerEstimator. See GluonTS documentation for additional hyperparameters.

Other Parameters
- context_length: int or None, default = None
  Number of past values used for prediction. (default: None, in which case context_length = prediction_length)
- hidden_dim: int, default = 32
  Size of the hidden layer.
- num_heads: int, default = 4
  Number of attention heads in multi-head attention.
- dropout_rate: float, default = 0.1
  Dropout regularization parameter
- epochs: int, default = 100
  Number of epochs the model will be trained for
- batch_size: int, default = 64
  Size of batches used during training
- num_batches_per_epoch: int, default = 50
  Number of batches processed every epoch
- learning_rate: float, default = 1e-3
  Learning rate used during training

References
- [Lim2021] Lim, Bryan, et al. "Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting." International Journal of Forecasting. 2021.
TransformerMXNetModel

class autogluon.timeseries.models.gluonts.mx.TransformerMXNetModel(freq: Optional[str] = None, prediction_length: int = 1, path: Optional[str] = None, name: Optional[str] = None, eval_metric: str = None, hyperparameters: Dict[str, Any] = None, **kwargs)

Autoregressive transformer forecasting model from GluonTS.

The model consists of a Transformer encoder and a decoder that outputs the distribution of the next target value. The transformer architecture is close to the one described in [Vaswani2017].

Based on gluonts.mx.model.transformer.TransformerEstimator. See GluonTS documentation for additional hyperparameters.

Other Parameters
- context_length: int, optional
  Number of steps to unroll the RNN for before computing predictions (default: None, in which case context_length = prediction_length)
- model_dim: int, default = 32
  Dimension of the transformer network, i.e., embedding dimension of the input
- dropout_rate: float, default = 0.1
  Dropout regularization parameter
- distr_output: gluonts.mx.DistributionOutput, default = StudentTOutput()
  Distribution used to evaluate observations and sample predictions
- inner_ff_dim_scale: int, default = 4
  Dimension scale of the inner hidden layer of the transformer's feedforward network
- pre_seq: str, default = "dn"
  Sequence that defines the operations of the processing block before the main transformer network. Available operations: "d" for dropout, "r" for residual connections and "n" for normalization
- post_seq: str, default = "drn"
  Sequence that defines the operations of the processing block in and after the main transformer network. Available operations: "d" for dropout, "r" for residual connections and "n" for normalization
- epochs: int, default = 100
  Number of epochs the model will be trained for
- batch_size: int, default = 64
  Size of batches used during training
- num_batches_per_epoch: int, default = 50
  Number of batches processed every epoch
- learning_rate: float, default = 1e-3
  Learning rate used during training

References
- [Vaswani2017] Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
Additional features

Overview of the additional features and covariates supported by different models. Models not included in this table currently do not support any additional features.
Model | Static features (continuous) | Static features (categorical) | Known covariates (continuous)
---|---|---|---
AutoGluonTabularModel | ✓ | ✓ |
DeepARModel | ✓ | ✓ | ✓
DeepARMXNetModel | ✓ | ✓ | ✓
MQCNNMXNetModel | ✓ | ✓ | ✓
TemporalFusionTransformerMXNetModel | ✓ | ✓ |
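The snippet below sketches how such features are attached to the data. The column names and the df / static_df data frames are hypothetical, and the API shown follows recent AutoGluon versions; check it against your installed version:

from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# df: long-format pandas.DataFrame with (hypothetical) columns
#   item_id, timestamp, target, promo
train_data = TimeSeriesDataFrame.from_data_frame(
    df, id_column="item_id", timestamp_column="timestamp"
)
# static_df: one row per item_id with static feature columns
train_data.static_features = static_df

predictor = TimeSeriesPredictor(
    target="target",
    known_covariates_names=["promo"],  # covariate whose future values are known
).fit(train_data, hyperparameters={"DeepAR": {}})

# At prediction time, future values of the known covariates must be supplied:
# predictor.predict(train_data, known_covariates=future_promo_values)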