Forecasting Time Series - Ensemble Models¶

Note

This documentation is intended for advanced users and may not be comprehensive.

For a stable public API, refer to the documentation for TimeSeriesPredictor.

This page contains the list of time series ensemble models available in AutoGluon. These models combine predictions from multiple base forecasting models to improve accuracy.

The available hyperparameters for each model are listed under Parameters.

The model names in the ensemble_hyperparameters dictionary don’t have to include the "Ensemble" suffix (e.g., both "SimpleAverage" and "SimpleAverageEnsemble" correspond to SimpleAverageEnsemble).

How ensembling works¶

Ensemble models combine predictions from multiple base forecasting models to produce a final forecast. The ensemble is trained on held-out validation data (backtest windows) to learn how to best combine the base model predictions.

By default, AutoGluon uses a single GreedyEnsemble that learns optimal weights for each base model. You can configure which ensemble models to use via the ensemble_hyperparameters argument in TimeSeriesPredictor.fit().

Multi-layer stacking¶

Multi-layer stacking extends the basic ensembling approach by training ensembles in multiple stages. Each layer of ensembles is trained on different backtest windows, and uses predictions from the previous layer as its inputs.

For example, with num_val_windows=(3, 2) and two ensemble layers:

Time series:  [...history...][window 1][window 2][window 3][window 4][window 5]
                             └───────── Layer 2 ──────────┘└───── Layer 3 ─────┘

Base models generate predictions for all 5 backtest windows
Layer 2 ensembles are trained on windows 1-3, learning to combine base model predictions
Layer 3 ensembles are trained on windows 4-5, using Layer 1 ensemble predictions as inputs
Final validation scores are computed on windows 4-5

This approach allows later ensemble layers to correct errors made by earlier layers, often improving forecast accuracy.

To enable multi-layer stacking, pass a list of dicts to ensemble_hyperparameters and a matching tuple to num_val_windows:

After training the predictor, you can access the validation predictions & targets used to train the ensembles using backtest_targets() and backtest_predictions() methods.

Overview¶

`GreedyEnsemble`	Greedy ensemble selection algorithm that iteratively builds an ensemble by selecting models with replacement.
`LinearStackerEnsemble`	Linear stacking ensemble that learns optimal linear combination weights through gradient-based optimization.
`MedianEnsemble`	Robust ensemble that computes predictions as the element-wise median of base model mean and quantile forecasts, providing robustness to outlier predictions.
`PerItemGreedyEnsemble`	Per-item greedy ensemble that fits separate weighted ensembles for each individual time series.
`PerQuantileTabularEnsemble`	Tabular ensemble using separate AutoGluon-Tabular models for each quantile and mean forecast.
`SimpleAverageEnsemble`	Simple ensemble that assigns equal weights to all base models for uniform averaging.
`TabularEnsemble`	Tabular ensemble that uses a single AutoGluon-Tabular model to learn ensemble combinations.

Simple averages¶

Simple ensemble models that combine predictions using mean or median aggregation.

class autogluon.timeseries.models.ensemble.SimpleAverageEnsemble(name: str | None = None, **kwargs)[source]¶

Simple ensemble that assigns equal weights to all base models for uniform averaging.

This ensemble computes predictions as the arithmetic mean of all base model forecasts, giving each model equal influence. Simple averaging is robust and often performs well when base models have similar accuracy levels or when validation data is insufficient to reliably estimate performance differences.

class autogluon.timeseries.models.ensemble.MedianEnsemble(path: str | None = None, name: str | None = None, hyperparameters: dict[str, Any] | None = None, freq: str | None = None, prediction_length: int = 1, covariate_metadata: CovariateMetadata | None = None, target: str = 'target', quantile_levels: Sequence[float] = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), eval_metric: str | TimeSeriesScorer | None = None)[source]¶

Robust ensemble that computes predictions as the element-wise median of base model mean and quantile forecasts, providing robustness to outlier predictions.

Parameters:

isotonization (str, default = "sort") – The isotonization method to use (i.e. the algorithm to prevent quantile non-crossing). Currently only “sort” is supported.
detect_and_ignore_failures (bool, default = True) – Whether to detect and ignore “failed models”, defined as models which have a loss that is larger than 10x the median loss of all the models. This can be very important for the regression-based ensembles, as moving the weight from such a “failed model” to zero can require a long training time.

Linear ensembles¶

Linear ensemble models that combine predictions using weighted averages or linear stacking.

class autogluon.timeseries.models.ensemble.GreedyEnsemble(name: str | None = None, **kwargs)[source]¶

Greedy ensemble selection algorithm that iteratively builds an ensemble by selecting models with replacement.

Also known as WeightedEnsemble for backward compatibility.

This class implements the Ensemble Selection algorithm by Caruana et al. [Car2004], which starts with an empty ensemble and repeatedly adds the model that most improves the ensemble’s validation performance. Models can be selected multiple times, allowing the algorithm to assign higher effective weights to better-performing models.

Parameters:: ensemble_size (int, default = 100) – Number of models (with replacement) to include in the ensemble.

References

[Car2004]

Caruana, Rich, et al. “Ensemble selection from libraries of models.” Proceedings of the twenty-first international conference on Machine learning. 2004.

class autogluon.timeseries.models.ensemble.PerItemGreedyEnsemble(name: str | None = None, **kwargs)[source]¶

Per-item greedy ensemble that fits separate weighted ensembles for each individual time series.

This ensemble applies the greedy Ensemble Selection algorithm by Caruana et al. [Car2004] independently to each time series in the dataset, allowing for customized model combinations that adapt to the specific characteristics of individual series. Each time series gets its own optimal ensemble weights based on predictions for that particular series. If items not seen during training are provided at prediction time, average model weight across the training items will be used for their predictions.

The per-item approach is particularly effective for datasets with heterogeneous time series that exhibit different patterns, seasonalities, or noise characteristics.

The algorithm uses parallel processing to efficiently fit ensembles across all time series.

Parameters:

ensemble_size (int, default = 100) – Number of models (with replacement) to include in the ensemble.
n_jobs (int or float, default = joblib.cpu_count(only_physical_cores=True)) – Number of CPU cores used to fit the ensembles in parallel.

References

[Car2004]

Caruana, Rich, et al. “Ensemble selection from libraries of models.” Proceedings of the twenty-first international conference on Machine learning. 2004.

class autogluon.timeseries.models.ensemble.LinearStackerEnsemble(path: str | None = None, name: str | None = None, hyperparameters: dict[str, Any] | None = None, freq: str | None = None, prediction_length: int = 1, covariate_metadata: CovariateMetadata | None = None, target: str = 'target', quantile_levels: Sequence[float] = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), eval_metric: str | TimeSeriesScorer | None = None)[source]¶

Linear stacking ensemble that learns optimal linear combination weights through gradient-based optimization.

Weighted combinations can be per model or per model-quantile, model-horizon, model-quantile-horizon combinations. These choices are controlled by the weights_per hyperparameter.

The optimization process uses gradient descent with configurable learning rates and convergence criteria, allowing for flexible training dynamics. Weight pruning can be applied to remove models with negligible contributions, resulting in sparse and interpretable ensembles.

Parameters:

weights_per (str, default = "m") –
Granularity of weight learning.
- ”m”: single weight per model
- ”mq”: single weight for each model-quantile combination
- ”mt”: single weight for each model-time step where time steps run across the prediction horizon
- ”mtq”: single weight for each model-quantile-time step combination
lr (float, default = 0.1) – Learning rate for PyTorch optimizer during weight training.
max_epochs (int, default = 10000) – Maximum number of training epochs for weight optimization.
relative_tolerance (float, default = 1e-7) – Relative tolerance for convergence detection during training.
prune_below (float, default = 0.0) – Threshold below which weights are pruned to zero for sparsity. The weights are redistributed across remaining models after pruning.
isotonization (str, default = "sort") – The isotonization method to use (i.e. the algorithm to prevent quantile non-crossing). Currently only “sort” is supported.
detect_and_ignore_failures (bool, default = True) – Whether to detect and ignore “failed models”, defined as models which have a loss that is larger than 10x the median loss of all the models. This can be very important for the regression-based ensembles, as moving the weight from such a “failed model” to zero can require a long training time.

Nonlinear ensembles¶

Nonlinear ensemble models that use tabular models to combine predictions from base forecasters.

class autogluon.timeseries.models.ensemble.TabularEnsemble(path: str | None = None, name: str | None = None, hyperparameters: dict[str, Any] | None = None, freq: str | None = None, prediction_length: int = 1, covariate_metadata: CovariateMetadata | None = None, target: str = 'target', quantile_levels: Sequence[float] = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), eval_metric: str | TimeSeriesScorer | None = None)[source]¶

Tabular ensemble that uses a single AutoGluon-Tabular model to learn ensemble combinations.

This ensemble trains a single tabular model (such as gradient boosting machines) to predict all quantiles simultaneously from base model predictions. The tabular model learns complex non-linear patterns in how base models should be combined, potentially capturing interactions and conditional dependencies that simple weighted averages cannot represent.

Parameters:

model_name (str, default = "CAT") – Name of the AutoGluon-Tabular model to use for ensemble learning. Model name should be registered in AutoGluon-Tabular model registry.
model_hyperparameters (dict, default = {}) – Hyperparameters to pass to the underlying AutoGluon-Tabular model.
isotonization (str, default = "sort") – The isotonization method to use (i.e. the algorithm to prevent quantile non-crossing). Currently only “sort” is supported.
detect_and_ignore_failures (bool, default = True) – Whether to detect and ignore “failed models”, defined as models which have a loss that is larger than 10x the median loss of all the models. This can be very important for the regression-based ensembles, as moving the weight from such a “failed model” to zero can require a long training time.

class autogluon.timeseries.models.ensemble.PerQuantileTabularEnsemble(path: str | None = None, name: str | None = None, hyperparameters: dict[str, Any] | None = None, freq: str | None = None, prediction_length: int = 1, covariate_metadata: CovariateMetadata | None = None, target: str = 'target', quantile_levels: Sequence[float] = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), eval_metric: str | TimeSeriesScorer | None = None)[source]¶

Tabular ensemble using separate AutoGluon-Tabular models for each quantile and mean forecast.

This ensemble trains dedicated tabular models for each quantile level plus a separate model for the mean prediction. Each model specializes in learning optimal combinations for its specific target, allowing for quantile-specific ensemble strategies that can capture different model behaviors across the prediction distribution.

Parameters:

model_name (str, default = "GBM") – Name of the AutoGluon-Tabular model to use for ensemble learning. Model name should be registered in AutoGluon-Tabular model registry.
model_hyperparameters (dict, default = {}) – Hyperparameters to pass to the underlying AutoGluon-Tabular model.
isotonization (str, default = "sort") – The isotonization method to use (i.e. the algorithm to prevent quantile non-crossing). Currently only “sort” is supported.
detect_and_ignore_failures (bool, default = True) – Whether to detect and ignore “failed models”, defined as models which have a loss that is larger than 10x the median loss of all the models. This can be very important for the regression-based ensembles, as moving the weight from such a “failed model” to zero can require a long training time.