TimeSeriesPredictor.fit

TimeSeriesPredictor.fit(train_data: Union[TimeSeriesDataFrame, DataFrame], tuning_data: Optional[Union[TimeSeriesDataFrame, DataFrame]] = None, time_limit: Optional[int] = None, presets: Optional[str] = None, hyperparameters: Dict[Union[str, Type], Any] = None, hyperparameter_tune_kwargs: Optional[Union[str, Dict]] = None, excluded_model_types: Optional[List[str]] = None, num_val_windows: int = 1, refit_full: bool = False, enable_ensemble: bool = True, random_seed: Optional[int] = None, verbosity: Optional[int] = None) -> TimeSeriesPredictor
Fit probabilistic forecasting models to the given time series dataset.
Parameters
train_data (Union[TimeSeriesDataFrame, pd.DataFrame]) –
    Training data in the TimeSeriesDataFrame format. For best performance, all time series should have length > 2 * prediction_length.

    If known_covariates_names were specified when creating the predictor, train_data must include the columns listed in known_covariates_names, with the covariate values aligned with the target time series. The known covariates must have a numeric (float or integer) dtype.

    Columns of train_data except target and those listed in known_covariates_names will be interpreted as past_covariates: covariates that are known only in the past.

    If train_data has static features (i.e., train_data.static_features is a pandas DataFrame), the predictor will interpret columns with int and float dtypes as continuous (real-valued) features, columns with object and str dtypes as categorical features, and will ignore the remaining columns. For example, to ensure that the column "store_id" with dtype int is interpreted as a category, we need to change its dtype to category:

        data.static_features["store_id"] = data.static_features["store_id"].astype("category")

    If the provided data is an instance of pandas DataFrame, AutoGluon will attempt to automatically convert it to a TimeSeriesDataFrame.
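To make the static-feature dtype handling concrete, the snippet below builds a small, hypothetical static features table with pandas and casts an integer ID column to a categorical dtype (the column names here are illustrative, not part of the API):

```python
import pandas as pd

# Hypothetical static features: one row per time series item.
static_features = pd.DataFrame(
    {"store_id": [1, 2, 3], "store_area": [120.5, 88.0, 310.2]},
    index=["A", "B", "C"],
)

# An int column would be interpreted as a continuous feature by default;
# casting it to "category" makes the predictor treat it as categorical,
# while the float column remains a continuous feature.
static_features["store_id"] = static_features["store_id"].astype("category")
```

In practice this conversion would be applied to train_data.static_features before calling fit().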
tuning_data (Union[TimeSeriesDataFrame, pd.DataFrame], optional) –
    Data reserved for model selection and hyperparameter tuning, rather than for training individual models. Also used to compute the validation scores. Note that only the last prediction_length time steps of each time series are used for computing the validation score.

    If tuning_data is provided, multi-window backtesting on training data will be disabled, the num_val_windows argument will be ignored, and refit_full will be set to False.

    Leaving this argument empty and letting AutoGluon automatically generate the validation set from train_data is a good default.

    If known_covariates_names were specified when creating the predictor, tuning_data must also include the columns listed in known_covariates_names, with the covariate values aligned with the target time series.

    If train_data has past covariates or static features, tuning_data must also include them (with the same column names and dtypes).

    If the provided data is an instance of pandas DataFrame, AutoGluon will attempt to automatically convert it to a TimeSeriesDataFrame.
time_limit (int, optional) –
    Approximately how long fit() will run (wall-clock time in seconds). If not specified, fit() will run until all models have completed training.

presets (str, optional) –
    Optional preset configurations for various arguments in fit().

    Can significantly impact predictive accuracy, memory footprint, inference latency of trained models, and various other properties of the returned predictor. It is recommended to specify presets and avoid specifying most other fit() arguments or model hyperparameters prior to becoming familiar with AutoGluon. For example, set presets="high_quality" to get a high-accuracy predictor, or set presets="fast_training" to quickly fit multiple simple statistical models. Any user-specified arguments in fit() will override the values used by presets.

    Available presets:
"fast_training"
: fit simple statistical models (ETS
,Theta
,Naive
,SeasonalNaive
) + fast tree-based modelRecursiveTabular
. These models are fast to train but may not be very accurate."medium_quality"
: all models mentioned above + deep learning modelDeepAR
. Default setting that produces good forecasts with reasonable training time."high_quality"
: all models mentioned above + automatically tuned statistical models (AutoETS
,AutoARIMA
) + tree-based modelDirectTabular
+ deep learning modelsTemporalFusionTransformer
andPatchTST
. Much more accurate thanmedium_quality
, but takes longer to train."best_quality"
: all models mentioned above + more tabular models + training multiple copies ofDeepAR
. Usually better thanhigh_quality
, but takes even longer to train.
    Details for these presets can be found in autogluon/timeseries/configs/presets_configs.py. If not provided, the user-provided values for hyperparameters and hyperparameter_tune_kwargs will be used (defaulting to the default values specified below).

hyperparameters (str or dict, default = "medium_quality") –
    Determines which models are trained and which hyperparameters are used by each model.

    If a str is passed, a preset hyperparameter configuration defined in autogluon/timeseries/trainer/models/presets.py will be used.

    If a dict is provided, the keys are strings or types that indicate which models to train. Each value is itself a dict containing hyperparameters for the corresponding model, or a list of such dicts. Any hyperparameters not specified here will be set to their defaults. For example:

        predictor.fit(
            ...
            hyperparameters={
                "DeepAR": {},
                "ETS": [
                    {"seasonal": "add"},
                    {"seasonal": None},
                ],
            }
        )
    The above example will train three models:

    - DeepAR with default hyperparameters
    - ETS with additive seasonality (all other parameters set to their defaults)
    - ETS with seasonality disabled (all other parameters set to their defaults)
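The counting rule above (a dict value contributes one model configuration, a list of dicts contributes one per entry) can be sketched in plain Python; num_configs is a hypothetical helper for illustration, not part of the AutoGluon API:

```python
# The hyperparameters dict from the example above.
hyperparameters = {
    "DeepAR": {},
    "ETS": [{"seasonal": "add"}, {"seasonal": None}],
}

def num_configs(hp):
    # A dict value is one configuration; a list value is several.
    return sum(len(v) if isinstance(v, list) else 1 for v in hp.values())

print(num_configs(hyperparameters))  # → 3
```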
    A full list of available models and their hyperparameters is provided in forecasting_zoo.
    The hyperparameters for each model can be fixed values (as shown above), or search spaces over which hyperparameter optimization is performed. A search space should only be provided when hyperparameter_tune_kwargs is given (i.e., hyperparameter tuning is utilized). For example:

        from autogluon.common import space

        predictor.fit(
            ...
            hyperparameters={
                "DeepAR": {
                    "hidden_size": space.Int(20, 100),
                    "dropout_rate": space.Categorical(0.1, 0.3),
                },
            },
            hyperparameter_tune_kwargs="auto",
        )
In the above example, multiple versions of the DeepAR model with different values of the parameters “hidden_size” and “dropout_rate” will be trained.
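As a minimal sketch of what random search over those two spaces amounts to, the snippet below samples trial configurations with stand-ins for AutoGluon's space objects (this is illustrative only, not the library's internal sampler):

```python
import random

def sample_config(rng):
    # Stand-ins for the search spaces in the example above:
    return {
        "hidden_size": rng.randint(20, 100),     # analogous to space.Int(20, 100)
        "dropout_rate": rng.choice([0.1, 0.3]),  # analogous to space.Categorical(0.1, 0.3)
    }

rng = random.Random(0)  # fixed seed for reproducibility
trials = [sample_config(rng) for _ in range(10)]  # e.g., 10 trials, as in the "random" preset
```

Each trial trains a separate copy of the model with the sampled hyperparameter values; the best configuration is selected by validation score.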
hyperparameter_tune_kwargs (str or dict, optional) –
    Hyperparameter tuning strategy and kwargs (for example, how many HPO trials to run). If None, then hyperparameter tuning will not be performed.

    The Ray Tune backend is used to tune deep learning forecasting models from GluonTS implemented in MXNet. All other models use a custom HPO backend based on random search.

    Can be set to a string to choose one of the available presets:

    - "random": 10 trials of random search
    - "auto": 10 trials of Bayesian optimization for GluonTS MXNet models, 10 trials of random search for other models

    Alternatively, a dict can be passed for more fine-grained control. The dict must include the following keys:

    - "num_trials": int, number of configurations to train for each tuned model
    - "searcher": one of "random" (random search), "bayes" (Bayesian optimization for GluonTS MXNet models, random search for other models), and "auto" (same as "bayes")
    - "scheduler": the only supported option is "local" (all models trained on the same machine)
    Example:

        predictor.fit(
            ...
            hyperparameter_tune_kwargs={
                "scheduler": "local",
                "searcher": "auto",
                "num_trials": 5,
            },
        )
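The dict constraints described above can be captured in a small sketch; validate_hpo_kwargs is a hypothetical helper mirroring the documented requirements, not AutoGluon's own validation logic:

```python
# Required keys and allowed values, per the description above.
REQUIRED_KEYS = {"num_trials", "searcher", "scheduler"}
ALLOWED_SEARCHERS = {"random", "bayes", "auto"}

def validate_hpo_kwargs(kwargs):
    missing = REQUIRED_KEYS - kwargs.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if kwargs["searcher"] not in ALLOWED_SEARCHERS:
        raise ValueError(f"unknown searcher: {kwargs['searcher']!r}")
    if kwargs["scheduler"] != "local":
        raise ValueError("the only supported scheduler is 'local'")
    return kwargs

# The dict from the example above passes validation.
validate_hpo_kwargs({"scheduler": "local", "searcher": "auto", "num_trials": 5})
```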
excluded_model_types (List[str], optional) –
    Banned subset of model types to avoid training during fit(), even if present in hyperparameters. For example, the following code will train all models included in the high_quality presets except DeepAR:

        predictor.fit(
            ...,
            presets="high_quality",
            excluded_model_types=["DeepAR"],
        )
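Conceptually, excluded_model_types acts as a filter over the models a preset would otherwise train. A minimal sketch (the model list below is an illustrative subset, not the full high_quality roster):

```python
# Illustrative subset of models a preset might include.
preset_models = ["ETS", "Theta", "DeepAR", "TemporalFusionTransformer", "PatchTST"]
excluded_model_types = ["DeepAR"]

# Models actually trained: everything the preset includes minus the banned types.
models_to_train = [m for m in preset_models if m not in excluded_model_types]
```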
num_val_windows (int, default = 1) –
    Number of backtests done on train_data for each trained model to estimate the validation performance. A separate copy of each model is trained for each validation window. When num_val_windows = k, training time is increased roughly by a factor of k.

refit_full (bool, default = False) –
    If True, after training is complete, AutoGluon will attempt to re-train all models using all of the training data (including the data initially reserved for validation). This argument has no effect if tuning_data is provided.

enable_ensemble (bool, default = True) –
    If True, the TimeSeriesPredictor will fit a simple weighted ensemble on top of the models specified via hyperparameters.

random_seed (int, optional) –
    If provided, fixes the seed of the random number generator for all models. This guarantees reproducible results for most models (except those trained on a GPU, because of the non-determinism of GPU operations).
verbosity (int, optional) –
    If provided, overrides the verbosity value used when creating the TimeSeriesPredictor. See the documentation for TimeSeriesPredictor for more details.