TimeSeriesPredictor.evaluate

TimeSeriesPredictor.evaluate(data: TimeSeriesDataFrame | DataFrame | Path | str, model: str | None = None, metrics: str | TimeSeriesScorer | List[str | TimeSeriesScorer] | None = None, cutoff: int | None = None, display: bool = False, use_cache: bool = True) → Dict[str, float]

Evaluate forecast accuracy on the given dataset.

This method measures the forecast accuracy using the last self.prediction_length time steps of each time series in data as a hold-out set.

Note

Metrics are always reported in ‘higher is better’ format. This means that error metrics such as MASE or MAPE are multiplied by -1, so their reported values are negative. This convention lets you compare evaluation results without having to know whether a given metric is an error (lower is better) or a score (higher is better).
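
The sign-flip convention can be sketched in plain Python (a standalone illustration with made-up metric values, not a call into AutoGluon):

```python
# AutoGluon multiplies "lower is better" error metrics (e.g. MASE, MAPE)
# by -1 before reporting them, so every reported score obeys
# "higher is better".
raw_errors = {"MASE": 0.85, "MAPE": 0.12}  # conventional error values
reported = {name: -value for name, value in raw_errors.items()}

# Flipping the sign back recovers the conventional error value.
recovered = {name: -value for name, value in reported.items()}
```

Here a model with raw MASE 0.85 is reported with score -0.85, and a larger (less negative) reported value always indicates a better forecast.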

Parameters:
  • data (Union[TimeSeriesDataFrame, pd.DataFrame, Path, str]) –

    The data to evaluate the best model on. If a cutoff is not provided, the last prediction_length time steps of each time series in data will be held out for prediction and forecast accuracy will be calculated on these time steps. When a cutoff is provided, the -cutoff-th to the -cutoff + prediction_length-th time steps of each time series are used for evaluation.

    Must include both historical and future data (i.e., the length of each time series in data must be at least prediction_length + 1 if cutoff is not provided, and at least -cutoff + 1 otherwise).

    The names and dtypes of columns and static features in data must match the train_data used to train the predictor.

    If data is a pandas.DataFrame, AutoGluon will attempt to convert it to a TimeSeriesDataFrame. If a str or a Path is provided, AutoGluon will attempt to load the file at that location.

  • model (str, optional) – Name of the model that you would like to evaluate. By default, the best model found during training (the one with the highest validation score) is used.

  • metrics (str, TimeSeriesScorer or List[Union[str, TimeSeriesScorer]], optional) – Metric or a list of metrics to compute scores with. Defaults to self.eval_metric. Supports both metric names as strings and custom metrics based on TimeSeriesScorer.

  • cutoff (int, optional) – A negative integer less than or equal to -1 * prediction_length denoting the time step in data where the forecast evaluation starts, i.e., time series are evaluated from the -cutoff-th to the -cutoff + prediction_length-th time step. Defaults to -1 * prediction_length, using the last prediction_length time steps of each time series for evaluation.

  • display (bool, default = False) – If True, the scores will be printed.

  • use_cache (bool, default = True) – If True, will attempt to use the cached predictions. If False, cached predictions will be ignored. This argument is ignored if cache_predictions was set to False when creating the TimeSeriesPredictor.
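The index arithmetic behind cutoff can be sketched as follows. The helper evaluation_window is hypothetical (not part of the AutoGluon API) and only illustrates which time steps of a series are held out for evaluation:

```python
# Sketch of which time steps a given cutoff selects for evaluation.
# For a series of length T, cutoff=-k (with k >= prediction_length) holds
# out the window of indices [T - k, T - k + prediction_length) as the
# forecast horizon; everything before index T - k is the model's context.
def evaluation_window(series_length, prediction_length, cutoff=None):
    k = -cutoff if cutoff is not None else prediction_length
    if k < prediction_length:
        raise ValueError("cutoff must be <= -prediction_length")
    start = series_length - k
    return list(range(start, start + prediction_length))

# Default: the last prediction_length steps are held out (indices 93..99).
evaluation_window(series_length=100, prediction_length=7)
# cutoff=-10 with prediction_length=7 evaluates indices 90..96 instead.
evaluation_window(series_length=100, prediction_length=7, cutoff=-10)
```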

Returns:

scores_dict – Dictionary mapping each metric name to the model’s performance on that metric. For consistency, error metrics have their signs flipped to obey the ‘higher is better’ convention; for example, negative MAPE values will be reported. To get the eval_metric score, use output[predictor.eval_metric.name].

Return type:

Dict[str, float]
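
Reading the returned scores_dict can be sketched as follows (the dictionary contents here are made up for illustration, and the metric-name string stands in for predictor.eval_metric.name):

```python
# Suppose evaluate() returned scores for two metrics, already sign-flipped:
scores = {"MASE": -0.85, "WQL": -0.31}

# Looking up the eval_metric score by the metric's name, as one would with
# scores[predictor.eval_metric.name]:
eval_metric_name = "WQL"  # hypothetical value of predictor.eval_metric.name
best_score = scores[eval_metric_name]
```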