TimeSeriesPredictor.feature_importance
- TimeSeriesPredictor.feature_importance(data: TimeSeriesDataFrame | DataFrame | Path | str | None = None, model: str | None = None, metric: str | TimeSeriesScorer | None = None, features: List[str] | None = None, time_limit: float | None = None, method: Literal['naive', 'permutation'] = 'permutation', subsample_size: int = 50, num_iterations: int | None = None, random_seed: int | None = 123, relative_scores: bool = False, include_confidence_band: bool = True, confidence_level: float = 0.99)
Calculates feature importance scores for the given model by perturbing each feature and measuring the resulting decrease in the model's predictive performance. Each feature is either replaced by a shuffled version of itself (known as permutation feature importance) or by a constant value representing the feature's median or mode.
A feature's importance score represents the performance drop that results when the model makes predictions on a perturbed copy of the data where this feature's values have been randomly shuffled across rows. A feature score of 0.01 indicates that the predictive performance dropped by 0.01 when the feature was randomly shuffled or replaced. The higher a feature's score, the more important it is to the model's performance.
If a feature has a negative score, the feature is likely harmful to the final model, and a model trained with the feature removed would be expected to achieve better predictive performance. Note that calculating feature importance can be a computationally expensive process, particularly if the model uses many features; in many cases it can take longer than the original model training. Roughly, the runtime will equal the number of features in the data, multiplied by num_iterations (or 1 when method="naive"), multiplied by the time taken when evaluate() is called on a dataset with subsample_size items.
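For reference, a minimal usage sketch is shown below. The file names, prediction_length value, and fit() settings are illustrative placeholders, not part of this API; feature_importance() itself is called as documented here.

```python
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Placeholder files: long-format data with item_id, timestamp, and target
# columns (plus any covariate columns the predictor is trained with).
train_data = TimeSeriesDataFrame.from_path("train.csv")
test_data = TimeSeriesDataFrame.from_path("test.csv")

predictor = TimeSeriesPredictor(prediction_length=24).fit(train_data)

# Permutation importance (the default method), evaluated on held-out data.
fi = predictor.feature_importance(data=test_data, random_seed=123)
print(fi.sort_values("importance", ascending=False))
```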
- Parameters:
data (TimeSeriesDataFrame, pd.DataFrame, Path or str, optional) – The data to evaluate feature importances on. The last prediction_length time steps of each item in the data set will be held out for prediction, and forecast accuracy will be calculated on these time steps. More accurate feature importances will be obtained from new data that was held out during fit().
If known_covariates_names were specified when creating the predictor, data must include the columns listed in known_covariates_names, with the covariate values aligned with the target time series. This data must contain the label column with the same column name as specified during fit().
If the train_data used to train the predictor contained past covariates or static features, then data must also include them (with the same column names and dtypes).
If the provided data is an instance of a pandas DataFrame, AutoGluon will attempt to automatically convert it to a TimeSeriesDataFrame. If a str or Path is passed, data will be loaded using the str value as the file path.
If data is not provided, then the validation (tuning) data provided during training (or the data held out for validation if tuning_data was not explicitly provided to fit()) will be used.
model (str, optional) – Name of the model that you would like to evaluate. By default, the best model found during training (the one with the highest validation score) will be used.
metric (str or TimeSeriesScorer, optional) – Metric to be used for computing feature importance. If None, the eval_metric specified during initialization of the TimeSeriesPredictor will be used.
features (List[str], optional) – List of feature names for which feature importances are calculated and returned. By default, importances for all features will be returned.
method ({"permutation", "naive"}, default = "permutation") –
Method to be used for computing feature importance.
naive
: computes feature importance by replacing the values of each feature by a constant value and computing feature importances as the relative improvement in the evaluation metric. The constant value is the median for real-valued features and the mode for categorical features, for both covariates and static features, obtained from the feature values indata
provided.permutation
: computes feature importance by naively shuffling the values of the feature across different items and time steps. Each feature is shuffled fornum_iterations
times and feature importances are computed as the relative improvement in the evaluation metric. Refer to https://explained.ai/rf-importance/ for an explanation of permutation importance.
subsample_size (int, default = 50) – The number of items to sample from data when computing feature importance. Larger values increase the accuracy of the feature importance scores. Runtime scales linearly with subsample_size.
time_limit (float, optional) – Time in seconds to limit the calculation of feature importance. If None, feature importance will be calculated without early stopping. If method="permutation", a minimum of one full shuffle set will always be evaluated; if evaluating a single shuffle set takes longer than time_limit, the method will still finish that shuffle set before returning, regardless of time_limit.
num_iterations (int, optional) – The number of different iterations of the data that are evaluated. If method="permutation", this is interpreted as the number of shuffle sets (equivalent to num_shuffle_sets in TabularPredictor.feature_importance()). If method="naive", the constant-replacement approach is repeated num_iterations times, and a different subsample of the data (of size subsample_size) is taken in each iteration. Defaults to 1 for method="naive" and 5 for method="permutation". The value is ignored if method="naive" and the subsample size is greater than the number of items in data, as additional iterations would be redundant. Larger values increase the quality of the importance estimates; it is generally recommended to increase subsample_size before increasing num_iterations. Runtime scales linearly with num_iterations. See the sketch after this parameter list.
random_seed (int or None, default = 123) – If provided, fixes the seed of the random number generator for all models. This guarantees reproducible results for feature importance.
relative_scores (bool, default = False) – By default, this method returns the expected average absolute improvement in the eval metric due to the feature. If True, the statistics will instead be computed over the relative (percentage) improvements.
include_confidence_band (bool, default = True) – If True, the returned DataFrame will include two additional columns specifying the confidence interval for the true underlying importance value of each feature. Increasing subsample_size and num_iterations will tighten the confidence interval.
confidence_level (float, default = 0.99) – This argument is only considered when include_confidence_band=True, and can be used to specify the confidence level used for constructing confidence intervals. For example, if confidence_level is set to 0.99, the returned DataFrame will include the columns p99_high and p99_low, indicating that the true feature importance will lie between p99_high and p99_low 99% of the time (a 99% confidence interval). More generally, if confidence_level = 0.XX, the columns containing the XX% confidence interval will be named pXX_high and pXX_low. See the example at the end of this section.
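As referenced in the method and num_iterations descriptions above, the sketch below contrasts the two methods. The predictor and test_data objects are carried over from the usage sketch earlier in this section; the specific parameter values are illustrative assumptions, not recommendations.

```python
# "naive": replaces each feature with its median (real-valued) or mode
# (categorical) instead of shuffling; cheaper, since num_iterations defaults to 1.
fi_naive = predictor.feature_importance(
    data=test_data,
    method="naive",
    subsample_size=100,
)

# "permutation": shuffles each feature num_iterations times (default 5).
# Prefer raising subsample_size before num_iterations; both scale runtime linearly.
fi_perm = predictor.feature_importance(
    data=test_data,
    method="permutation",
    subsample_size=100,
    num_iterations=10,
    time_limit=600,  # stop early, after at least one full shuffle set
)
```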
- Returns:
index: The feature name. 'importance': The estimated feature importance score. 'stddev': The standard deviation of the feature importance score; if NaN, not enough num_iterations were used.
- Return type:
pd.DataFrame of feature importance scores with 2 columns
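Finally, a sketch of how the returned DataFrame might be interpreted, as referenced in the confidence_level description. The predictor and test_data objects again carry over from the usage sketch above; the column names follow the pXX_high / pXX_low convention documented here, while the thresholding logic is an illustrative assumption, not part of the API.

```python
fi = predictor.feature_importance(
    data=test_data,
    include_confidence_band=True,
    confidence_level=0.95,  # produces p95_high and p95_low columns
)

# Features whose entire 95% confidence interval lies above zero are
# consistently helpful to the model.
helpful = fi[fi["p95_low"] > 0]

# A negative importance score suggests the feature may be harmful; retraining
# with the feature removed could improve predictive performance.
harmful = fi[fi["importance"] < 0]

print("Helpful:", helpful.index.tolist())
print("Possibly harmful:", harmful.index.tolist())
```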