TabularPredictor.feature_importance

TabularPredictor.feature_importance(data=None, model: str | None = None, features: list | None = None, feature_stage: str = 'original', subsample_size: int = 5000, time_limit: float | None = None, num_shuffle_sets: int | None = None, include_confidence_band: bool = True, confidence_level: float = 0.99, silent: bool = False)

Calculates feature importance scores for the given model via permutation importance. Refer to https://explained.ai/rf-importance/ for an explanation of permutation importance. A feature’s importance score represents the performance drop that results when the model makes predictions on a perturbed copy of the data where this feature’s values have been randomly shuffled across rows. A feature score of 0.01 would indicate that the predictive performance dropped by 0.01 when the feature was randomly shuffled. The higher the score, the more important the feature is to the model’s performance. If a feature has a negative score, the feature is likely harmful to the final model, and a model trained with the feature removed would be expected to achieve better predictive performance.

Note that calculating feature importance can be a very computationally expensive process, particularly if the model uses hundreds or thousands of features. In many cases, this can take longer than the original model training. As a rough estimate, feature_importance(model, data, features) takes about as long as predict_proba(data, model) multiplied by the number of features.

Note: For highly accurate importance and p_value estimates, it is recommended to set subsample_size to at least 5000 if possible and num_shuffle_sets to at least 10.
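A minimal usage sketch (the file paths and the label column name below are hypothetical, not part of the API):

    from autogluon.tabular import TabularPredictor

    # Fit on training data; "train.csv" and the label name "class" are made up for illustration.
    predictor = TabularPredictor(label="class").fit("train.csv")

    # Compute permutation importance on held-out data that contains the label column.
    importance_df = predictor.feature_importance(
        data="test.csv",       # held-out data, not the training data
        subsample_size=5000,   # recommended minimum for accurate estimates
        num_shuffle_sets=10,   # recommended minimum for reliable p-values
    )
    print(importance_df.sort_values("importance", ascending=False).head())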

Parameters:
  • data (str or TabularDataset or pd.DataFrame (optional)) –

    This data must also contain the label-column with the same column-name as specified during fit(). If specified, then the data is used to calculate the feature importance scores. If str is passed, data will be loaded using the str value as the file path. If not specified, the original data used during fit() will be used if cache_data=True. Otherwise, an exception will be raised. Do not pass the training data through this argument, as the feature importance scores calculated will be biased due to overfitting.

    More accurate feature importances will be obtained from new data that was held-out during fit().

  • model (str, default = None) – Model to get feature importances for. If None, the best model is chosen. Valid model names can be listed by calling predictor.model_names().

  • features (list, default = None) –

    List of str feature names to calculate and return feature importances for. Specify None to compute importances for all features. If you only want feature importances for a subset of the features, pass their names as a list of str. Valid feature names change depending on the feature_stage.

    To get the list of feature names for feature_stage=’original’, call predictor.feature_metadata_in.get_features(). To get the list of feature names for feature_stage=’transformed’, call list(predictor.transform_features().columns). To get the list of feature names for feature_stage=’transformed_model’, call list(predictor.transform_features(model={model_name}).columns).

    [Advanced] Can also contain tuples as elements of (feature_name, feature_list) form.

    feature_name can be any string so long as it is unique among all other feature names / features in the list. feature_list can be any list of valid features in the data. This will compute the importance of the combination of features in feature_list, naming the set of features in the returned DataFrame feature_name. This importance will differ from adding the individual importances of each feature in feature_list, and more accurately reflects the overall group importance. Example: [‘featA’, ‘featB’, ‘featC’, (‘featBC’, [‘featB’, ‘featC’])]. In this example, the importance of ‘featBC’ will be calculated by jointly permuting ‘featB’ and ‘featC’ together as if they were a single two-dimensional feature. A sketch of this usage appears after the parameter list below.

  • feature_stage (str, default = 'original') –

    The stage of feature-processing for which importances should be computed. Options:

    ’original’:

    Compute importances of the original features. Warning: data must be specified with this option, otherwise an exception will be raised.

    ’transformed’:

    Compute importances of the post-internal-transformation features (after automated feature engineering). These features may be missing some original features, or add new features entirely. An example of new features would be ngram features generated from a text column. Warning: For bagged models, feature importance calculation is not yet supported with this option when data=None. Doing so will raise an exception.

    ’transformed_model’:

    Compute importances of the post-model-transformation features. These features are the internal features used by the requested model. They may differ greatly from the original features. If the model is a stack ensemble, this will include stack ensemble features such as the prediction probability features of the stack ensemble’s base (ancestor) models.

  • subsample_size (int, default = 5000) – The number of rows to sample from data when computing feature importance. If subsample_size=None or data contains fewer than subsample_size rows, all rows will be used during computation. Larger values increase the accuracy of the feature importance scores. Runtime linearly scales with subsample_size.

  • time_limit (float, default = None) – Time in seconds to limit the calculation of feature importance. If None, feature importance will be computed without early stopping. At least one full shuffle set will always be evaluated. If a single shuffle-set evaluation takes longer than time_limit, the method will still complete that evaluation before returning, so the total runtime may exceed time_limit.

  • num_shuffle_sets (int, default = None) – The number of different permutation shuffles of the data that are evaluated. Larger values will increase the quality of the importance evaluation. It is generally recommended to increase subsample_size before increasing num_shuffle_sets. Defaults to 5 if time_limit is None or 10 if time_limit is specified. Runtime linearly scales with num_shuffle_sets.

  • include_confidence_band (bool, default = True) – If True, returned DataFrame will include two additional columns specifying confidence interval for the true underlying importance value of each feature. Increasing subsample_size and num_shuffle_sets will tighten the confidence interval.

  • confidence_level (float, default = 0.99) – This argument is only considered when include_confidence_band is True, and can be used to specify the confidence level used for constructing confidence intervals. For example, if confidence_level is set to 0.99, then the returned DataFrame will include the columns ‘p99_high’ and ‘p99_low’, which indicate that the true feature importance will be between ‘p99_low’ and ‘p99_high’ 99% of the time (a 99% confidence interval). More generally, if confidence_level = 0.XX, then the columns containing the XX% confidence interval will be named ‘pXX_high’ and ‘pXX_low’.

  • silent (bool, default = False) – Whether to suppress logging output.
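As referenced in the features and feature_stage descriptions above, a sketch of grouped-feature importance and stage selection (the feature names and the test_data variable are hypothetical; predictor is assumed to be an already-fitted TabularPredictor):

    # Valid feature names for feature_stage='original'
    original_features = predictor.feature_metadata_in.get_features()

    # Valid feature names for feature_stage='transformed'
    transformed_features = list(predictor.transform_features().columns)

    # Joint importance of 'featB' and 'featC', reported as a single group named 'featBC'.
    grouped_importance = predictor.feature_importance(
        data=test_data,
        features=["featA", "featB", "featC", ("featBC", ["featB", "featC"])],
        feature_stage="original",
    )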

Returns:

  • index: The feature name.

  • ‘importance’: The estimated feature importance score.

  • ‘stddev’: The standard deviation of the feature importance score. If NaN, then not enough num_shuffle_sets were used to calculate a variance.

  • ‘p_value’: P-value for a statistical t-test of the null hypothesis: importance = 0, vs the (one-sided) alternative: importance > 0. Features with a low p-value appear confidently useful to the predictor, while features with a high p-value may be useless (or even harmful to include in the training data). A low p-value (e.g. 0.01) indicates strong evidence that the feature is useful, whereas a high p-value (e.g. 0.99) indicates no evidence that the feature is useful, suggesting it may be useless or harmful.

  • ‘n’: The number of shuffles performed to estimate the importance score (corresponds to the sample size used to determine the confidence interval for the true score).

  • ‘pXX_high’: Upper end of the XX% confidence interval for the true feature importance score (where XX=99 by default).

  • ‘pXX_low’: Lower end of the XX% confidence interval for the true feature importance score.

Return type:

pd.DataFrame of feature importance scores with 6 columns
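As an illustration of how the returned columns might be used (the 0.05 threshold and variable names are arbitrary choices for this sketch, not recommendations from the API; predictor and test_data are assumed to exist):

    fi = predictor.feature_importance(data=test_data, num_shuffle_sets=10)

    # Features with statistically significant positive importance.
    useful_features = fi[fi["p_value"] < 0.05].index.tolist()

    # Features whose estimated importance is non-positive; candidates for removal.
    drop_candidates = fi[fi["importance"] <= 0].index.tolist()

    # Inspect the 99% confidence bounds produced when include_confidence_band=True.
    print(fi[["importance", "p99_high", "p99_low"]])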