TabularPredictor.transform_features

TabularPredictor.transform_features(data=None, model=None, base_models=None, return_original_features=True)

Transforms data features through the AutoGluon feature generator. This is useful for understanding how AutoGluon interprets the data features. The output can be used to train further models, even outside of AutoGluon, on the same data representation that AutoGluon uses.

Individual AutoGluon models, such as the neural network, may apply additional feature transformations that are not reflected in this method. This method applies only the universal transforms employed by all AutoGluon models.

When data=None, base_models=[{best_model}], and bagging was enabled during fit(), this returns the out-of-fold predictions of the best model, which can be used as training input to a custom user stacker model.
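As a rough sketch of the stacking use case described above, the helper below collects out-of-fold features and the matching labels from a bagged predictor. The name `oof_stacker_inputs` is hypothetical and not part of AutoGluon. It only calls `transform_features` and `load_data_internal` as documented on this page, and it assumes the predictor was fit in bagged mode with its training data still available.

```python
def oof_stacker_inputs(predictor, base_models):
    """Return (X_oof, y) for training a custom stacker model.

    Hypothetical helper: assumes `predictor` is a fitted, bagged-mode
    TabularPredictor whose internal training data can still be loaded.
    """
    # With data=None on a bagged predictor, the base model features are
    # out-of-fold predictions on the internal training set.
    X_oof = predictor.transform_features(
        data=None,
        base_models=list(base_models),
        return_original_features=False,  # keep only the base model outputs
    )
    # Labels for the same internal training rows.
    y = predictor.load_data_internal(data="train", return_X=False, return_y=True)[1]
    return X_oof, y
```

A call such as `oof_stacker_inputs(predictor, ['WeightedEnsemble_L2'])` would then yield inputs for a user-defined stacker. Leaving `return_original_features` at its default would also keep the original feature columns.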

Parameters
  • data (str or TabularDataset or pd.DataFrame (optional)) –

    The data to apply feature transformation to. This data does not require the label column. If str is passed, data will be loaded using the str value as the file path. If not specified, the original data used during fit() will be used if fit() was previously called with cache_data=True. Otherwise, an exception will be raised.

    For non-bagged mode predictors:

    The data used when not specified is the validation set. This can either be an automatically generated validation set or the user-defined tuning_data if passed during fit(). If all parameters are unspecified, then the output is equivalent to predictor.load_data_internal(data='val', return_X=True, return_y=False)[0]. To get the label values of the output, call predictor.load_data_internal(data='val', return_X=False, return_y=True)[1]. If the original training set is desired, it can be passed in through data.

    Warning: Do not pass the original training set if model or base_models are set. This will result in overfit feature transformation.

    For bagged mode predictors:

    The data used when not specified is the full training set. If all parameters are unspecified, then the output is equivalent to predictor.load_data_internal(data='train', return_X=True, return_y=False)[0]. To get the label values of the output, call predictor.load_data_internal(data='train', return_X=False, return_y=True)[1]. Features generated from base_models in this case come from out-of-fold predictions. Note that the training set may differ from the training set originally passed during fit(), as AutoGluon may choose to drop or duplicate rows during training.

    Warning: Do not pass the original training set through data if model or base_models are set. This will result in overfit feature transformation. Instead, set data=None.

  • model (str, default = None) –

    Model to generate input features for. The output data will be equivalent to the input data that would be sent into model.predict_proba(data).

    Note: This only applies to cases where data is not the training data.

    If None, then only return generically preprocessed features prior to any model fitting. Valid models are listed in this predictor by calling predictor.get_model_names(). Specifying a refit_full model will cause an exception if data=None. base_models=None is a requirement when specifying model.

  • base_models (list, default = None) – List of model names to use as base_models for a hypothetical stacker model when generating input features. If None, then only return generically preprocessed features prior to any model fitting. Valid models are listed in this predictor by calling predictor.get_model_names(). If a stacker model S exists with base_models=M, then setting base_models=M is equivalent to setting model=S. model=None is a requirement when specifying base_models.

  • return_original_features (bool, default = True) –

    Whether to return the original features. If False, only returns the additional output columns from specifying model or base_models.

    Setting this to False is useful when the output is intended as input to further stacker models without the original features.
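The parameter rules above can be summarized in a small wrapper. This is only a sketch: `features_and_labels` and its `bagged` flag are hypothetical (the caller is assumed to know whether fit() used bagging), and the mutual-exclusion check mirrors the documented requirement rather than AutoGluon's own validation code.

```python
def features_and_labels(predictor, bagged, model=None, base_models=None):
    """Return (X, y) for the predictor's default internal data (data=None).

    Hypothetical helper; `bagged` must be supplied by the caller.
    """
    # Documented requirement: `model` and `base_models` cannot both be set.
    if model is not None and base_models is not None:
        raise ValueError("Specify at most one of `model` and `base_models`.")

    # Bagged predictors default to the full training set; non-bagged
    # predictors default to the validation set.
    split = "train" if bagged else "val"

    X = predictor.transform_features(data=None, model=model, base_models=base_models)
    y = predictor.load_data_internal(data=split, return_X=False, return_y=True)[1]
    return X, y
```

Keeping data=None here follows the warnings above: when model or base_models is set, the internal data is the documented way to avoid overfit feature transformation.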

Returns

  • pd.DataFrame of the provided data after feature transformation has been applied.

  • This output does not include the label column, and will remove it if present in the supplied data.

  • If a transformed label column is desired, use predictor.transform_labels.
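Because the returned frame never contains the label, features and labels have to be transformed separately. The following is a minimal sketch, assuming `data` is a pd.DataFrame that includes the label column and that `predictor.transform_labels` accepts that column directly. `transform_xy` is a hypothetical helper name, not an AutoGluon function.

```python
def transform_xy(predictor, data, label):
    """Return (X, y) in AutoGluon's internal representation.

    Hypothetical helper: `data` is assumed to be a DataFrame-like object
    that supports `data[label]` column access.
    """
    # Generic preprocessing only (model=None, base_models=None). The label
    # column is removed from the output if it is present in `data`.
    X = predictor.transform_features(data)
    # Labels are converted separately from the features.
    y = predictor.transform_labels(data[label])
    return X, y
```

Since no model or base_models is specified, this sketch applies only the generic feature generator, so the overfitting warning for training data does not come into play.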

Examples

>>> from autogluon.tabular import TabularPredictor
>>> predictor = TabularPredictor(label='class').fit('train.csv', auto_stack=True)  # predictor is in bagged mode.
>>> model = 'WeightedEnsemble_L2'
>>> train_data_transformed = predictor.transform_features(model=model)  # Internal training DataFrame used as input to `model.fit()` for each model trained in `predictor.fit()`
>>> test_data_transformed = predictor.transform_features('test.csv', model=model)  # Internal test DataFrame used as input to `model.predict_proba()` during `predictor.predict_proba(test_data, model=model)`