autogluon.tabular.models¶

Note

This documentation is for advanced users, and is not comprehensive.

For a stable public API, refer to TabularPredictor.

Model Keys¶

To fit a model with TabularPredictor, you must specify it in the TabularPredictor.fit hyperparameters argument.

hyperparameters takes in a dictionary of models, where each key is a model name, and the values are a list of dictionaries of model hyperparameters.

For example:

hyperparameters = {
    'NN_TORCH': {},
    'GBM': [
        {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}},
        {},
        {
            "learning_rate": 0.03,
            "num_leaves": 128,
            "feature_fraction": 0.9,
            "min_data_in_leaf": 3,
            "ag_args": {"name_suffix": "Large", "priority": 0, "hyperparameter_tune_kwargs": None},
        },
    ],
    'CAT': {},
    'XGB': {},
    'FASTAI': {},
    'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
    'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
    'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}

Here is the mapping of keys to models:

MODEL_TYPES = {
    "RF": RFModel,
    "XT": XTModel,
    "KNN": KNNModel,
    "GBM": LGBModel,
    "CAT": CatBoostModel,
    "XGB": XGBoostModel,
    "REALMLP": RealMLPModel,
    "MITRA": MitraModel,
    "TABICL": TabICLModel,
    "TABPFNV2": TabPFNV2Model,
    "NN_TORCH": TabularNeuralNetTorchModel,
    "LR": LinearModel,
    "FASTAI": NNFastAiTabularModel,
    "TABM": TabMModel,
    "AG_TEXT_NN": TextPredictorModel,
    "AG_IMAGE_NN": ImagePredictorModel,
    "AG_AUTOMM": MultiModalPredictorModel,
    "FT_TRANSFORMER": FTTransformerModel,
    "FASTTEXT": FastTextModel,
    "ENS_WEIGHTED": GreedyWeightedEnsembleModel,
    "SIMPLE_ENS_WEIGHTED": SimpleWeightedEnsembleModel,

    # interpretable models
    "IM_RULEFIT": RuleFitModel,
    "IM_GREEDYTREE": GreedyTreeModel,
    "IM_FIGS": FigsModel,
    "IM_HSTREE": HSTreeModel,
    "IM_BOOSTEDRULES": BoostedRulesModel,

    "DUMMY": DummyModel,
}

Here is the mapping of model types to their default names when trained:

DEFAULT_MODEL_NAMES = {
    RFModel: 'RandomForest',
    XTModel: 'ExtraTrees',
    KNNModel: 'KNeighbors',
    LGBModel: 'LightGBM',
    CatBoostModel: 'CatBoost',
    XGBoostModel: 'XGBoost',
    RealMLPModel: 'RealMLP',
    TabMModel: 'TabM',
    MitraModel: 'Mitra',
    TabICLModel: 'TabICL',
    TabPFNV2Model: 'TabPFNv2',
    TabularNeuralNetTorchModel: 'NeuralNetTorch',
    LinearModel: 'LinearModel',
    NNFastAiTabularModel: 'NeuralNetFastAI',
    TextPredictorModel: 'TextPredictor',
    ImagePredictorModel: 'ImagePredictor',
    MultiModalPredictorModel: 'MultiModalPredictor',

    FTTransformerModel: 'FTTransformer',
    FastTextModel: 'FastText',
    GreedyWeightedEnsembleModel: 'WeightedEnsemble',
    SimpleWeightedEnsembleModel: 'WeightedEnsemble',

    # Interpretable models
    RuleFitModel: 'RuleFit',
    GreedyTreeModel: 'GreedyTree',
    FigsModel: 'Figs',
    HSTreeModel: 'HierarchicalShrinkageTree',
    BoostedRulesModel: 'BoostedRules',
}

Model Name Suffixes¶

Models trained by TabularPredictor can have suffixes in their names that have special meanings.

The suffixes are as follows:

“_Lx”: Indicates the stack level (x) the model is trained in, such as “_L1”, “_L2”, etc. A model with “_L1” suffix is a base model, meaning it does not depend on any other models. If a model lacks this suffix, then it is a base model and is at level 1 (“_L1”).

“/Tx”: Indicates that the model was trained via hyperparameter search (HPO). Tx is shorthand for HPO trial #x. An example would be “LightGBM/T8”.

“_BAG”: Indicates that the model is a bagged ensemble. A bagged ensemble contains multiple instances of the model (children) trained with different subsets of the data. During inference, these child models each predict on the data and their predictions are averaged in the final result. This typically achieves a stronger result than any of the individual models alone, but slows down inference speed significantly. Refer to “_FULL” for instructions on how to improve inference speed.

“_FULL”: Indicates the model has been refit via TabularPredictor’s refit_full method. This model will have no validation score because all of the data (train and validation) was used as training data. Usually, there will be another model with the same name as this model minus the “_FULL” suffix. Often, this model can outperform the original model because of using more data during training, but is usually weaker if the original was a bagged ensemble (“_BAG”), but with much faster inference speed.

“_DSTL”: Indicates the model was created through model distillation via a call to TabularPredictor’s distill method. Validation scores of distilled models should only be compared against other distilled models.

“_x”: Indicates that the name without this added suffix already existed in a different model, so this suffix was added to avoid overwriting the pre-existing model. An example would be “LightGBM_2”.

Models¶

`AbstractModel`	Abstract model implementation from which all AutoGluon models inherit.
`LGBModel`	LightGBM model: https://lightgbm.readthedocs.io/en/latest/
`CatBoostModel`	CatBoost model: https://catboost.ai/
`XGBoostModel`	XGBoost model: https://xgboost.readthedocs.io/en/latest/
`RealMLPModel`	RealMLP is an improved multilayer perception (MLP) model through a bag of tricks and better default hyperparameters.
`TabMModel`	TabM is an efficient ensemble of MLPs that is trained simultaneously with mostly shared parameters.
`MitraModel`	Mitra is a tabular foundation model pre-trained purely on synthetic data with the goal of optimizing fine-tuning performance over in-context learning performance.
`TabICLModel`	TabICL is a foundation model for tabular data using in-context learning that is scalable to larger datasets than TabPFNv2.
`TabPFNV2Model`	TabPFNv2 is a tabular foundation model pre-trained purely on synthetic data that achieves state-of-the-art results with in-context learning on small datasets with <=10000 samples and <=500 features.
`RFModel`	Random Forest model (scikit-learn): https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
`XTModel`	Extra Trees model (scikit-learn): https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html#sklearn.ensemble.ExtraTreesClassifier
`KNNModel`	KNearestNeighbors model (scikit-learn): https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
`LinearModel`	Linear model (scikit-learn): https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
`TabularNeuralNetTorchModel`	PyTorch neural network models for classification/regression with tabular data.
`NNFastAiTabularModel`	Class for fastai v1 neural network models that operate on tabular data.
`MultiModalPredictorModel`
`TextPredictorModel`	MultimodalPredictor that doesn't use image features
`ImagePredictorModel`	MultimodalPredictor that only uses image features. Currently only supports 1 image column, with 1 image per sample. Additionally has special null image handling to improve performance in the presence of null images (aka image path of '') Note: null handling has not been compared to the built-in null handling of MultimodalPredictor yet.

AbstractModel¶

Abstract model implementation from which all AutoGluon models inherit.

Parameters:

path (str, default = None) – Directory location to store all outputs. If None, a new unique time-stamped directory is chosen.
name (str, default = None) – Name of the subdirectory inside path where model will be saved. The final model directory will be os.path.join(path, name) If None, defaults to the model’s class name: self.__class__.__name__
problem_type (str, default = None) – Type of prediction problem, i.e. is this a binary/multiclass classification or regression problem (options: ‘binary’, ‘multiclass’, ‘regression’). If None, will attempt to infer the problem type based on training data labels during training.
eval_metric (autogluon.core.metrics.Scorer or str, default = None) –
Metric by which predictions will be ultimately evaluated on test data. This only impacts model.score(), as eval_metric is not used during training.

If eval_metric = None, it is automatically chosen based on problem_type. Defaults to ‘accuracy’ for binary and multiclass classification and ‘root_mean_squared_error’ for regression. Otherwise, options for classification:

[‘accuracy’, ‘balanced_accuracy’, ‘f1’, ‘f1_macro’, ‘f1_micro’, ‘f1_weighted’, ‘roc_auc’, ‘roc_auc_ovo’, ‘roc_auc_ovr’, ‘average_precision’, ‘precision’, ‘precision_macro’, ‘precision_micro’, ‘precision_weighted’, ‘recall’, ‘recall_macro’, ‘recall_micro’, ‘recall_weighted’, ‘log_loss’, ‘pac_score’, ‘quadratic_kappa’]

Options for regression:
[‘root_mean_squared_error’, ‘mean_squared_error’, ‘mean_absolute_error’, ‘median_absolute_error’, ‘r2’]

Options for quantile regression:
[‘pinball_loss’]

For more information on these options, see sklearn.metrics: https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics

You can also pass your own evaluation function here as long as it follows formatting of the functions defined in folder autogluon.core.metrics.
hyperparameters – Hyperparameters that will be used by the model (can be search spaces instead of fixed values). If None, model defaults are used. This is identical to passing an empty dictionary.

can_compile(compiler_configs: dict = None) → bool[source]¶

Verify whether the model can be compiled with the compiler configuration.

Parameters:: compiler_configs (dict, default=None) – Model specific compiler options. This can be useful to specify the compiler backend for a specific model, e.g. {“RandomForest”: {“compiler”: “onnx”}}

can_estimate_memory_usage_static() → bool[source]¶: True if estimate_memory_usage_static is implemented for this model. If False, calling estimate_memory_usage_static will raise a NotImplementedError.

can_estimate_memory_usage_static_child() → bool[source]¶: True if estimate_memory_usage_static is implemented for this model’s child. If False, calling estimate_memory_usage_static_child will raise a NotImplementedError.

can_fit() → bool[source]¶: Returns True if the model can be fit.

can_infer() → bool[source]¶: Returns True if the model is capable of inference on new data.

can_predict_proba() → bool[source]¶: Returns True if the model can predict probabilities.

compile(compiler_configs: dict = None)[source]¶

Compile the trained model for faster inference.

NOTE: - The model is assumed to be fitted before compilation. - If save_in_pkl attribute of the compiler is False, self.model would be set to None.

Parameters:: compiler_configs (dict, default=None) – Model specific compiler options. This can be useful to specify the compiler backend for a specific model, e.g. {“RandomForest”: {“compiler”: “onnx”}}

compute_feature_importance(X: DataFrame, y: Series, features: list[str] = None, silent: bool = False, importance_as_list: bool = False, **kwargs) → DataFrame[source]¶

Compute feature importance via permutation shuffling.

Parameters:

X
y
features
silent
importance_as_list
kwargs

Return type:

pd.DataFrame of feature importance

convert_to_refit_full_template()[source]¶

After calling this function, returned model should be able to be fit without X_val, y_val using the iterations trained by the original model.

Increase max_memory_usage_ratio by 25% to reduce the chance that the refit model will trigger NotEnoughMemoryError and skip training. This can happen without the 25% increase since the refit model generally will use more training data and thus require more memory.

convert_to_refit_full_via_copy()[source]¶: Creates a new refit_full variant of the model, but instead of training it simply copies self. This method is for compatibility with models that have not implemented refit_full support as a fallback.

convert_to_template()[source]¶: After calling this function, returned model should be able to be fit as if it was new, as well as deep-copied. The model name and path will be identical to the original, and must be renamed prior to training to avoid overwriting the original model files if they exist.

delete_from_disk(silent: bool = False)[source]¶

Deletes the model from disk.

WARNING: This will DELETE ALL FILES in the self.path directory, regardless if they were created by AutoGluon or not. DO NOT STORE FILES INSIDE OF THE MODEL DIRECTORY THAT ARE UNRELATED TO AUTOGLUON.

estimate_memory_usage(X: DataFrame, **kwargs) → int[source]¶

Estimates the peak memory usage of the model while training.

Parameters:: X (pd.DataFrame) – The training data features
Returns:: int
Return type:: estimated peak memory usage in bytes during training

estimate_memory_usage_child(X: DataFrame, **kwargs) → int[source]¶

Estimates the peak memory usage of the child model while training.

If the model is not a bagged model (aka has no children), then will return its personal memory usage estimate.

Parameters:

X (pd.DataFrame) – The training data features
**kwargs

Returns:

int

Return type:

estimated peak memory usage in bytes during training of the child

classmethod estimate_memory_usage_static(*, X: DataFrame, y: Series = None, hyperparameters: dict = None, problem_type: str = 'infer', num_classes: int | None | str = 'infer', **kwargs) → int[source]¶

Estimates the peak memory usage of the model while training, without having to initialize the model.

Parameters:

X (pd.DataFrame) – The training data features
y (pd.Series, optional) – The training data ground truth. Must be specified if problem_type or num_classes is unspecified.
hyperparameters (dict, optional) – The model hyperparameters
problem_type (str, default = "infer") – The problem_type. If “infer” will infer based on y.
num_classes – The num_classes. If “infer” will infer based on y.
**kwargs – Other optional key-word fit arguments that could impact memory usage for the model.

Returns:

int

Return type:

estimated peak memory usage in bytes during training

estimate_memory_usage_static_child(*, X: DataFrame, y: Series = None, hyperparameters: dict = None, problem_type: str = 'infer', num_classes: int | None | str = 'infer', **kwargs) → int[source]¶

Estimates the peak memory usage of the child model while training, without having to initialize the model.

Note that this method itself is not static, because the child model must be present as a variable in the model to call its static memory estimate method.

To obtain the child memory estimate in a fully static manner, instead directly call the child’s estimate_memory_usage_static method.

Parameters:

X (pd.DataFrame) – The training data features
y (pd.Series, optional) – The training data ground truth. Must be specified if problem_type or num_classes is unspecified.
hyperparameters (dict, optional) – The model hyperparameters
problem_type (str, default = "infer") – The problem_type. If “infer” will infer based on y.
num_classes – The num_classes. If “infer” will infer based on y.
**kwargs – Other optional key-word fit arguments that could impact memory usage for the model.

Returns:

int

Return type:

estimated peak memory usage in bytes during training of the child

fit(*, log_resources: bool = False, **kwargs)[source]¶

Fit model to predict values in y based on X.

Models should not override the fit method, but instead override the _fit method which has the same arguments.

Parameters:

X (DataFrame) – The training data features.
y (Series) – The training data ground truth labels.
X_val (DataFrame, default = None) – The validation data features. If None, early stopping via validation score will be disabled.
y_val (Series, default = None) – The validation data ground truth labels. If None, early stopping via validation score will be disabled.
X_test (DataFrame, default = None) – The test data features. Note: Not used for training, but for tracking test performance. If None, early stopping via validation score will be disabled.
y_test (Series, default = None) – The test data ground truth labels. Note: Not used for training, but for tracking test performance. If None, early stopping via validation score will be disabled.
X_unlabeled (DataFrame, default = None) – Unlabeled data features. Models may optionally implement logic which leverages unlabeled data to improve model accuracy.
time_limit (float, default = None) – Time limit in seconds to adhere to when fitting model. Ideally, model should early stop during fit to avoid going over the time limit if specified.
sample_weight (Series, default = None) – The training data sample weights. Models may optionally leverage sample weights during fit. If None, model decides. Typically, models assume uniform sample weight.
sample_weight_val (Series, default = None) – The validation data sample weights. If None, model decides. Typically, models assume uniform sample weight.
num_cpus (int, default = 'auto') – How many CPUs to use during fit. This is counted in virtual cores, not in physical cores. If ‘auto’, model decides.
num_gpus (int, default = 'auto') – How many GPUs to use during fit. If ‘auto’, model decides.
feature_metadata (autogluon.common.features.feature_metadata.FeatureMetadata, default = None) – Contains feature type information that can be used to identify special features such as text ngrams and datetime as well as which features are numerical vs categorical. If None, feature_metadata is inferred during fit.
verbosity (int, default = 2) – Verbosity levels range from 0 to 4 and control how much information is printed. Higher levels correspond to more detailed print statements (you can set verbosity = 0 to suppress warnings). verbosity 4: logs every training iteration, and logs the most detailed information. verbosity 3: logs training iterations periodically, and logs more detailed information. verbosity 2: logs only important information. verbosity 1: logs only warnings and exceptions. verbosity 0: logs only exceptions.
log_resources (bool, default = False) – If True, will log information about the number of CPUs, GPUs, and memory usage during fit.
**kwargs – Any additional fit arguments a model supports.

property fit_num_cpus: int¶: Number of CPUs used when this model was fit

property fit_num_cpus_child: int¶: Number of CPUs used for fitting one model (i.e. a child model)

property fit_num_gpus: float¶: Number of GPUs used when this model was fit

property fit_num_gpus_child: float¶: Number of GPUs used for fitting one model (i.e. a child model)

classmethod get_ag_priority(problem_type: str | None = None) → int[source]¶: Returns the AutoGluon fit priority, defined by cls.ag_priority and cls.ag_priority_by_problem_type.

get_fit_metadata() → dict[source]¶: Returns dictionary of metadata related to model fit that isn’t related to hyperparameters. Must be called after model has been fit.

get_hyperparameters_init() → dict[source]¶

Returns:: hyperparameters – The dictionary of user specified hyperparameters for the model.
Return type:: dict

get_info(include_feature_metadata: bool = True) → dict[source]¶: Returns a dictionary of numerous fields describing the model.

get_memory_size(allow_exception: bool = False) → int | None[source]¶

Pickled the model object (self) and returns the size in bytes. Will raise an exception if self cannot be pickled.

Note: This will temporarily double the memory usage of the model, as both the original and the pickled version will exist in memory. This can lead to an out-of-memory error if the model is larger than the remaining available memory.

Parameters:: allow_exception (bool, default = False) – If True and an exception occurs during the memory size calculation, will return None instead of raising the exception. For example, if a model failed during fit and had a messy internal state, and then get_memory_size was called, it may still contain a non-serializable object. By setting allow_exception=True, we avoid crashing in this scenario. For example: “AttributeError: Can’t pickle local object ‘func_generator.<locals>.custom_metric’”
Returns:: memory_size – The memory size in bytes of the pickled model object. None if an exception occurred and allow_exception=True.
Return type:: int | None

get_minimum_resources(is_gpu_available: bool = False) → dict[str, int | float][source]¶

Parameters:

is_gpu_available (bool, default = False) – Whether gpu is available in the system. Model that can be trained both on cpu and gpu can decide the minimum resources based on this.
model. (Returns a dictionary of minimum resource requirements to fit the)
train. (Subclass should consider overriding this method if it requires more resources to)
dictionary (If a resource is not part of the output)
unnecessary. (it is considered)
keys (Valid)

get_params() → dict[source]¶: Get params of the model at the time of initialization

get_params_aux_info() → dict[source]¶: Converts learning curve scorer objects into their name strings.

Returns:¶

params_aux dictionary with changed curve_metrics field, if applicable.

get_trained_params() → dict[source]¶: Returns the hyperparameters of the trained model. If the model early stopped, this will contain the epoch/iteration the model uses during inference, instead of the epoch/iteration specified during fit. This is used for generating a model template to refit on all of the data (no validation set).

hyperparameter_tune(hyperparameter_tune_kwargs='auto', hpo_executor: HpoExecutor = None, time_limit: float = None, **kwargs)[source]¶

Perform hyperparameter tuning of the model, fitting multiple variants of the model based on the search space provided in hyperparameters during init.

Parameters:

hyperparameter_tune_kwargs (str or dict, default='auto') –
Hyperparameter tuning strategy and kwargs (for example, how many HPO trials to run). Valid keys:

’num_trials’: Number of hpo trials you want to perform. ‘scheduler’: Scheduler used by hpo experiment.

Valid values:
’local’: Local FIFO scheduler. Sequential if Custom backend and parallel if Ray Tune backend.

’searcher’: Search algorithm used by hpo experiment.

Valid values:
’auto’: Random search. ‘random’: Random search. ‘bayes’: Bayes Optimization. Only supported by Ray Tune backend.

Valid preset values:
’auto’: Uses the ‘random’ preset. ‘random’: Performs HPO via random search using local scheduler.

The ‘searcher’ key is required when providing a dict.
hpo_executor (HpoExecutor, default None) – Executor to perform HPO experiment. This implements the interface for different HPO backends. For more info, please refer to HpoExecutor under core/hpo/executors.py
time_limit (float, default None) – In general, this is the time limit in seconds to run HPO for. In reality, this is the time limit in seconds budget to fully train all trials executed by HPO. For example, BaggedEnsemble will only use a fraction of the time limit during HPO because it needs the remaining time later to fit all of the folds of the trials.
**kwargs –

Same kwargs you would pass to fit call, such as:
X y X_val y_val feature_metadata sample_weight sample_weight_val

Returns:

Tuple of (hpo_results (dict[str, dict], hpo_info: Any))
hpo_results (dict[str, dict]) –

A dictionary of trial model names to a dictionary containing:

path: str
Absolute path to the trained model artifact. Used to load the model.

val_score: float
val_score of the model

trial: int
Trial number of the model, starting at 0.

hyperparameters: dict
Hyperparameter config of the model trial.
hpo_info (Any) – Advanced output with scheduler specific logic, primarily for debugging. In case of Ray Tune backend, this will be an Analysis object: https://docs.ray.io/en/latest/tune/api/doc/ray.tune.ExperimentAnalysis.html

is_fit() → bool[source]¶: Returns True if the model has been fit.

is_initialized() → bool[source]¶: Returns True if the model is initialized. This indicates whether the model has inferred various information such as problem_type and num_classes. A model is automatically initialized when .fit or .hyperparameter_tune are called.

is_valid() → bool[source]¶: Returns True if the model is capable of inference on new data (if normal model) or has produced out-of-fold predictions (if bagged model) This indicates whether the model can be used as a base model to fit a stack ensemble model.

classmethod load(path: str, reset_paths: bool = True, verbose: bool = True)[source]¶

Loads the model from disk to memory.

Parameters:

path (str) – Path to the saved model, minus the file name. This should generally be a directory path ending with a ‘/’ character (or appropriate path separator value depending on OS). The model file is typically located in os.path.join(path, cls.model_file_name).
reset_paths (bool, default True) – Whether to reset the self.path value of the loaded model to be equal to path. It is highly recommended to keep this value as True unless accessing the original self.path value is important. If False, the actual valid path and self.path may differ, leading to strange behaviour and potential exceptions if the model needs to load any other files at a later time.
verbose (bool, default True) – Whether to log the location of the loaded file.

Returns:

model – Loaded model object.

Return type:

cls

classmethod load_learning_curves(path: str) → list[source]¶

Loads the learning_curve data from disk to memory.

Parameters:: path (str) – Path to the saved model, minus the file name. This should generally be a directory path ending with a ‘/’ character (or appropriate path separator value depending on OS). The model file is typically located in os.path.join(path, cls.model_file_name).
Returns:: learning_curves – Loaded learning curve data.
Return type:: list

predict(X, **kwargs) → ndarray[source]¶: Returns class predictions of X. For binary and multiclass problems, this returns the predicted class labels as a 1d numpy array. For regression problems, this returns the predicted values as a 1d numpy array.

predict_from_proba(y_pred_proba: ndarray) → ndarray[source]¶

Convert prediction probabilities to predictions.

Parameters:: y_pred_proba (np.ndarray) – The prediction probabilities to be converted to predictions.
Returns:: y_pred – The predictions obtained from y_pred_proba.
Return type:: np.ndarray

Examples

>>> y_pred = predictor.predict(X)
>>> y_pred_proba = predictor.predict_proba(X)
>>>
>>> # Identical to y_pred
>>> y_pred_from_proba = predictor.predict_from_proba(y_pred_proba)

property predict_n_size: int | None¶: The number of rows in the data used when calculating self.predict_time.

property predict_n_time_per_row: float | None¶: The time in seconds required to predict 1 row of data given a batch size of self.predict_n_size. Returns None if either self.predict_time or self.predict_n_size are None.

predict_proba(X, *, normalize: bool | None = None, record_time: bool = False, **kwargs) → ndarray[source]¶

Returns class prediction probabilities of X. For binary problems, this returns the positive class label probability as a 1d numpy array. For multiclass problems, this returns the class label probabilities of each class as a 2d numpy array. For regression problems, this returns the predicted values as a 1d numpy array.

Parameters:

X – The data used for prediction.
normalize (bool | None, default = None) – Whether to normalize the predictions prior to returning. If None, will default to self.normalize_pred_probas.
record_time (bool, default = False) – If True, will record the time taken for prediction in self.predict_time and the number of rows of X in self.predict_n_size.
kwargs – Keyword arguments to pass into self._predict_proba.

Returns:

y_pred_proba – The prediction probabilities

Return type:

np.ndarray

preprocess(X, preprocess_nonadaptive: bool = True, preprocess_stateful: bool = True, **kwargs)[source]¶: Preprocesses the input data into internal form ready for fitting or inference. It is not recommended to override this method, as it is closely tied to multi-layer stacking logic. Instead, override _preprocess.

record_predict_info(X: DataFrame)[source]¶

Records the necessary information to compute self.predict_n_time_per_row.

Parameters:: X (pd.DataFrame) – The data used to predict on when calculating self.predict_time.

reduce_memory_size(remove_fit: bool = True, remove_info: bool = False, requires_save: bool = True, **kwargs)[source]¶

Removes non-essential objects from the model to reduce memory and disk footprint. If remove_fit=True, enables the removal of variables which are required for fitting the model. If the model is already fully trained, then it is safe to remove these. If remove_info=True, enables the removal of variables which are used during model.get_info(). The values will be None when calling model.get_info(). If requires_save=True, enables the removal of variables which are part of the model.pkl object, requiring an overwrite of the model to disk if it was previously persisted.

It is not necessary for models to implement this.

rename(name: str)[source]¶: Renames the model and updates self.path to reflect the updated name.

save(path: str | None = None, verbose: bool = True) → str[source]¶

Saves the model to disk.

Parameters:

path (str, default None) – Path to the saved model, minus the file name. This should generally be a directory path ending with a ‘/’ character (or appropriate path separator value depending on OS). If None, self.path is used. The final model file is typically saved to os.path.join(path, self.model_file_name).
verbose (bool, default True) – Whether to log the location of the saved file.

Returns:

path – Path to the saved model, minus the file name. Use this value to load the model from disk via cls.load(path), cls being the class of the model object, such as model = RFModel.load(path)

Return type:

str

save_learning_curves(metrics: str | list[str], curves: dict[dict[str, list[float]]], path: str = None) → str[source]¶

Saves learning curves to disk.

Outputted Curve Format:

out = [

metrics, [

[ # log_loss
[0.693147, 0.690162, …], # train [0.693147, 0.690162, …], # val [0.693147, 0.690162, …], # test

], [ # accuracy

[0.693147, 0.690162, …], # train [0.693147, 0.690162, …], # val [0.693147, 0.690162, …], # test

], [ # f1

[0.693147, 0.690162, …], # train [0.693147, 0.690162, …], # val [0.693147, 0.690162, …], # test

],

]

Parameters:

metrics (str or list(str)) – List of all evaluation metrics computed at each iteration of the curve
curves (dict[dict[str : list[float]]]) –
Dictionary of evaluation sets and their learning curve dictionaries. Each learning curve dictionary contains evaluation metrics computed at each iteration. e.g.

curves = {

“train”: {
‘logloss’: [0.693147, 0.690162, …], ‘accuracy’: [0.500000, 0.400000, …], ‘f1’: [0.693147, 0.690162, …]

}, “val”: {…}, “test”: {…},

}
path (str, default None) – Path where the learning curves are saved, minus the file name. This should generally be a directory path ending with a ‘/’ character (or appropriate path separator value depending on OS). If None, self.path is used. The final curve file is typically saved to os.path.join(path, curves.json).

Returns:

path – Path to the saved curves, minus the file name.

Return type:

str

classmethod supported_problem_types() → list[str] | None[source]¶: Returns the list of supported problem types. If None is returned, then the model has not specified the supported problem types, and it is unknown which problem types are valid.

In this case, all problem types are considered supported and the model will never be filtered out based on problem type.

validate_fit_args(X: DataFrame, **kwargs)[source]¶

Verifies if the fit arguments satisfy the model’s constraints. Raises an exception if constraints are not satisfied.

Checks for:: ag.problem_types ag.max_rows ag.max_features ag.max_classes ag.ignore_constraints

validate_fit_resources(num_cpus='auto', num_gpus='auto', total_resources=None, **kwargs)[source]¶: Verifies that the provided num_cpus and num_gpus (or defaults if not provided) are sufficient to train the model. Raises an AssertionError if not sufficient.

LGBModel¶

class autogluon.tabular.models.LGBModel(**kwargs)[source]¶

LightGBM model: https://lightgbm.readthedocs.io/en/latest/

Hyperparameter options: https://lightgbm.readthedocs.io/en/latest/Parameters.html

Extra hyperparameter options:: ag.early_stop : int, specifies the early stopping rounds. Defaults to an adaptive strategy. Recommended to keep default.

CatBoostModel¶

class autogluon.tabular.models.CatBoostModel(**kwargs)[source]¶

CatBoost model: https://catboost.ai/

Hyperparameter options: https://catboost.ai/en/docs/references/training-parameters

XGBoostModel¶

class autogluon.tabular.models.XGBoostModel(**kwargs)[source]¶

XGBoost model: https://xgboost.readthedocs.io/en/latest/

Hyperparameter options: https://xgboost.readthedocs.io/en/latest/parameter.html

RealMLPModel¶

class autogluon.tabular.models.RealMLPModel(**kwargs)[source]¶

RealMLP is an improved multilayer perception (MLP) model through a bag of tricks and better default hyperparameters.

RealMLP is the top performing method overall on TabArena-v0.1: https://tabarena.ai

Paper: Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data Authors: David Holzmüller, Léo Grinsztajn, Ingo Steinwart Codebase: https://github.com/dholzmueller/pytabkit License: Apache-2.0

Added in version 1.4.0.

TabMModel¶

class autogluon.tabular.models.TabMModel(**kwargs)[source]¶

TabM is an efficient ensemble of MLPs that is trained simultaneously with mostly shared parameters.

TabM is one of the top performing methods overall on TabArena-v0.1: https://tabarena.ai

Paper: TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling Authors: Yury Gorishniy, Akim Kotelnikov, Artem Babenko Codebase: https://github.com/yandex-research/tabm License: Apache-2.0

Partially adapted from pytabkit’s TabM implementation.

Added in version 1.4.0.

MitraModel¶

class autogluon.tabular.models.MitraModel(**kwargs)[source]¶

Mitra is a tabular foundation model pre-trained purely on synthetic data with the goal of optimizing fine-tuning performance over in-context learning performance. Mitra was developed by the AutoGluon team @ AWS AI.

Mitra’s default hyperparameters outperforms all methods for small datasets on TabArena-v0.1 (excluding ensembling): https://tabarena.ai

Authors: Xiyuan Zhang, Danielle C. Maddix, Junming Yin, Nick Erickson, Abdul Fatir Ansari, Boran Han, Shuai Zhang, Leman Akoglu, Christos Faloutsos, Michael W. Mahoney, Cuixiong Hu, Huzefa Rangwala, George Karypis, Bernie Wang Blog Post: https://www.amazon.science/blog/mitra-mixed-synthetic-priors-for-enhancing-tabular-foundation-models License: Apache-2.0

Added in version 1.4.0.

TabICLModel¶

TabICL is a foundation model for tabular data using in-context learning that is scalable to larger datasets than TabPFNv2. It is pretrained purely on synthetic data. TabICL currently only supports classification tasks.

TabICL is one of the top performing methods overall on TabArena-v0.1: https://tabarena.ai

Paper: TabICL: A Tabular Foundation Model for In-Context Learning on Large Data Authors: Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan Codebase: https://github.com/soda-inria/tabicl License: BSD-3-Clause

Added in version 1.4.0.

TabPFNV2Model¶

class autogluon.tabular.models.TabPFNV2Model(**kwargs)[source]¶

TabPFNv2 is a tabular foundation model pre-trained purely on synthetic data that achieves state-of-the-art results with in-context learning on small datasets with <=10000 samples and <=500 features. TabPFNv2 is developed and maintained by PriorLabs: https://priorlabs.ai/

TabPFNv2 is the top performing method for small datasets on TabArena-v0.1: https://tabarena.ai

Paper: Accurate predictions on small data with a tabular foundation model Authors: Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister & Frank Hutter Codebase: https://github.com/PriorLabs/TabPFN License: https://github.com/PriorLabs/TabPFN/blob/main/LICENSE

Added in version 1.4.0.

LinearModel¶

class autogluon.tabular.models.LinearModel(**kwargs)[source]¶

Linear model (scikit-learn): https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Model backend differs depending on problem_type:

‘binary’ & ‘multiclass’: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

‘regression’: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge

TabularNeuralNetTorchModel¶

class autogluon.tabular.models.TabularNeuralNetTorchModel(**kwargs)[source]¶

PyTorch neural network models for classification/regression with tabular data.

Extra hyperparameter options:

ag.early_stopint | str, default = “default”: Specifies the early stopping rounds. Defaults to an adaptive strategy. Recommended to keep default.

NNFastAiTabularModel¶

class autogluon.tabular.models.NNFastAiTabularModel(**kwargs)[source]¶

Class for fastai v1 neural network models that operate on tabular data.

Hyperparameters:

‘y_scaler’: on regression problems, the model can give unreasonable predictions on unseen data. To address this problem, AutoGluon scales y values by default for regression problems. This attribute allows to pass a custom scaler for y values. Please note that intermediate iteration metrics will be affected by this transform and as a result intermediate iteration scores will be different from the final ones (these will be correct). https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing

‘clipping’: on regression problems, extreme outliers of y can hurt performance of the model during training and on unseen data. To address this problem, AutoGluon clips input y values and output predictions by default to a range inferred from the training data. Setting this attribute to False disables clipping.

‘layers’: list of hidden layers sizes; None - use model’s heuristics; default is None

‘emb_drop’: embedding layers dropout; default is 0.1

‘ps’: linear layers dropout - list of values applied to every layer in layers; default is [0.1]

‘bs’: batch size; default is 256

‘lr’: maximum learning rate for one cycle policy; default is 1e-2; see also https://docs.fast.ai/callback.schedule.html#Learner.fit_one_cycle, One-cycle policy paper: https://arxiv.org/abs/1803.09820

‘epochs’: number of epochs; default is 30

# Early stopping settings. See more details here: https://docs.fast.ai/callback.tracker.html#EarlyStoppingCallback ‘early.stopping.min_delta’: 0.0001, ‘early.stopping.patience’: 10,

MultiModalPredictorModel¶

class autogluon.tabular.models.MultiModalPredictorModel(**kwargs)[source]¶

TextPredictorModel¶

class autogluon.tabular.models.TextPredictorModel(**kwargs)[source]¶: MultimodalPredictor that doesn’t use image features

ImagePredictorModel¶

class autogluon.tabular.models.ImagePredictorModel(**kwargs)[source]¶: MultimodalPredictor that only uses image features. Currently only supports 1 image column, with 1 image per sample. Additionally has special null image handling to improve performance in the presence of null images (aka image path of ‘’)

Note: null handling has not been compared to the built-in null handling of MultimodalPredictor yet.

Ensemble Models¶

`BaggedEnsembleModel`	Bagged ensemble meta-model which fits a given model multiple times across different splits of the training data.
`StackerEnsembleModel`	Stack ensemble meta-model which functions identically to `BaggedEnsembleModel` with the additional capability to leverage base models.
`WeightedEnsembleModel`	Weighted ensemble meta-model that implements Ensemble Selection: https://www.cs.cornell.edu/~alexn/papers/shotgun.icml04.revised.rev2.pdf

BaggedEnsembleModel¶

class autogluon.core.models.BaggedEnsembleModel(model_base: AbstractModel | Type[AbstractModel], model_base_kwargs: dict[str, any] = None, random_state: int = 0, **kwargs)[source]¶

Bagged ensemble meta-model which fits a given model multiple times across different splits of the training data.

For certain child models such as KNN, this may only train a single model and instead rely on the child model to generate out-of-fold predictions.

Parameters:

model_base (AbstractModel | Type[AbstractModel]) – The base model to repeatedly fit during bagging. If a AbstractModel class, then also provide model_base_kwargs which will be used to initialize the model via model_base(**model_base_kwargs).
model_base_kwargs (dict[str, any], default = None) – kwargs used to initialize model_base if model_base is a class.
random_state (int, default = 0) – Random state used to split the data into cross-validation folds during fit.
**kwargs – Refer to AbstractModel documentation

StackerEnsembleModel¶

class autogluon.core.models.StackerEnsembleModel(base_model_names: List[str] | None = None, base_models_dict: Dict[str, AbstractModel] | None = None, base_model_paths_dict: Dict[str, str] = None, base_model_types_dict: dict | None = None, base_model_types_inner_dict: dict | None = None, base_model_performances_dict: Dict[str, float] | None = None, **kwargs)[source]¶

Stack ensemble meta-model which functions identically to BaggedEnsembleModel with the additional capability to leverage base models.

By specifying base models during init, stacker models can use the base model predictions as features during training and inference.

This property allows for significantly improved model quality in many situations compared to non-stacking alternatives.

Stacker models can act as base models to other stacker models, enabling multi-layer stack ensembling.

Stacker kwargs can be specified in the “ag_args_ensemble” dictionary. For example: ` predictor = TabularPredictor(...).fit(..., hyperparameters={"GBM": [{"ag_args_ensemble": {"max_base_models_per_type": 0}}]}) `

Parameters:

**kwargs –

use_orig_features[True, False, “never”], default True

If True, will use the original data features. If False, will discard the original data features and only use stack features, except when no stack features exist (such as in layer 1). If “never”, will always discard the original data features. Will raise a NoStackFeatures exception if no stack features exist (skipping in layer 1).

valid_stackerbool, default True

If True, will be marked as valid to include as a stacker model. If False, will only be fit as a base model (layer 1) and will not be fit in stack layers (layer 2+).

max_base_modelsint, default 0

Maximum number of base models whose predictions form the features input to this stacker model. If more than max_base_models base models are available, only the top max_base_models models with highest validation score are used. If 0, the logic is skipped.

max_base_models_per_typeint | str, default “auto”

Similar to max_base_models. If more than max_base_models_per_type of any particular model type are available, only the top max_base_models_per_type of that type are used. This occurs before the max_base_models filter. If “auto”, the value will be adaptively set based on the number of training samples.

More samples will lead to larger values, starting at 1 with <1000 samples, increasing up to 12 at >=50000 samples.

If 0, the logic is skipped.

Refer to BaggedEnsembleModel documentation for additional kwargs

WeightedEnsembleModel¶

class autogluon.core.models.WeightedEnsembleModel(**kwargs)[source]¶

Weighted ensemble meta-model that implements Ensemble Selection: https://www.cs.cornell.edu/~alexn/papers/shotgun.icml04.revised.rev2.pdf

A autogluon.core.models.GreedyWeightedEnsembleModel must be specified as the model_base to properly function.

Experimental Models¶

`FTTransformerModel`
`FastTextModel`

FTTransformerModel¶

class autogluon.tabular.models.FTTransformerModel(**kwargs)[source]¶

FastTextModel¶

class autogluon.tabular.models.FastTextModel(**kwargs)[source]¶

autogluon.tabular.models¶

Model Keys¶

Model Name Suffixes¶

Models¶

AbstractModel¶

Returns:¶

LGBModel¶

CatBoostModel¶

XGBoostModel¶

RealMLPModel¶

TabMModel¶

MitraModel¶

TabICLModel¶

TabPFNV2Model¶

RFModel¶

XTModel¶

KNNModel¶

LinearModel¶

TabularNeuralNetTorchModel¶

NNFastAiTabularModel¶

MultiModalPredictorModel¶

TextPredictorModel¶

ImagePredictorModel¶

Ensemble Models¶

BaggedEnsembleModel¶

StackerEnsembleModel¶

WeightedEnsembleModel¶

Experimental Models¶

FTTransformerModel¶

FastTextModel¶