Predicting Multiple Columns in a Table (Multi-Label Prediction)

In multi-label prediction, we wish to predict multiple columns of a table (i.e. labels) based on the values in the remaining columns. Here we present a simple strategy for doing this with AutoGluon: maintain a separate TabularPredictor object for each column being predicted. To account for correlations between labels, we impose an order on the labels and allow the TabularPredictor for each label to condition on the labels that appear earlier in the order (their true values during training, their predicted values at inference).
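
To make the ordering idea concrete, here is a minimal sketch of which columns each label's predictor is allowed to see. It does not use AutoGluon; the helper name `visible_features` and the toy column names (`x1`, `x2`, `y1`, `y2`, `y3`) are hypothetical and chosen only for illustration.

```python
def visible_features(label, labels, columns, consider_labels_correlation=True):
    """Return the feature columns available to the predictor of `label`.

    Hypothetical helper for illustration; not part of AutoGluon.
    """
    i = labels.index(label)
    if consider_labels_correlation:
        # Chained mode: only labels later in the order are hidden.
        hidden = labels[i + 1:]
    else:
        # Independent mode: every other label is hidden.
        hidden = [other for other in labels if other != label]
    return [c for c in columns if c != label and c not in hidden]


columns = ["x1", "x2", "y1", "y2", "y3"]  # toy table: two features, three labels
labels = ["y1", "y2", "y3"]               # the order in which labels are predicted

for label in labels:
    print(label, visible_features(label, labels, columns))
```

With the chained setting, each predictor sees one more column than the previous one, which is why the order of the labels can affect accuracy.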

MultilabelPredictor Class

We start by defining a custom MultilabelPredictor class to manage a collection of TabularPredictor objects, one for each label. You can use the MultilabelPredictor similarly to an individual TabularPredictor, except it operates on multiple labels rather than one.

from autogluon.tabular import TabularDataset, TabularPredictor
from autogluon.common.utils.utils import setup_outputdir
from autogluon.core.utils.loaders import load_pkl
from autogluon.core.utils.savers import save_pkl
import os.path

class MultilabelPredictor:
    """ Tabular Predictor for predicting multiple columns in a table.
        Creates multiple TabularPredictor objects which you can also use individually.
        You can access the TabularPredictor for a particular label via: `multilabel_predictor.get_predictor(label_i)`

        Parameters
        ----------
        labels : List[str]
            The ith element of this list is the column (i.e. `label`) predicted by the ith TabularPredictor stored in this object.
        path : str, default = None
            Path to directory where models and intermediate outputs should be saved.
            If unspecified, a time-stamped folder called "AutogluonModels/ag-[TIMESTAMP]" will be created in the working directory to store all models.
            Note: To call `fit()` twice and keep the results of each fit, either specify a different `path` each time or leave `path` unspecified.
            Otherwise, files from the first `fit()` will be overwritten by the second `fit()`.
            Caution: when predicting many labels, this directory may grow large as it needs to store many TabularPredictors.
        problem_types : List[str], default = None
            The ith element is the `problem_type` for the ith TabularPredictor stored in this object.
        eval_metrics : List[str], default = None
            The ith element is the `eval_metric` for the ith TabularPredictor stored in this object.
        consider_labels_correlation : bool, default = True
            Whether the predictions of multiple labels should account for label correlations or predict each label independently of the others.
            If True, the ordering of `labels` may affect resulting accuracy as each label is predicted conditional on the previous labels appearing earlier in this list (i.e. in an auto-regressive fashion).
            Set to False if during inference you may want to individually use just the ith TabularPredictor without predicting all the other labels.
        kwargs :
            Arguments passed into the initialization of each TabularPredictor.

    """

    multi_predictor_file = 'multilabel_predictor.pkl'

    def __init__(self, labels, path=None, problem_types=None, eval_metrics=None, consider_labels_correlation=True, **kwargs):
        if len(labels) < 2:
            raise ValueError("MultilabelPredictor is only intended for predicting MULTIPLE labels (columns), use TabularPredictor for predicting one label (column).")
        if (problem_types is not None) and (len(problem_types) != len(labels)):
            raise ValueError("If provided, `problem_types` must have same length as `labels`")
        if (eval_metrics is not None) and (len(eval_metrics) != len(labels)):
            raise ValueError("If provided, `eval_metrics` must have same length as `labels`")
        self.path = setup_outputdir(path, warn_if_exist=False)
        self.labels = labels
        self.consider_labels_correlation = consider_labels_correlation
        self.predictors = {}  # key = label, value = TabularPredictor or str path to the TabularPredictor for this label
        if eval_metrics is None:
            self.eval_metrics = {}
        else:
            self.eval_metrics = {labels[i] : eval_metrics[i] for i in range(len(labels))}
        problem_type = None
        eval_metric = None
        for i in range(len(labels)):
            label = labels[i]
            path_i = os.path.join(self.path, "Predictor_" + str(label))
            if problem_types is not None:
                problem_type = problem_types[i]
            if eval_metrics is not None:
                eval_metric = eval_metrics[i]
            self.predictors[label] = TabularPredictor(label=label, problem_type=problem_type, eval_metric=eval_metric, path=path_i, **kwargs)

    def fit(self, train_data, tuning_data=None, **kwargs):
        """ Fits a separate TabularPredictor to predict each of the labels.

            Parameters
            ----------
            train_data, tuning_data : str or pd.DataFrame
                See documentation for `TabularPredictor.fit()`.
            kwargs :
                Arguments passed into the `fit()` call for each TabularPredictor.
        """
        if isinstance(train_data, str):
            train_data = TabularDataset(train_data)
        if tuning_data is not None and isinstance(tuning_data, str):
            tuning_data = TabularDataset(tuning_data)
        train_data_og = train_data.copy()
        if tuning_data is not None:
            tuning_data_og = tuning_data.copy()
        else:
            tuning_data_og = None
        save_metrics = len(self.eval_metrics) == 0
        for i in range(len(self.labels)):
            label = self.labels[i]
            predictor = self.get_predictor(label)
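            if not self.consider_labels_correlation:
                # Independent mode: hide every other label from this predictor.
                labels_to_drop = [other for other in self.labels if other != label]
            else:
                # Chained mode: hide only labels later in the order; earlier labels stay in as features.
                labels_to_drop = [self.labels[j] for j in range(i+1, len(self.labels))]
            train_data = train_data_og.drop(labels_to_drop, axis=1)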
            if tuning_data is not None:
                tuning_data = tuning_data_og.drop(labels_to_drop, axis=1)
            print(f"Fitting TabularPredictor for label: {label} ...")
            predictor.fit(train_data=train_data, tuning_data=tuning_data, **kwargs)
            self.predictors[label] = predictor.path
            if save_metrics:
                self.eval_metrics[label] = predictor.eval_metric
        self.save()

    def predict(self, data, **kwargs):
        """ Returns DataFrame with label columns containing predictions for each label.

            Parameters
            ----------
            data : str or autogluon.tabular.TabularDataset or pd.DataFrame
                Data to make predictions for. If label columns are present in this data, they will be ignored. See documentation for `TabularPredictor.predict()`.
            kwargs :
                Arguments passed into the predict() call for each TabularPredictor.
        """
        return self._predict(data, as_proba=False, **kwargs)

    def predict_proba(self, data, **kwargs):
        """ Returns dict where each key is a label and the corresponding value is the `predict_proba()` output for just that label.

            Parameters
            ----------
            data : str or autogluon.tabular.TabularDataset or pd.DataFrame
                Data to make predictions for. See documentation for `TabularPredictor.predict()` and `TabularPredictor.predict_proba()`.
            kwargs :
                Arguments passed into the `predict_proba()` call for each TabularPredictor (also passed into a `predict()` call).
        """
        return self._predict(data, as_proba=True, **kwargs)

    def evaluate(self, data, **kwargs):
        """ Returns dict where each key is a label and the corresponding value is the `evaluate()` output for just that label.

            Parameters
            ----------
            data : str or autogluon.tabular.TabularDataset or pd.DataFrame
                Data to evaluate predictions of all labels for, must contain all labels as columns. See documentation for `TabularPredictor.evaluate()`.
            kwargs :
                Arguments passed into the `evaluate()` call for each TabularPredictor (also passed into the `predict()` call).
        """
        data = self._get_data(data)
        eval_dict = {}
        for label in self.labels:
            print(f"Evaluating TabularPredictor for label: {label} ...")
            predictor = self.get_predictor(label)
            eval_dict[label] = predictor.evaluate(data, **kwargs)
            if self.consider_labels_correlation:
                data[label] = predictor.predict(data, **kwargs)
        return eval_dict

    def save(self):
        """ Save MultilabelPredictor to disk. """
        for label in self.labels:
            if not isinstance(self.predictors[label], str):
                self.predictors[label] = self.predictors[label].path
        save_pkl.save(path=os.path.join(self.path, self.multi_predictor_file), object=self)
        print(f"MultilabelPredictor saved to disk. Load with: MultilabelPredictor.load('{self.path}')")

    @classmethod
    def load(cls, path):
        """ Load MultilabelPredictor from disk `path` previously specified when creating this MultilabelPredictor. """
        path = os.path.expanduser(path)
        return load_pkl.load(path=os.path.join(path, cls.multi_predictor_file))

    def get_predictor(self, label):
        """ Returns TabularPredictor which is used to predict this label. """
        predictor = self.predictors[label]
        if isinstance(predictor, str):
            return TabularPredictor.load(path=predictor)
        return predictor

    def _get_data(self, data):
        if isinstance(data, str):
            return TabularDataset(data)
        return data.copy()

    def _predict(self, data, as_proba=False, **kwargs):
        data = self._get_data(data)
        if as_proba:
            predproba_dict = {}
        for label in self.labels:
            print(f"Predicting with TabularPredictor for label: {label} ...")
            predictor = self.get_predictor(label)
            if as_proba:
                predproba_dict[label] = predictor.predict_proba(data, as_multiclass=True, **kwargs)
            # Write the prediction into the data so predictors for later labels can condition on it.
            data[label] = predictor.predict(data, **kwargs)
        if not as_proba:
            return data[self.labels]
        else:
            return predproba_dict
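
The prediction loop in `_predict` can be hard to follow inside the full class, so here is a stripped-down sketch of the same idea without AutoGluon or pandas. `StubPredictor` and `chained_predict` are hypothetical stand-ins written for this illustration, and the rows are plain dicts rather than a DataFrame; the toy labels `years` and `degree` are likewise made up.

```python
class StubPredictor:
    """Hypothetical stand-in for a fitted TabularPredictor (illustration only)."""

    def __init__(self, rule):
        self.rule = rule  # function mapping one row (a dict) to a prediction

    def predict(self, rows):
        return [self.rule(row) for row in rows]


def chained_predict(rows, labels, predictors):
    """Predict labels in order, writing each prediction back into the rows
    so that predictors for later labels can read it as a feature."""
    rows = [dict(row) for row in rows]  # work on a copy of the input
    for label in labels:
        preds = predictors[label].predict(rows)
        for row, pred in zip(rows, preds):
            row[label] = pred  # overwrites any value already stored under this label
    return [{label: row[label] for label in labels} for row in rows]


labels = ["years", "degree"]
predictors = {
    "years": StubPredictor(lambda row: 16 if row["age"] > 30 else 12),
    # This stub reads the *predicted* "years" value written by the previous step.
    "degree": StubPredictor(lambda row: "Bachelors" if row["years"] >= 16 else "HS-grad"),
}

result = chained_predict([{"age": 45}, {"age": 22}], labels, predictors)
print(result)
# [{'years': 16, 'degree': 'Bachelors'}, {'years': 12, 'degree': 'HS-grad'}]
```

Because each prediction overwrites the column of the same name, any true label values present in the input are ignored, which matches the note in the `predict()` docstring above.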

Training

Let’s now apply our multi-label predictor to predict multiple columns in a data table. We first train models to predict each of the labels.

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
subsample_size = 500  # subsample the data for a faster demo; try setting this to much larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()

	age	workclass	fnlwgt	education	education-num	marital-status	occupation	relationship	race	sex	capital-gain	capital-loss	hours-per-week	native-country	class
6118	51	Private	39264	Some-college	10	Married-civ-spouse	Exec-managerial	Wife	White	Female	0	0	40	United-States	>50K
23204	58	Private	51662	10th	6	Married-civ-spouse	Other-service	Wife	White	Female	0	0	8	United-States	<=50K
29590	40	Private	326310	Some-college	10	Married-civ-spouse	Craft-repair	Husband	White	Male	0	0	44	United-States	<=50K
18116	37	Private	222450	HS-grad	9	Never-married	Sales	Not-in-family	White	Male	0	2339	40	El-Salvador	<=50K
33964	62	Private	109190	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	15024	0	40	United-States	>50K

labels = ['education-num','education','class']  # which columns to predict based on the others
problem_types = ['regression','multiclass','binary']  # type of each prediction problem (optional)
eval_metrics = ['mean_absolute_error','accuracy','accuracy']  # metrics used to evaluate predictions for each label (optional)
save_path = 'agModels-predictEducationClass'  # specifies folder to store trained models (optional)

time_limit = 5  # how many seconds to train the TabularPredictor for each label, set much larger in your applications!

multi_predictor = MultilabelPredictor(labels=labels, problem_types=problem_types, eval_metrics=eval_metrics, path=save_path)
multi_predictor.fit(train_data, time_limit=time_limit)

Fitting TabularPredictor for label: education-num ...
Fitting TabularPredictor for label: education ...
Fitting TabularPredictor for label: class ...
MultilabelPredictor saved to disk. Load with: MultilabelPredictor.load('/home/ci/autogluon/docs/tutorials/tabular/advanced/agModels-predictEducationClass')

Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.5.0b20251219
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.9.1+cu128
CUDA Version:       12.8
GPU Memory:         GPU 0: 14.57/14.57 GB
Total GPU Memory:   Free: 14.57 GB, Allocated: 0.00 GB, Total: 14.57 GB
GPU Count:          1
Memory Avail:       28.50 GB / 30.95 GB (92.1%)
Disk Space Avail:   204.65 GB / 255.99 GB (79.9%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='extreme'  : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Absolute best accuracy. Requires a GPU. Recommended 64 GB CPU memory and 32+ GB GPU memory.
	presets='best'     : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
	presets='high'     : Strong accuracy with fast inference speed.
	presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
	presets='good'     : Good accuracy with very fast inference speed.
	presets='medium'   : Fast training time, ideal for initial prototyping.
Using hyperparameters preset: hyperparameters='default'
Beginning AutoGluon training ... Time limit = 5s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/advanced/agModels-predictEducationClass/Predictor_education-num"
Train Data Rows:    500
Train Data Columns: 12
Label Column:       education-num
Problem Type:       regression
Preprocessing data ...
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    29159.53 MB
	Train Data (Original)  Memory Usage: 0.22 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 5 | ['age', 'fnlwgt', 'capital-gain', 'capital-loss', 'hours-per-week']
		('object', []) : 7 | ['workclass', 'marital-status', 'occupation', 'relationship', 'race', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 6 | ['workclass', 'marital-status', 'occupation', 'relationship', 'race', ...]
		('int', [])       : 5 | ['age', 'fnlwgt', 'capital-gain', 'capital-loss', 'hours-per-week']
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	12 features in original data used to generate 12 features in processed data.
	Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.08s ...
AutoGluon will gauge predictive performance using evaluation metric: 'mean_absolute_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}],
	'XGB': [{}],
	'FASTAI': [{}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
}
Fitting 9 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBMXT ... Training model for up to 4.92s of the 4.92s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/28.5 GB
	-1.7808	 = Validation score   (-mean_absolute_error)
	0.38s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: LightGBM ... Training model for up to 4.54s of the 4.54s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/28.5 GB
	-1.7854	 = Validation score   (-mean_absolute_error)
	0.24s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: RandomForestMSE ... Training model for up to 4.29s of the 4.29s of remaining time.
	Fitting with cpus=8, gpus=0
	-1.7082	 = Validation score   (-mean_absolute_error)
	0.46s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: CatBoost ... Training model for up to 3.76s of the 3.76s of remaining time.
	Fitting with cpus=4, gpus=0
	-1.6258	 = Validation score   (-mean_absolute_error)
	0.85s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: ExtraTreesMSE ... Training model for up to 2.90s of the 2.90s of remaining time.
	Fitting with cpus=8, gpus=0
	-1.8193	 = Validation score   (-mean_absolute_error)
	0.43s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: NeuralNetFastAI ... Training model for up to 2.41s of the 2.41s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/28.4 GB
	-1.8728	 = Validation score   (-mean_absolute_error)
	0.95s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: XGBoost ... Training model for up to 1.44s of the 1.44s of remaining time.
	Fitting with cpus=4, gpus=0
	-1.6629	 = Validation score   (-mean_absolute_error)
	0.38s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetTorch ... Training model for up to 1.04s of the 1.04s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/28.3 GB
/home/ci/opt/venv/lib/python3.12/site-packages/sklearn/compose/_column_transformer.py:975: FutureWarning: The parameter `force_int_remainder_cols` is deprecated and will be removed in 1.9. It has no effect. Leave it to its default value to avoid this warning.
  warnings.warn(
	Time limit exceeded... Skipping NeuralNetTorch.
Fitting model: WeightedEnsemble_L2 ... Training model for up to 4.92s of the -0.24s of remaining time.
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/28.2 GB
	Ensemble Weights: {'CatBoost': 0.733, 'XGBoost': 0.267}
	-1.6213	 = Validation score   (-mean_absolute_error)
	0.05s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 5.31s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 9506.6 rows/s (100 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/agModels-predictEducationClass/Predictor_education-num")
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.5.0b20251219
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.9.1+cu128
CUDA Version:       12.8
GPU Memory:         GPU 0: 14.57/14.57 GB
Total GPU Memory:   Free: 14.57 GB, Allocated: 0.00 GB, Total: 14.57 GB
GPU Count:          1
Memory Avail:       28.20 GB / 30.95 GB (91.1%)
Disk Space Avail:   204.63 GB / 255.99 GB (79.9%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='extreme'  : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Absolute best accuracy. Requires a GPU. Recommended 64 GB CPU memory and 32+ GB GPU memory.
	presets='best'     : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
	presets='high'     : Strong accuracy with fast inference speed.
	presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
	presets='good'     : Good accuracy with very fast inference speed.
	presets='medium'   : Fast training time, ideal for initial prototyping.
Using hyperparameters preset: hyperparameters='default'
Beginning AutoGluon training ... Time limit = 5s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/advanced/agModels-predictEducationClass/Predictor_education"
Train Data Rows:    500
Train Data Columns: 13
Label Column:       education
Problem Type:       multiclass
Preprocessing data ...
Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 11 out of 15 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.976
Train Data Class Count: 11
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28878.29 MB
	Train Data (Original)  Memory Usage: 0.22 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 7 | ['workclass', 'marital-status', 'occupation', 'relationship', 'race', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 6 | ['workclass', 'marital-status', 'occupation', 'relationship', 'race', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	13 features in original data used to generate 13 features in processed data.
	Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 390, Val Rows: 98
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}],
	'XGB': [{}],
	'FASTAI': [{}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
}
Fitting 11 L1 models, fit_strategy="sequential" ...
Fitting model: NeuralNetFastAI ... Training model for up to 4.91s of the 4.91s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/28.2 GB
	0.7653	 = Validation score   (accuracy)
	0.51s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBMXT ... Training model for up to 4.39s of the 4.38s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.2/28.2 GB
	0.9694	 = Validation score   (accuracy)
	0.89s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBM ... Training model for up to 3.46s of the 3.46s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.2/28.2 GB
	1.0	 = Validation score   (accuracy)
	0.53s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: RandomForestGini ... Training model for up to 2.92s of the 2.92s of remaining time.
	Fitting with cpus=8, gpus=0, mem=0.0/28.2 GB
	0.9082	 = Validation score   (accuracy)
	0.8s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: RandomForestEntr ... Training model for up to 2.03s of the 2.03s of remaining time.
	Fitting with cpus=8, gpus=0, mem=0.0/28.1 GB
	0.9082	 = Validation score   (accuracy)
	0.79s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: CatBoost ... Training model for up to 1.16s of the 1.16s of remaining time.
	Fitting with cpus=4, gpus=0
	Ran out of time, early stopping on iteration 62.
	0.8367	 = Validation score   (accuracy)
	1.14s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 4.91s of the -0.00s of remaining time.
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/28.1 GB
	Ensemble Weights: {'LightGBM': 1.0}
	1.0	 = Validation score   (accuracy)
	0.05s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 5.08s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 24151.9 rows/s (98 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/agModels-predictEducationClass/Predictor_education")
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.5.0b20251219
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.9.1+cu128
CUDA Version:       12.8
GPU Memory:         GPU 0: 14.57/14.57 GB
Total GPU Memory:   Free: 14.57 GB, Allocated: 0.00 GB, Total: 14.57 GB
GPU Count:          1
Memory Avail:       28.12 GB / 30.95 GB (90.9%)
Disk Space Avail:   204.62 GB / 255.99 GB (79.9%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='extreme'  : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Absolute best accuracy. Requires a GPU. Recommended 64 GB CPU memory and 32+ GB GPU memory.
	presets='best'     : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
	presets='high'     : Strong accuracy with fast inference speed.
	presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
	presets='good'     : Good accuracy with very fast inference speed.
	presets='medium'   : Fast training time, ideal for initial prototyping.
Using hyperparameters preset: hyperparameters='default'
Beginning AutoGluon training ... Time limit = 5s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/advanced/agModels-predictEducationClass/Predictor_class"
Train Data Rows:    500
Train Data Columns: 14
Label Column:       class
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28793.47 MB
	Train Data (Original)  Memory Usage: 0.25 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.08s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}],
	'XGB': [{}],
	'FASTAI': [{}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
}
Fitting 11 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBMXT ... Training model for up to 4.92s of the 4.92s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/28.1 GB
	0.83	 = Validation score   (accuracy)
	0.27s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: LightGBM ... Training model for up to 4.64s of the 4.64s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/28.1 GB
	0.85	 = Validation score   (accuracy)
	0.32s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: RandomForestGini ... Training model for up to 4.31s of the 4.30s of remaining time.
	Fitting with cpus=8, gpus=0
	0.84	 = Validation score   (accuracy)
	0.54s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: RandomForestEntr ... Training model for up to 3.70s of the 3.70s of remaining time.
	Fitting with cpus=8, gpus=0
	0.83	 = Validation score   (accuracy)
	0.53s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: CatBoost ... Training model for up to 3.12s of the 3.12s of remaining time.
	Fitting with cpus=4, gpus=0
	0.85	 = Validation score   (accuracy)
	0.77s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: ExtraTreesGini ... Training model for up to 2.34s of the 2.34s of remaining time.
	Fitting with cpus=8, gpus=0
	0.82	 = Validation score   (accuracy)
	0.53s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: ExtraTreesEntr ... Training model for up to 1.75s of the 1.75s of remaining time.
	Fitting with cpus=8, gpus=0
	0.81	 = Validation score   (accuracy)
	0.53s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: NeuralNetFastAI ... Training model for up to 1.16s of the 1.15s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/28.1 GB
	0.83	 = Validation score   (accuracy)
	0.52s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: XGBoost ... Training model for up to 0.62s of the 0.62s of remaining time.
	Fitting with cpus=4, gpus=0
	0.85	 = Validation score   (accuracy)
	0.2s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetTorch ... Training model for up to 0.40s of the 0.40s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/28.1 GB
/home/ci/opt/venv/lib/python3.12/site-packages/sklearn/compose/_column_transformer.py:975: FutureWarning: The parameter `force_int_remainder_cols` is deprecated and will be removed in 1.9. It has no effect. Leave it to its default value to avoid this warning.
  warnings.warn(
	Ran out of time, stopping training early. (Stopping on epoch 8)
	0.83	 = Validation score   (accuracy)
	0.39s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 4.92s of the -0.01s of remaining time.
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/28.1 GB
	Ensemble Weights: {'LightGBM': 1.0}
	0.85	 = Validation score   (accuracy)
	0.08s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 5.11s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 20257.4 rows/s (100 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (100 rows).
	`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/agModels-predictEducationClass/Predictor_class")

Inference and Evaluation¶

After training, you can easily use the MultilabelPredictor to predict all labels in new data:

test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
test_data = test_data.sample(n=subsample_size, random_state=0)
test_data_nolab = test_data.drop(columns=labels)  # unnecessary, just to demonstrate we're not cheating here
test_data_nolab.head()

Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769

	age	workclass	fnlwgt	marital-status	occupation	relationship	race	sex	hours-per-week	native-country
5454	41	Self-emp-not-inc	408498	Married-civ-spouse	Exec-managerial	Husband	White	Male	50	United-States
6111	39	Private	746786	Married-civ-spouse	Prof-specialty	Husband	White	Male	55	United-States
5282	50	Private	62593	Married-civ-spouse	Farming-fishing	Husband	Asian-Pac-Islander	Male	40	United-States
3046	31	Private	248178	Married-civ-spouse	Other-service	Husband	Black	Male	35	United-States
2162	43	State-gov	52849	Married-civ-spouse	Prof-specialty	Husband	White	Male	40	United-States

multi_predictor = MultilabelPredictor.load(save_path)  # unnecessary, just demonstrates how to load previously-trained multilabel predictor from file

predictions = multi_predictor.predict(test_data_nolab)
print("Predictions:  \n", predictions)

Predicting with TabularPredictor for label: education-num ...
Predicting with TabularPredictor for label: education ...
Predicting with TabularPredictor for label: class ...
Predictions:  
       education-num      education   class
5454      11.593187      Assoc-voc    >50K
6111      13.331231      Bachelors    >50K
5282       9.070158        HS-grad   <=50K
3046       9.358667        HS-grad   <=50K
2162      13.271032      Bachelors    >50K
...             ...            ...     ...
6965       9.757959        HS-grad    >50K
4762       9.252796        HS-grad   <=50K
234       10.271783   Some-college   <=50K
6291      10.273993   Some-college   <=50K
9575       9.511233        HS-grad    >50K

[500 rows x 3 columns]

We can also easily evaluate the performance of our predictions if our new data contain the ground truth labels:

evaluations = multi_predictor.evaluate(test_data)
print(evaluations)
print("Evaluated using metrics:", multi_predictor.eval_metrics)

Evaluating TabularPredictor for label: education-num ...
Evaluating TabularPredictor for label: education ...
Evaluating TabularPredictor for label: class ...
{'education-num': {'mean_absolute_error': -1.6142611503601074, 'root_mean_squared_error': np.float64(-2.231633185486142), 'mean_squared_error': -4.980186462402344, 'r2': 0.3560405969619751, 'pearsonr': 0.610998788828752, 'median_absolute_error': -1.0376033782958984}, 'education': {'accuracy': 0.294, 'balanced_accuracy': np.float64(0.09127144735021464), 'mcc': 0.0995221794945934}, 'class': {'accuracy': 0.834, 'balanced_accuracy': np.float64(0.7263315154934287), 'mcc': 0.5297494673413512, 'roc_auc': np.float64(0.8554817275747509), 'f1': 0.6103286384976526, 'precision': 0.7738095238095238, 'recall': 0.5038759689922481}}
Evaluated using metrics: {'education-num': 'mean_absolute_error', 'education': 'accuracy', 'class': 'accuracy'}
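Note that the error metrics for education-num are negative in the output above. AutoGluon reports all scores in higher-is-better form, so error metrics have their sign flipped. The following is a minimal sketch of converting such a nested dictionary back to conventional positive errors. The helper name to_natural_sign and the ERROR_METRICS set are illustrative assumptions, not part of AutoGluon, and the example values are a small subset copied from the output above.

```python
# Metric names assumed here to be error metrics reported with a flipped sign.
ERROR_METRICS = {
    "mean_absolute_error",
    "root_mean_squared_error",
    "mean_squared_error",
    "median_absolute_error",
}


def to_natural_sign(evaluations):
    """Return a copy of a {label: {metric: score}} dict with error metrics made positive."""
    converted = {}
    for label, scores in evaluations.items():
        converted[label] = {
            # Negate only the error metrics; leave scores such as accuracy or r2 unchanged.
            metric: (-float(score) if metric in ERROR_METRICS else float(score))
            for metric, score in scores.items()
        }
    return converted


# A subset of the evaluation output shown above.
example = {
    "education-num": {"mean_absolute_error": -1.6142611503601074, "r2": 0.3560405969619751},
    "class": {"accuracy": 0.834},
}
natural = to_natural_sign(example)
print(natural["education-num"]["mean_absolute_error"])
```

The original dictionary is left untouched, so the higher-is-better scores remain available for comparing models.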

Accessing the TabularPredictor for One Label¶

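We can also work directly with the TabularPredictor for any one of the labels, as shown below. However, if you later plan to use an individual TabularPredictor to predict just one label rather than all of the labels predicted by the MultilabelPredictor, we recommend setting consider_labels_correlation=False before training. Otherwise, the TabularPredictor for each label is trained to condition on the labels that appear earlier in the order and expects those columns in its input.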

predictor_class = multi_predictor.get_predictor('class')
predictor_class.leaderboard()

	model	score_val	eval_metric	pred_time_val	fit_time	pred_time_val_marginal	fit_time_marginal	stack_level	can_infer	fit_order
0	CatBoost	0.85	accuracy	0.003810	0.766261	0.003810	0.766261	1	True	5
1	LightGBM	0.85	accuracy	0.004075	0.320986	0.004075	0.320986	1	True	2
2	WeightedEnsemble_L2	0.85	accuracy	0.004936	0.401831	0.000861	0.080845	2	True	11
3	XGBoost	0.85	accuracy	0.006437	0.203405	0.006437	0.203405	1	True	9
4	RandomForestGini	0.84	accuracy	0.046834	0.538384	0.046834	0.538384	1	True	3
5	LightGBMXT	0.83	accuracy	0.003510	0.270999	0.003510	0.270999	1	True	1
6	NeuralNetFastAI	0.83	accuracy	0.008444	0.515393	0.008444	0.515393	1	True	8
7	NeuralNetTorch	0.83	accuracy	0.010612	0.387838	0.010612	0.387838	1	True	10
8	RandomForestEntr	0.83	accuracy	0.046681	0.526842	0.046681	0.526842	1	True	4
9	ExtraTreesGini	0.82	accuracy	0.047910	0.526231	0.047910	0.526231	1	True	6
10	ExtraTreesEntr	0.81	accuracy	0.048587	0.533829	0.048587	0.533829	1	True	7
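To see why consider_labels_correlation matters when using a single TabularPredictor, it helps to list the columns each predictor is trained on. The sketch below only illustrates the ordering logic described at the start of this tutorial; it is not the MultilabelPredictor implementation. The function name required_inputs and the shortened feature list are assumptions made for illustration.

```python
def required_inputs(feature_columns, labels, consider_labels_correlation=True):
    """Map each label to the input columns its predictor would be trained on."""
    inputs = {}
    for i, label in enumerate(labels):
        # With correlation enabled, labels earlier in the order act as extra features.
        earlier_labels = labels[:i] if consider_labels_correlation else []
        inputs[label] = list(feature_columns) + list(earlier_labels)
    return inputs


# Shortened, illustrative feature list; the label order matches this tutorial.
features = ["age", "workclass", "hours-per-week"]
labels = ["education-num", "education", "class"]

chained = required_inputs(features, labels, consider_labels_correlation=True)
independent = required_inputs(features, labels, consider_labels_correlation=False)

print(chained["class"])      # includes 'education-num' and 'education'
print(independent["class"])  # only the original features
```

In the chained case, the predictor for class expects education-num and education to be present in its input, which is why predicting class alone is simpler when the predictors are trained independently.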

Tips¶

To obtain the best predictions, you should generally do the following:

Set eval_metrics to the metrics you will actually use to evaluate predictions for each label. This is specified when constructing the MultilabelPredictor.
Pass presets='best_quality' to MultilabelPredictor.fit() to tell AutoGluon that you care about predictive performance more than latency or memory usage. This preset uses stack ensembling when predicting each label.
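A minimal sketch of the two settings above. The helper name best_quality_settings is hypothetical. It assumes that eval_metrics is passed to the MultilabelPredictor constructor as a list aligned with labels, and that keyword arguments given to fit() are forwarded to each TabularPredictor. The metric names are the ones reported in the evaluation output earlier.

```python
def best_quality_settings(labels, eval_metrics):
    """Build constructor and fit() keyword arguments for a quality-focused run."""
    if len(labels) != len(eval_metrics):
        raise ValueError("Provide exactly one evaluation metric per label.")
    # Assumed constructor arguments: one metric per label, in the same order.
    init_kwargs = {"labels": list(labels), "eval_metrics": list(eval_metrics)}
    # Assumed to be forwarded to every underlying TabularPredictor.fit() call.
    fit_kwargs = {"presets": "best_quality"}
    return init_kwargs, fit_kwargs


init_kwargs, fit_kwargs = best_quality_settings(
    labels=["education-num", "education", "class"],
    eval_metrics=["mean_absolute_error", "accuracy", "accuracy"],
)

# With the class defined earlier in this tutorial, these would be used as:
#   multi_predictor = MultilabelPredictor(**init_kwargs)
#   multi_predictor.fit(train_data, **fit_kwargs)
print(init_kwargs)
print(fit_kwargs)
```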

If you find that too much memory or disk space is being used, try calling MultilabelPredictor.fit() with the additional arguments discussed under “If you encounter memory issues” or “If you encounter disk space issues” in the In Depth Tutorial.

If you find inference too slow, try the strategies discussed under “Accelerating Inference” in the In Depth Tutorial. In particular, try specifying the following presets in MultilabelPredictor.fit(): presets=['good_quality', 'optimize_for_deployment']
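Before and after changing presets, it can help to measure inference throughput so the effect is visible. The following is a rough timing sketch, not an AutoGluon utility. The function name rows_per_second is hypothetical, and the stand-in predict function below only exists so the example runs on its own.

```python
import time


def rows_per_second(predict_fn, data, n_rows, repeats=3):
    """Return the best observed throughput (rows/s) of predict_fn(data) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on very fast calls.
    return n_rows / best if best > 0 else float("inf")


# Stand-in for a real predict call, used only to make this sketch runnable.
def fake_predict(rows):
    return [row * 2 for row in rows]


rows = list(range(10_000))
throughput = rows_per_second(fake_predict, rows, n_rows=len(rows))
print(f"{throughput:.0f} rows/s")
```

With the objects from this tutorial, the call would look like rows_per_second(multi_predictor.predict, test_data_nolab, len(test_data_nolab)).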