AutoGluon Tabular - Essential Functionality

This tutorial demonstrates how to use AutoGluon to produce a highly accurate tabular model in 3 lines of code.

TabularPredictor

To start, import AutoGluon’s TabularPredictor and TabularDataset classes:

from autogluon.tabular import TabularDataset, TabularPredictor

Load training data from a CSV file using AutoGluon’s TabularDataset. TabularDataset is a convenience wrapper around a pandas DataFrame, so the same methods can be applied to both.

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')  # returns a pandas DataFrame, also works with parquet files
subsample_size = 1000  # subsample data for a faster demo
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country class
6118 51 Private 39264 Some-college 10 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States >50K
23204 58 Private 51662 10th 6 Married-civ-spouse Other-service Wife White Female 0 0 8 United-States <=50K
29590 40 Private 326310 Some-college 10 Married-civ-spouse Craft-repair Husband White Male 0 0 44 United-States <=50K
18116 37 Private 222450 HS-grad 9 Never-married Sales Not-in-family White Male 0 2339 40 El-Salvador <=50K
33964 62 Private 109190 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 15024 0 40 United-States >50K

Note that we loaded data from a CSV file stored in the cloud. You can also specify a local file instead to try AutoGluon on your own data. Each row in the table train_data corresponds to a single training example. In this particular dataset, each row corresponds to an individual person, and the columns contain various characteristics reported during a census.

Let’s use these features to predict whether a person’s income exceeds $50,000, as indicated by the class column.

label = 'class'
print(f"Unique classes: {list(train_data[label].unique())}")
Unique classes: [' >50K', ' <=50K']

AutoGluon works with raw data, meaning you don’t need to perform any data preprocessing before fitting AutoGluon. We actively recommend that you avoid performing operations such as missing value imputation or one-hot-encoding, as AutoGluon has dedicated logic to handle these situations automatically. You can learn more about AutoGluon’s preprocessing in the Feature Engineering Tutorial.

Training

Now we initialize and fit AutoGluon’s TabularPredictor in one line of code:

predictor = TabularPredictor(label=label).fit(train_data)

No path specified. Models will be saved in: "AutogluonModels/ag-20251219_141332"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.5.0b20251219
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.9.1+cu128
CUDA Version:       12.8
GPU Memory:         GPU 0: 14.57/14.57 GB
Total GPU Memory:   Free: 14.57 GB, Allocated: 0.00 GB, Total: 14.57 GB
GPU Count:          1
Memory Avail:       28.50 GB / 30.95 GB (92.1%)
Disk Space Avail:   204.60 GB / 255.99 GB (79.9%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='extreme'  : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Absolute best accuracy. Requires a GPU. Recommended 64 GB CPU memory and 32+ GB GPU memory.
	presets='best'     : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
	presets='high'     : Strong accuracy with fast inference speed.
	presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
	presets='good'     : Good accuracy with very fast inference speed.
	presets='medium'   : Fast training time, ideal for initial prototyping.
Using hyperparameters preset: hyperparameters='default'
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_141332"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    29158.58 MB
	Train Data (Original)  Memory Usage: 0.50 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}],
	'XGB': [{}],
	'FASTAI': [{}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
}
Fitting 11 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBMXT ...
	Fitting with cpus=4, gpus=0, mem=0.0/28.5 GB
	0.85	 = Validation score   (accuracy)
	0.4s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: LightGBM ...
	Fitting with cpus=4, gpus=0, mem=0.0/28.5 GB
	0.84	 = Validation score   (accuracy)
	0.35s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: RandomForestGini ...
	Fitting with cpus=8, gpus=0
	0.84	 = Validation score   (accuracy)
	0.67s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: RandomForestEntr ...
	Fitting with cpus=8, gpus=0
	0.835	 = Validation score   (accuracy)
	0.63s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: CatBoost ...
	Fitting with cpus=4, gpus=0
	0.86	 = Validation score   (accuracy)
	1.96s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: ExtraTreesGini ...
	Fitting with cpus=8, gpus=0
	0.815	 = Validation score   (accuracy)
	0.63s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: ExtraTreesEntr ...
	Fitting with cpus=8, gpus=0
	0.82	 = Validation score   (accuracy)
	0.62s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: NeuralNetFastAI ...
	Fitting with cpus=4, gpus=0, mem=0.0/28.4 GB
No improvement since epoch 7: early stopping
	0.85	 = Validation score   (accuracy)
	1.42s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: XGBoost ...
	Fitting with cpus=4, gpus=0
	0.855	 = Validation score   (accuracy)
	0.39s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetTorch ...
	Fitting with cpus=4, gpus=0, mem=0.0/28.2 GB
/home/ci/opt/venv/lib/python3.12/site-packages/sklearn/compose/_column_transformer.py:975: FutureWarning: The parameter `force_int_remainder_cols` is deprecated and will be removed in 1.9. It has no effect. Leave it to its default value to avoid this warning.
  warnings.warn(
	0.855	 = Validation score   (accuracy)
	4.32s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBMLarge ...
	Fitting with cpus=4, gpus=0, mem=0.1/28.2 GB
	0.795	 = Validation score   (accuracy)
	1.07s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/28.2 GB
	Ensemble Weights: {'RandomForestGini': 0.429, 'CatBoost': 0.286, 'LightGBMXT': 0.143, 'ExtraTreesEntr': 0.143}
	0.875	 = Validation score   (accuracy)
	0.08s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 13.09s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 1747.1 rows/s (200 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (200 rows).
	`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_141332")

That’s it! We now have a TabularPredictor that is able to make predictions on new data.

Prediction

Next, load test data to make predictions on new examples:

test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
test_data.head()
Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country class
0 31 Private 169085 11th 7 Married-civ-spouse Sales Wife White Female 0 0 20 United-States <=50K
1 17 Self-emp-not-inc 226203 12th 8 Never-married Sales Own-child White Male 0 0 45 United-States <=50K
2 47 Private 54260 Assoc-voc 11 Married-civ-spouse Exec-managerial Husband White Male 0 1887 60 United-States >50K
3 21 Private 176262 Some-college 10 Never-married Exec-managerial Own-child White Female 0 0 30 United-States <=50K
4 17 Private 241185 12th 8 Never-married Prof-specialty Own-child White Male 0 0 20 United-States <=50K

We can now use our trained models to make predictions on the new data:

y_pred = predictor.predict(test_data)
y_pred.head()  # Predictions
0     <=50K
1     <=50K
2      >50K
3     <=50K
4     <=50K
Name: class, dtype: object
y_pred_proba = predictor.predict_proba(test_data)
y_pred_proba.head()  # Prediction Probabilities
<=50K >50K
0 0.901013 0.098987
1 0.985547 0.014453
2 0.329275 0.670725
3 0.983350 0.016650
4 0.984581 0.015419

Evaluation

Next, we can evaluate the predictor on the (labeled) test data:

predictor.evaluate(test_data)
{'accuracy': 0.8512642030914116,
 'balanced_accuracy': np.float64(0.7544925958019197),
 'mcc': 0.5610580280735463,
 'roc_auc': np.float64(0.9029808670023504),
 'f1': 0.6453502562850867,
 'precision': 0.7431141090500281,
 'recall': 0.5703192407247627}
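
If you already have predictions in hand (such as y_pred_proba from above), the same metrics can be computed without re-running inference; a minimal sketch:

# Score pre-computed predictions against the true labels. Passing the predicted
# probabilities also lets probability-based metrics such as roc_auc be computed.
predictor.evaluate_predictions(y_true=test_data[label], y_pred=y_pred_proba, auxiliary_metrics=True)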

We can also evaluate each model individually:

predictor.leaderboard(test_data)
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 CatBoost 0.852902 0.860 accuracy 0.011744 0.004883 1.962697 0.011744 0.004883 1.962697 1 True 5
1 WeightedEnsemble_L2 0.851264 0.875 accuracy 0.242066 0.114477 3.727217 0.003195 0.000901 0.080681 2 True 12
2 LightGBMXT 0.850752 0.850 accuracy 0.021172 0.003587 0.395491 0.021172 0.003587 0.395491 1 True 1
3 NeuralNetFastAI 0.846965 0.850 accuracy 0.146094 0.009487 1.418253 0.146094 0.009487 1.418253 1 True 8
4 XGBoost 0.846658 0.855 accuracy 0.025736 0.006300 0.393025 0.025736 0.006300 0.393025 1 True 9
5 LightGBM 0.841335 0.840 accuracy 0.013648 0.003526 0.350763 0.013648 0.003526 0.350763 1 True 2
6 RandomForestGini 0.840004 0.840 accuracy 0.099617 0.047154 0.666806 0.099617 0.047154 0.666806 1 True 3
7 RandomForestEntr 0.837240 0.835 accuracy 0.099565 0.047460 0.632190 0.099565 0.047460 0.632190 1 True 4
8 NeuralNetTorch 0.836728 0.855 accuracy 0.050671 0.012009 4.323106 0.050671 0.012009 4.323106 1 True 10
9 ExtraTreesGini 0.831917 0.815 accuracy 0.101458 0.057775 0.629977 0.101458 0.057775 0.629977 1 True 6
10 LightGBMLarge 0.829461 0.795 accuracy 0.067812 0.004716 1.069197 0.067812 0.004716 1.069197 1 True 11
11 ExtraTreesEntr 0.829358 0.820 accuracy 0.106337 0.057951 0.621542 0.106337 0.057951 0.621542 1 True 7

Loading a Trained Predictor

Finally, we can load the predictor in a new session (or new machine) by calling TabularPredictor.load() and specifying the location of the predictor artifact on disk.

predictor.path  # The path on disk where the predictor is saved
'/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_141332'
# Load the predictor by specifying the path it is saved to on disk.
# You can control where it is saved to by setting the `path` parameter during init
predictor = TabularPredictor.load(predictor.path)

Warning

TabularPredictor.load() uses the pickle module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source, or that could have been tampered with. Only load data you trust.

Now you’re ready to try AutoGluon on your own tabular datasets! Achieve strong predictive performance with just 2 lines of code:

from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label="your_label_name").fit(train_data="train_data.csv")

Note: This simple call to TabularPredictor.fit() is intended for your first prototype model. In a subsequent section, we’ll demonstrate how to maximize predictive performance by additionally specifying the presets parameter to fit() and the eval_metric parameter to TabularPredictor().

Description of fit()

Here we discuss what happened during fit().

Since there are only two possible values of the class variable, this was a binary classification problem, for which an appropriate performance metric is accuracy. AutoGluon automatically infers this as well as the type of each feature (i.e., which columns contain continuous numbers vs. discrete categories). AutoGluon can also automatically handle common issues like missing data and rescaling feature values.

We did not specify separate validation data, so AutoGluon automatically held out a random training/validation split of the data. The validation data is kept separate from the training data and is used to determine the models and hyperparameter values that produce the best results. Rather than just a single model, AutoGluon trains multiple models and ensembles them together to obtain superior predictive performance. If you want to control the split yourself, see the sketch below.
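
A minimal sketch of both options, using the train_data DataFrame and label from earlier (the split fractions are arbitrary):

# Option 1: keep AutoGluon's automatic split, but hold out 30% of the rows for validation.
predictor_opt1 = TabularPredictor(label=label).fit(train_data, holdout_frac=0.3)

# Option 2: pass an explicit validation set via `tuning_data`.
val_data = train_data.sample(frac=0.2, random_state=0)
trn_data = train_data.drop(val_data.index)
predictor_opt2 = TabularPredictor(label=label).fit(trn_data, tuning_data=val_data)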

By default, AutoGluon tries to fit various types of models including neural networks and tree ensembles. Each type of model has various hyperparameters, which traditionally, the user would have to specify. AutoGluon automates this process.

AutoGluon automatically and iteratively tests values for hyperparameters to produce the best performance on the validation data. This involves repeatedly training models under different hyperparameter settings and evaluating their performance. This process can be computationally intensive, so fit() parallelizes it across multiple threads using Ray. To control runtimes, you can specify various arguments in fit() such as time_limit, as demonstrated in the subsequent In-Depth Tutorial.

We can view what properties AutoGluon automatically inferred about our prediction task:

print("AutoGluon infers problem type is: ", predictor.problem_type)
print("AutoGluon identified the following types of features:")
print(predictor.feature_metadata)
AutoGluon infers problem type is:  binary
AutoGluon identified the following types of features:
('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']

AutoGluon correctly recognized our prediction problem to be a binary classification task and decided that variables such as age should be represented as integers, whereas variables such as workclass should be represented as categorical objects. The feature_metadata attribute allows you to see the inferred data type of each predictive variable after preprocessing (this is its raw dtype; some features may also be associated with additional special dtypes if produced via feature-engineering, e.g. numerical representations of a datetime/text column).

To transform the data into AutoGluon’s internal representation, we can do the following:

test_data_transform = predictor.transform_features(test_data)
test_data_transform.head()
age fnlwgt education-num sex capital-gain capital-loss hours-per-week workclass education marital-status occupation relationship race native-country
0 31 169085 7 0 0 0 20 3 1 1 12 5 4 24
1 17 226203 8 1 0 0 45 5 2 3 12 3 4 24
2 47 54260 11 1 0 1887 60 3 8 1 4 0 4 24
3 21 176262 10 0 0 0 30 3 14 3 4 3 4 24
4 17 241185 8 1 0 0 20 3 2 3 10 3 4 24

Notice how the data is purely numeric after pre-processing (although categorical features will still be treated as categorical downstream).
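
To see this for yourself, a quick pandas check of the transformed frame (nothing AutoGluon-specific) shows which columns are plain integers and which are integer-coded categoricals:

# Numeric columns keep integer dtypes, while former object columns are now
# pandas 'category' columns whose displayed values are integer category codes.
print(test_data_transform.dtypes)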

To better understand our trained predictor, we can estimate the overall importance of each feature via TabularPredictor.feature_importance():

predictor.feature_importance(test_data)
Computing feature importance via permutation shuffling for 14 features using 5000 rows with 5 shuffle sets...
	16.5s	= Expected runtime (3.3s per shuffle set)
	5.2s	= Actual runtime (Completed 5 of 5 shuffle sets)
importance stddev p_value n p99_high p99_low
capital-gain 0.03044 0.002422 0.000005 5 0.035428 0.025452
education-num 0.01776 0.005663 0.001089 5 0.029420 0.006100
marital-status 0.01468 0.003408 0.000325 5 0.021696 0.007664
age 0.01304 0.004498 0.001459 5 0.022301 0.003779
relationship 0.01296 0.002875 0.000273 5 0.018881 0.007039
occupation 0.00992 0.002456 0.000416 5 0.014977 0.004863
hours-per-week 0.00660 0.004942 0.020240 5 0.016775 -0.003575
capital-loss 0.00364 0.001081 0.000832 5 0.005865 0.001415
native-country 0.00112 0.001154 0.047907 5 0.003496 -0.001256
workclass 0.00072 0.002484 0.276139 5 0.005835 -0.004395
race 0.00072 0.001346 0.148869 5 0.003492 -0.002052
sex 0.00040 0.001594 0.302301 5 0.003682 -0.002882
fnlwgt 0.00016 0.002621 0.449004 5 0.005556 -0.005236
education -0.00056 0.001841 0.733174 5 0.003230 -0.004350

The importance column estimates how much the evaluation metric score would drop if the feature were removed from the data. A negative importance value suggests that results would likely improve if the model were re-fit with that feature removed.
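
As a rough sketch of how you might act on these estimates (re-fitting from scratch; the variable names are illustrative), you could drop features with negative estimated importance and train a new predictor:

# Illustrative: drop features whose estimated importance is negative, then re-fit.
importance_df = predictor.feature_importance(test_data)
weak_features = importance_df[importance_df['importance'] < 0].index.tolist()  # e.g. ['education'] above
predictor_reduced = TabularPredictor(label=label).fit(train_data.drop(columns=weak_features))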

When we call predict(), AutoGluon automatically predicts with the model that displayed the best performance on validation data (i.e. the weighted-ensemble).

predictor.model_best
'WeightedEnsemble_L2'

We can instead specify which model to use for predictions like this:

predictor.predict(test_data, model='LightGBM')

You can get the list of trained models via .leaderboard() or .model_names():

predictor.model_names()
['LightGBMXT',
 'LightGBM',
 'RandomForestGini',
 'RandomForestEntr',
 'CatBoost',
 'ExtraTreesGini',
 'ExtraTreesEntr',
 'NeuralNetFastAI',
 'XGBoost',
 'NeuralNetTorch',
 'LightGBMLarge',
 'WeightedEnsemble_L2']

The scores of predictive performance above were based on a default evaluation metric (accuracy for binary classification). Performance in certain applications may be measured by different metrics than the ones AutoGluon optimizes for by default. If you know the metric that counts in your application, you should specify it via the eval_metric argument as demonstrated in the next section.

Presets

AutoGluon comes with a variety of presets that can be specified in the call to fit() via the presets argument. medium is used by default to encourage initial prototyping, but for serious usage the other presets should be used instead.

| Preset | Model Quality | Use Cases | Fit Time (Ideal) | Inference Time (Relative to medium_quality) | Disk Usage |
| --- | --- | --- | --- | --- | --- |
| extreme | Far better than best on datasets <30000 samples | (New in v1.4) The absolute cutting edge. Incorporates very recent tabular foundation models TabPFNv2, TabICL, and Mitra, along with the deep learning model TabM. Requires a GPU for best results. | 4x+ | 32x+ | 8x+ |
| best | State-of-the-art (SOTA), much better than high | When accuracy is what matters. This should be considered the preferred setting for serious usage. Has been used to win numerous Kaggle competitions. | 16x+ | 32x+ | 16x+ |
| high | Better than good | When a very powerful, portable solution with fast inference is required: Large-scale batch inference | 16x+ | 4x | 2x |
| good | Stronger than any other AutoML Framework | When a powerful, highly portable solution with very fast inference is required: Billion-scale batch inference, sub-100ms online-inference, edge-devices | 16x | 2x | 0.1x |
| medium | Competitive with other top AutoML Frameworks | Initial prototyping, establishing a performance baseline | 1x | 1x | 1x |

We recommend starting with best to get a strong performance baseline. If best is taking too long to train, consider running medium or subsampling the training data during this prototyping phase. Make sure to hold out test data that AutoGluon never sees during training so that you can verify the models perform as expected, as shown in the sketch below.
Once you have evaluated both best and medium, check whether either satisfies your needs. If neither does, consider trying high and/or good.
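
A minimal sketch of that holdout workflow, assuming scikit-learn is available (the 20% holdout fraction and 1-hour time limit are arbitrary):

from sklearn.model_selection import train_test_split

# Hold out 20% of the labeled data; AutoGluon never sees it during fit().
trainval_data, holdout_data = train_test_split(train_data, test_size=0.2, random_state=0)
predictor_best = TabularPredictor(label=label).fit(trainval_data, presets='best', time_limit=3600)

# Evaluate on the untouched holdout to confirm the models generalize as expected.
print(predictor_best.evaluate(holdout_data))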

If you have a GPU, we recommend trying the new extreme preset, which is meta-learned from TabArena (https://tabarena.ai) and delivers cutting-edge performance, dramatically improving over best on small datasets. Ensure you have installed the required dependencies via pip install autogluon[tabarena].

If none of the presets satisfy requirements, refer to Predicting Columns in a Table - In Depth for more advanced AutoGluon options.

Maximizing predictive performance

Note: You should not call fit() with entirely default arguments if you are benchmarking AutoGluon-Tabular or hoping to maximize its accuracy! To get the best predictive accuracy with AutoGluon, you should generally use it like this:

time_limit = 60  # for quick demonstration only; you should set this to the longest time you are willing to wait (in seconds)
metric = 'roc_auc'  # specify your evaluation metric here
predictor = TabularPredictor(label, eval_metric=metric).fit(train_data, time_limit=time_limit, presets='best')

No path specified. Models will be saved in: "AutogluonModels/ag-20251219_141355"
Preset alias specified: 'best' maps to 'best_quality'.
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.5.0b20251219
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.9.1+cu128
CUDA Version:       12.8
GPU Memory:         GPU 0: 14.57/14.57 GB
Total GPU Memory:   Free: 14.57 GB, Allocated: 0.00 GB, Total: 14.57 GB
GPU Count:          1
Memory Avail:       28.12 GB / 30.95 GB (90.9%)
Disk Space Avail:   204.56 GB / 255.99 GB (79.9%)
===================================================
Presets specified: ['best']
Using hyperparameters preset: hyperparameters='zeroshot'
Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False)
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1
DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
	This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfitting.
	Running DyStack for up to 15s of the 60s of remaining time (25%).
DyStack: Disabling memory safe fit mode in DyStack because GPUs were detected and num_gpus='auto' (GPUs cannot be used in memory safe fit mode). If you want to use memory safe fit mode, manually set `num_gpus=0`.
Running DyStack sub-fit ...
Beginning AutoGluon training ... Time limit = 15s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_141355/ds_sub_fit/sub_fit_ho"
Train Data Rows:    888
Train Data Columns: 14
Label Column:       class
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28796.37 MB
	Train Data (Original)  Memory Usage: 0.44 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.05 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: 'roc_auc'
	This metric expects predicted probabilities rather than predicted class labels, so you'll need to use predict_proba() instead of predict()
	To change this, specify the eval_metric parameter of Predictor()
Large model count detected (110 configs) ... Only displaying the first 3 models of each family. To see all, set `verbosity=3`.
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}, {'activation': 'elu', 'dropout_prob': 0.10077639529843717, 'hidden_size': 108, 'learning_rate': 0.002735937344002146, 'num_layers': 4, 'use_batchnorm': True, 'weight_decay': 1.356433327634438e-12, 'ag_args': {'name_suffix': '_r79', 'priority': -2}}, {'activation': 'elu', 'dropout_prob': 0.11897478034205347, 'hidden_size': 213, 'learning_rate': 0.0010474382260641949, 'num_layers': 4, 'use_batchnorm': False, 'weight_decay': 5.594471067786272e-10, 'ag_args': {'name_suffix': '_r22', 'priority': -7}}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}, {'depth': 6, 'grow_policy': 'SymmetricTree', 'l2_leaf_reg': 2.1542798306067823, 'learning_rate': 0.06864209415792857, 'max_ctr_complexity': 4, 'one_hot_max_size': 10, 'ag_args': {'name_suffix': '_r177', 'priority': -1}}, {'depth': 8, 'grow_policy': 'Depthwise', 'l2_leaf_reg': 2.7997999596449104, 'learning_rate': 0.031375015734637225, 'max_ctr_complexity': 2, 'one_hot_max_size': 3, 'ag_args': {'name_suffix': '_r9', 'priority': -5}}],
	'XGB': [{}, {'colsample_bytree': 0.6917311125174739, 'enable_categorical': False, 'learning_rate': 0.018063876087523967, 'max_depth': 10, 'min_child_weight': 0.6028633586934382, 'ag_args': {'name_suffix': '_r33', 'priority': -8}}, {'colsample_bytree': 0.6628423832084077, 'enable_categorical': False, 'learning_rate': 0.08775715546881824, 'max_depth': 5, 'min_child_weight': 0.6294123374222513, 'ag_args': {'name_suffix': '_r89', 'priority': -16}}],
	'FASTAI': [{}, {'bs': 256, 'emb_drop': 0.5411770367537934, 'epochs': 43, 'layers': [800, 400], 'lr': 0.01519848858318159, 'ps': 0.23782946566604385, 'ag_args': {'name_suffix': '_r191', 'priority': -4}}, {'bs': 2048, 'emb_drop': 0.05070411322605811, 'epochs': 29, 'layers': [200, 100], 'lr': 0.08974235041576624, 'ps': 0.10393466140748028, 'ag_args': {'name_suffix': '_r102', 'priority': -11}}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 108 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 9.94s of the 14.90s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.06%)
/home/ci/opt/venv/lib/python3.12/site-packages/ray/_private/worker.py:2062: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
	0.902	 = Validation score   (roc_auc)
	0.84s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 2.37s of the 7.33s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.07%)
	0.8923	 = Validation score   (roc_auc)
	0.91s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 14.91s of the 2.58s of remaining time.
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/25.9 GB
	Ensemble Weights: {'LightGBMXT_BAG_L1': 1.0}
	0.902	 = Validation score   (roc_auc)
	0.01s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 108 L2 models, fit_strategy="sequential" ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 2.57s of the 2.54s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.07%)
	0.9014	 = Validation score   (roc_auc)
	1.59s	 = Training   runtime
	0.07s	 = Validation runtime
Fitting model: WeightedEnsemble_L3 ... Training model for up to 14.91s of the -2.74s of remaining time.
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/27.0 GB
	Ensemble Weights: {'LightGBMXT_BAG_L1': 1.0}
	0.902	 = Validation score   (roc_auc)
	0.0s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 17.76s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 1982.2 rows/s (111 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_141355/ds_sub_fit/sub_fit_ho")
Deleting DyStack predictor artifacts (clean_up_fits=True) ...
Leaderboard on holdout data (DyStack):
                 model  score_holdout  score_val eval_metric  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0    LightGBMXT_BAG_L1       0.905692   0.902037     roc_auc        0.048339       0.055887  0.835064                 0.048339                0.055887           0.835064            1       True          1
1  WeightedEnsemble_L2       0.905692   0.902037     roc_auc        0.049613       0.056771  0.840586                 0.001274                0.000884           0.005522            2       True          3
2  WeightedEnsemble_L3       0.905692   0.902037     roc_auc        0.049636       0.056549  0.839011                 0.001297                0.000663           0.003947            3       True          5
3    LightGBMXT_BAG_L2       0.902784   0.901398     roc_auc        0.097912       0.121060  2.422401                 0.049573                0.065173           1.587337            2       True          4
4      LightGBM_BAG_L1       0.893644   0.892280     roc_auc        0.037646       0.047588  0.906218                 0.037646                0.047588           0.906218            1       True          2
	1	 = Optimal   num_stack_levels (Stacked Overfitting Occurred: False)
	18s	 = DyStack   runtime |	42s	 = Remaining runtime
Starting main fit with num_stack_levels=1.
	For future fit calls on this dataset, you can skip DyStack to save time: `predictor.fit(..., dynamic_stacking=False, num_stack_levels=1)`
Beginning AutoGluon training ... Time limit = 42s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_141355"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       class
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    27831.42 MB
	Train Data (Original)  Memory Usage: 0.50 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: 'roc_auc'
	This metric expects predicted probabilities rather than predicted class labels, so you'll need to use predict_proba() instead of predict()
	To change this, specify the eval_metric parameter of Predictor()
Large model count detected (110 configs) ... Only displaying the first 3 models of each family. To see all, set `verbosity=3`.
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}, {'activation': 'elu', 'dropout_prob': 0.10077639529843717, 'hidden_size': 108, 'learning_rate': 0.002735937344002146, 'num_layers': 4, 'use_batchnorm': True, 'weight_decay': 1.356433327634438e-12, 'ag_args': {'name_suffix': '_r79', 'priority': -2}}, {'activation': 'elu', 'dropout_prob': 0.11897478034205347, 'hidden_size': 213, 'learning_rate': 0.0010474382260641949, 'num_layers': 4, 'use_batchnorm': False, 'weight_decay': 5.594471067786272e-10, 'ag_args': {'name_suffix': '_r22', 'priority': -7}}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}, {'depth': 6, 'grow_policy': 'SymmetricTree', 'l2_leaf_reg': 2.1542798306067823, 'learning_rate': 0.06864209415792857, 'max_ctr_complexity': 4, 'one_hot_max_size': 10, 'ag_args': {'name_suffix': '_r177', 'priority': -1}}, {'depth': 8, 'grow_policy': 'Depthwise', 'l2_leaf_reg': 2.7997999596449104, 'learning_rate': 0.031375015734637225, 'max_ctr_complexity': 2, 'one_hot_max_size': 3, 'ag_args': {'name_suffix': '_r9', 'priority': -5}}],
	'XGB': [{}, {'colsample_bytree': 0.6917311125174739, 'enable_categorical': False, 'learning_rate': 0.018063876087523967, 'max_depth': 10, 'min_child_weight': 0.6028633586934382, 'ag_args': {'name_suffix': '_r33', 'priority': -8}}, {'colsample_bytree': 0.6628423832084077, 'enable_categorical': False, 'learning_rate': 0.08775715546881824, 'max_depth': 5, 'min_child_weight': 0.6294123374222513, 'ag_args': {'name_suffix': '_r89', 'priority': -16}}],
	'FASTAI': [{}, {'bs': 256, 'emb_drop': 0.5411770367537934, 'epochs': 43, 'layers': [800, 400], 'lr': 0.01519848858318159, 'ps': 0.23782946566604385, 'ag_args': {'name_suffix': '_r191', 'priority': -4}}, {'bs': 2048, 'emb_drop': 0.05070411322605811, 'epochs': 29, 'layers': [200, 100], 'lr': 0.08974235041576624, 'ps': 0.10393466140748028, 'ag_args': {'name_suffix': '_r102', 'priority': -11}}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 108 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 27.61s of the 41.42s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.06%)
	0.895	 = Validation score   (roc_auc)
	0.91s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 23.50s of the 37.30s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.07%)
	0.8798	 = Validation score   (roc_auc)
	0.87s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: RandomForestGini_BAG_L1 ... Training model for up to 19.05s of the 32.85s of remaining time.
	Fitting 1 model on all data (use_child_oof=True) | Fitting with cpus=8, gpus=0
	0.8892	 = Validation score   (roc_auc)
	0.88s	 = Training   runtime
	0.12s	 = Validation runtime
Fitting model: RandomForestEntr_BAG_L1 ... Training model for up to 18.02s of the 31.83s of remaining time.
	Fitting 1 model on all data (use_child_oof=True) | Fitting with cpus=8, gpus=0
	0.8909	 = Validation score   (roc_auc)
	0.64s	 = Training   runtime
	0.13s	 = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 17.23s of the 31.04s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=1.42%)
	0.8993	 = Validation score   (roc_auc)
	5.62s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: ExtraTreesGini_BAG_L1 ... Training model for up to 8.41s of the 22.22s of remaining time.
	Fitting 1 model on all data (use_child_oof=True) | Fitting with cpus=8, gpus=0
	0.8862	 = Validation score   (roc_auc)
	0.71s	 = Training   runtime
	0.13s	 = Validation runtime
Fitting model: ExtraTreesEntr_BAG_L1 ... Training model for up to 7.54s of the 21.35s of remaining time.
	Fitting 1 model on all data (use_child_oof=True) | Fitting with cpus=8, gpus=0
	0.8816	 = Validation score   (roc_auc)
	0.63s	 = Training   runtime
	0.12s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 6.77s of the 20.57s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.00%)
	0.8879	 = Validation score   (roc_auc)
	5.83s	 = Training   runtime
	0.12s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 41.43s of the 11.38s of remaining time.
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/23.8 GB
	Ensemble Weights: {'NeuralNetFastAI_BAG_L1': 0.312, 'CatBoost_BAG_L1': 0.25, 'RandomForestEntr_BAG_L1': 0.188, 'LightGBMXT_BAG_L1': 0.125, 'RandomForestGini_BAG_L1': 0.062, 'ExtraTreesGini_BAG_L1': 0.062}
	0.9048	 = Validation score   (roc_auc)
	0.11s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 108 L2 models, fit_strategy="sequential" ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 11.25s of the 11.21s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.08%)
	0.9	 = Validation score   (roc_auc)
	1.1s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 6.44s of the 6.40s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.07%)
	0.8902	 = Validation score   (roc_auc)
	1.25s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: RandomForestGini_BAG_L2 ... Training model for up to 1.58s of the 1.54s of remaining time.
	Fitting 1 model on all data (use_child_oof=True) | Fitting with cpus=8, gpus=0
	0.8857	 = Validation score   (roc_auc)
	0.93s	 = Training   runtime
	0.12s	 = Validation runtime
Fitting model: RandomForestEntr_BAG_L2 ... Training model for up to 0.51s of the 0.47s of remaining time.
	Fitting 1 model on all data (use_child_oof=True) | Fitting with cpus=8, gpus=0
	0.8922	 = Validation score   (roc_auc)
	0.65s	 = Training   runtime
	0.12s	 = Validation runtime
Fitting model: WeightedEnsemble_L3 ... Training model for up to 41.43s of the -0.47s of remaining time.
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/27.1 GB
	Ensemble Weights: {'RandomForestEntr_BAG_L1': 0.304, 'NeuralNetFastAI_BAG_L1': 0.304, 'CatBoost_BAG_L1': 0.261, 'ExtraTreesGini_BAG_L1': 0.087, 'LightGBMXT_BAG_L2': 0.043}
	0.9048	 = Validation score   (roc_auc)
	0.08s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 42.09s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 459.9 rows/s (125 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_141355")
predictor.leaderboard(test_data)
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L2 0.907293 0.904843 roc_auc 1.809451 0.600238 14.706942 0.003258 0.000875 0.114003 2 True 9
1 CatBoost_BAG_L1 0.907141 0.899283 roc_auc 0.058805 0.049406 5.619611 0.058805 0.049406 5.619611 1 True 5
2 LightGBMXT_BAG_L1 0.905868 0.894967 roc_auc 0.342303 0.053113 0.910213 0.342303 0.053113 0.910213 1 True 1
3 WeightedEnsemble_L3 0.905835 0.904764 roc_auc 2.189619 0.816378 17.267674 0.003232 0.000640 0.079163 3 True 14
4 LightGBMXT_BAG_L2 0.904148 0.900028 roc_auc 2.186388 0.815737 17.188511 0.128750 0.045119 1.096268 2 True 10
5 LightGBM_BAG_L1 0.900861 0.879843 roc_auc 0.146949 0.046345 0.871582 0.146949 0.046345 0.871582 1 True 2
6 RandomForestEntr_BAG_L2 0.898973 0.892232 roc_auc 2.158126 0.892788 16.743660 0.100488 0.122169 0.651417 2 True 13
7 NeuralNetFastAI_BAG_L1 0.898784 0.887942 roc_auc 1.091405 0.122363 5.832833 1.091405 0.122363 5.832833 1 True 8
8 RandomForestGini_BAG_L2 0.898545 0.885669 roc_auc 2.158872 0.892685 17.021665 0.101234 0.122066 0.929422 2 True 12
9 LightGBM_BAG_L2 0.894099 0.890226 roc_auc 2.175915 0.813793 17.337884 0.118278 0.043174 1.245641 2 True 11
10 RandomForestGini_BAG_L1 0.892143 0.889168 roc_auc 0.104163 0.122897 0.878302 0.104163 0.122897 0.878302 1 True 3
11 RandomForestEntr_BAG_L1 0.891222 0.890945 roc_auc 0.101979 0.126181 0.639168 0.101979 0.126181 0.639168 1 True 4
12 ExtraTreesGini_BAG_L1 0.884830 0.886223 roc_auc 0.107537 0.125403 0.712812 0.107537 0.125403 0.712812 1 True 6
13 ExtraTreesEntr_BAG_L1 0.884371 0.881558 roc_auc 0.104496 0.124911 0.627722 0.104496 0.124911 0.627722 1 True 7

This command implements the following strategy to maximize accuracy:

  • Specify the argument presets='best', which allows AutoGluon to automatically construct powerful model ensembles based on stacking/bagging, and will greatly improve the resulting predictions if granted sufficient training time. The default value of presets is 'medium', which produces less accurate models but facilitates faster prototyping. With presets, you can flexibly prioritize predictive accuracy vs. training/inference speed. For example, if you care less about predictive performance and want to quickly deploy a basic model, consider using: presets=['good', 'optimize_for_deployment'].

  • Provide the parameter eval_metric to TabularPredictor() if you know what metric will be used to evaluate predictions in your application. Some other non-default metrics you might use include things like: 'f1' (for binary classification), 'roc_auc' (for binary classification), 'log_loss' (for classification), 'mean_absolute_error' (for regression), 'median_absolute_error' (for regression). You can also define your own custom metric function. For more information refer to Adding a custom metric to AutoGluon.

  • Include all your data in train_data and do not provide tuning_data (AutoGluon will split the data more intelligently to fit its needs).

  • Do not specify the hyperparameter_tune_kwargs argument (counterintuitively, hyperparameter tuning is not the best way to spend a limited training time budget, as model ensembling is often superior). We recommend you only use hyperparameter_tune_kwargs if your goal is to deploy a single model rather than an ensemble.

  • Do not specify the hyperparameters argument (allow AutoGluon to adaptively select which models/hyperparameters to use).

  • Set time_limit to the longest amount of time (in seconds) that you are willing to wait. AutoGluon’s predictive performance improves the longer fit() is allowed to run.

Regression (predicting numeric table columns):

To demonstrate that fit() can also automatically handle regression tasks, we now train a model to predict the numeric age variable in the same table, using the other columns as features:

age_column = 'age'
predictor_age = TabularPredictor(label=age_column, path="agModels-predictAge").fit(train_data, time_limit=30)

Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.5.0b20251219
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.9.1+cu128
CUDA Version:       12.8
GPU Memory:         GPU 0: 14.57/14.57 GB
Total GPU Memory:   Free: 14.57 GB, Allocated: 0.00 GB, Total: 14.57 GB
GPU Count:          1
Memory Avail:       27.10 GB / 30.95 GB (87.6%)
Disk Space Avail:   204.50 GB / 255.99 GB (79.9%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='extreme'  : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Absolute best accuracy. Requires a GPU. Recommended 64 GB CPU memory and 32+ GB GPU memory.
	presets='best'     : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
	presets='high'     : Strong accuracy with fast inference speed.
	presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
	presets='good'     : Good accuracy with very fast inference speed.
	presets='medium'   : Fast training time, ideal for initial prototyping.
Using hyperparameters preset: hyperparameters='default'
Beginning AutoGluon training ... Time limit = 30s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/agModels-predictAge"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       age
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	First 10 (of 61) unique label values:  [np.int64(51), np.int64(58), np.int64(40), np.int64(37), np.int64(62), np.int64(65), np.int64(27), np.int64(41), np.int64(26), np.int64(34)]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       multiclass
Preprocessing data ...
Warning: Updated label_count_threshold from 10 to 5 to avoid cutting too many classes.
Warning: Some classes in the training set have fewer than 5 examples. AutoGluon will only keep 50 out of 61 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
Fraction of data from classes with at least 5 examples that will be kept for training models: 0.976
Train Data Class Count: 50
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    27749.04 MB
	Train Data (Original)  Memory Usage: 0.53 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 2 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 5 | ['fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
		('object', []) : 9 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 5 | ['fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
		('int', ['bool']) : 2 | ['sex', 'class']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.05 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.1s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 780, Val Rows: 196
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}],
	'XGB': [{}],
	'FASTAI': [{}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
}
Fitting 11 L1 models, fit_strategy="sequential" ...
Fitting model: NeuralNetFastAI ... Training model for up to 29.90s of the 29.90s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.0/27.1 GB
No improvement since epoch 9: early stopping
	0.0867	 = Validation score   (accuracy)
	1.13s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBMXT ... Training model for up to 28.74s of the 28.74s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.7/27.1 GB
	0.0612	 = Validation score   (accuracy)
	3.54s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: LightGBM ... Training model for up to 25.19s of the 25.19s of remaining time.
	Fitting with cpus=4, gpus=0, mem=0.7/27.1 GB
	0.051	 = Validation score   (accuracy)
	5.97s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: RandomForestGini ... Training model for up to 19.21s of the 19.21s of remaining time.
	Fitting with cpus=8, gpus=0, mem=0.2/27.1 GB
	0.0408	 = Validation score   (accuracy)
	2.84s	 = Training   runtime
	0.08s	 = Validation runtime
Fitting model: RandomForestEntr ... Training model for up to 16.15s of the 16.15s of remaining time.
	Fitting with cpus=8, gpus=0, mem=0.2/27.1 GB
	0.0357	 = Validation score   (accuracy)
	2.9s	 = Training   runtime
	0.08s	 = Validation runtime
Fitting model: CatBoost ... Training model for up to 13.03s of the 13.03s of remaining time.
	Fitting with cpus=4, gpus=0
	Ran out of time, early stopping on iteration 18.
	0.0561	 = Validation score   (accuracy)
	12.23s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: ExtraTreesGini ... Training model for up to 0.79s of the 0.79s of remaining time.
	Fitting with cpus=8, gpus=0, mem=0.2/27.0 GB
	Warning: Reducing model 'n_estimators' from 300 -> 60 due to low time. Expected time usage reduced from 3.9s -> 0.8s...
	0.0255	 = Validation score   (accuracy)
	0.59s	 = Training   runtime
	0.03s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 29.90s of the 0.14s of remaining time.
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/27.0 GB
	Ensemble Weights: {'LightGBMXT': 0.636, 'NeuralNetFastAI': 0.364}
	0.0969	 = Validation score   (accuracy)
	0.08s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 29.97s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 13958.9 rows/s (196 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/agModels-predictAge")
predictor_age.leaderboard(test_data)
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L2 0.051080 0.096939 accuracy 0.162077 0.014041 4.753744 0.014084 0.000875 0.078792 2 True 8
1 RandomForestGini 0.051080 0.040816 accuracy 0.360499 0.077021 2.844398 0.360499 0.077021 2.844398 1 True 4
2 NeuralNetFastAI 0.050670 0.086735 accuracy 0.136094 0.009499 1.134183 0.136094 0.009499 1.134183 1 True 1
3 RandomForestEntr 0.049852 0.035714 accuracy 0.334034 0.076913 2.896509 0.334034 0.076913 2.896509 1 True 5
4 ExtraTreesGini 0.048828 0.025510 accuracy 0.062661 0.025760 0.593378 0.062661 0.025760 0.593378 1 True 7
5 LightGBM 0.039820 0.051020 accuracy 0.025213 0.003908 5.973645 0.025213 0.003908 5.973645 1 True 3
6 CatBoost 0.039717 0.056122 accuracy 0.025897 0.004594 12.230201 0.025897 0.004594 12.230201 1 True 6
7 LightGBMXT 0.025898 0.061224 accuracy 0.011899 0.003667 3.540769 0.011899 0.003667 3.540769 1 True 2

Note that we didn’t need to tell AutoGluon that predicting age is a regression problem; it infers the problem type (and an appropriate default evaluation metric) from the data. In this small subsample, the log above shows that AutoGluon actually inferred 'multiclass', because age is an integer column with relatively few unique values; to force regression, pass problem_type='regression' to TabularPredictor(), as in the sketch below. For regression problems, AutoGluon uses RMSE as the default evaluation metric. To specify a particular evaluation metric other than the default, set the eval_metric parameter of TabularPredictor() and AutoGluon will tailor its models to optimize your metric (e.g. eval_metric = 'mean_absolute_error'). For evaluation metrics where higher values are worse (like RMSE), AutoGluon will flip their sign and print them as negative values during training (as it internally assumes higher values are better). You can even specify a custom metric by following the Custom Metric Tutorial.
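
If you want to guarantee the regression treatment of age rather than relying on type inference, a minimal sketch (the path name is illustrative):

# Force regression and optimize mean absolute error instead of the inferred defaults.
predictor_age_reg = TabularPredictor(
    label=age_column,
    problem_type='regression',
    eval_metric='mean_absolute_error',
    path='agModels-predictAge-regression',  # illustrative save location
).fit(train_data, time_limit=30)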

Data Formats: AutoGluon can currently operate on data tables already loaded into Python as pandas DataFrames, or stored in CSV or Parquet files. If your data lives in multiple tables, you will first need to join them into a single table whose rows correspond to statistically independent observations (datapoints) and whose columns correspond to different features (a.k.a. variables/covariates); see the sketch below.
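
As a hypothetical illustration of that join step (the file and column names below are invented), you might aggregate a many-rows-per-entity table and merge it onto the main table with pandas before calling fit():

import pandas as pd

customers = pd.read_csv('customers.csv')        # hypothetical: one row per customer
transactions = pd.read_csv('transactions.csv')  # hypothetical: many rows per customer

# Aggregate the many-to-one table, then join on the shared key so each row is one observation.
total_spend = transactions.groupby('customer_id')['amount'].sum().rename('total_spend').reset_index()
train_table = customers.merge(total_spend, on='customer_id', how='left')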

Refer to the TabularPredictor documentation to see all of the available methods/options.

Advanced Usage

For more advanced usage examples of AutoGluon, refer to the In-Depth Tutorial.

If you are interested in deployment optimization, refer to the Deployment Optimization Tutorial.

For adding custom models to AutoGluon, refer to the Custom Model and Custom Model Advanced tutorials.