AutoGluon Tabular - Foundational Models¶
In this tutorial, we introduce support for cutting-edge foundational tabular models that leverage pre-training and in-context learning to achieve state-of-the-art performance on tabular datasets. These models represent a significant advancement in automated machine learning for structured data.
We’ll explore three foundational tabular models:
Mitra - AutoGluon’s new state-of-the-art tabular foundation model
TabICL - In-context learning for large tabular datasets
TabPFNv2 - Prior-fitted networks for accurate predictions on small data
These models excel particularly on small to medium-sized datasets and can run in both zero-shot and fine-tuning modes.
Installation¶
First, let’s install AutoGluon with support for foundational models:
# Individual model installations:
!pip install uv
!uv pip install autogluon.tabular[mitra] # For Mitra
!uv pip install autogluon.tabular[tabicl] # For TabICL
!uv pip install autogluon.tabular[tabpfn] # For TabPFNv2
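Alternatively, as the presets log later in this tutorial notes, a single extra installs TabPFN, TabICL, and TabDPT together (a minimal sketch, assuming the same uv-based workflow as above):
# Combined installation for TabPFN, TabICL, and TabDPT:
!uv pip install autogluon.tabular[tabarena]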
import pandas as pd
from autogluon.tabular import TabularDataset, TabularPredictor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine, fetch_california_housing
Example Data¶
For this tutorial, we’ll demonstrate the foundational models on two different datasets to showcase their versatility:
Wine Dataset (Multi-class Classification) - Small dataset (178 samples), an ideal size for foundation models
California Housing (Regression) - Larger dataset (20,640 samples) for regression
Let’s load and prepare these datasets:
# Load datasets
# 1. Wine (Multi-class Classification)
wine_data = load_wine()
wine_df = pd.DataFrame(wine_data.data, columns=wine_data.feature_names)
wine_df['target'] = wine_data.target
# 2. California Housing (Regression)
housing_data = fetch_california_housing()
housing_df = pd.DataFrame(housing_data.data, columns=housing_data.feature_names)
housing_df['target'] = housing_data.target
print("Dataset shapes:")
print(f"Wine: {wine_df.shape}")
print(f"California Housing: {housing_df.shape}")
Dataset shapes:
Wine: (178, 14)
California Housing: (20640, 9)
Create Train/Test Splits¶
Let’s create train/test splits for our datasets:
# Create train/test splits (80/20)
wine_train, wine_test = train_test_split(wine_df, test_size=0.2, random_state=42, stratify=wine_df['target'])
housing_train, housing_test = train_test_split(housing_df, test_size=0.2, random_state=42)
print("Training set sizes:")
print(f"Wine: {len(wine_train)} samples")
print(f"Housing: {len(housing_train)} samples")
# Convert to TabularDataset
wine_train_data = TabularDataset(wine_train)
wine_test_data = TabularDataset(wine_test)
housing_train_data = TabularDataset(housing_train)
housing_test_data = TabularDataset(housing_test)
Training set sizes:
Wine: 142 samples
Housing: 16512 samples
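Note that TabularDataset is a thin subclass of pandas.DataFrame, so the conversion above is mostly a convenience; anything that works on a DataFrame works here too:
# TabularDataset inherits from pandas.DataFrame; fit() also accepts plain DataFrames
print(isinstance(wine_train_data, pd.DataFrame))  # True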
1. Mitra: AutoGluon’s Tabular Foundation Model¶
Mitra is a new state-of-the-art tabular foundation model developed by the AutoGluon team, natively supported in AutoGluon with just three lines of code via predictor.fit(). Built on the in-context learning paradigm and pretrained exclusively on synthetic data, Mitra introduces a principled pretraining approach: it carefully selects and mixes diverse synthetic priors to promote robust generalization across a wide range of real-world tabular datasets.
📊 Mitra achieves state-of-the-art performance on major benchmarks including TabRepo, TabZilla, AMLB, and TabArena, especially excelling on small tabular datasets with fewer than 5,000 samples and 100 features, for both classification and regression tasks.
🧠 Mitra supports both zero-shot and fine-tuning modes and runs seamlessly on both GPU and CPU. Its weights are fully open-sourced under the Apache-2.0 license, making it a privacy-conscious and production-ready solution for enterprises concerned about data sharing and hosting.
🔗 Learn more on Hugging Face:
Classification model: autogluon/mitra-classifier
Regression model: autogluon/mitra-regressor
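Because Mitra runs on CPU as well as GPU, you can force a CPU-only run through the resource arguments of fit(). A minimal sketch, differing from the classification example below only by num_gpus=0 (expect slower training on CPU):
# Zero-shot Mitra without a GPU (hypothetical CPU-only variant)
cpu_predictor = TabularPredictor(label='target')
cpu_predictor.fit(
    wine_train_data,
    hyperparameters={'MITRA': {'fine_tune': False}},
    num_gpus=0,  # disable GPU usage
)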
Using Mitra for Classification¶
# Create predictor with Mitra
print("Training Mitra classifier on classification dataset...")
mitra_predictor = TabularPredictor(label='target')
mitra_predictor.fit(
wine_train_data,
hyperparameters={
'MITRA': {'fine_tune': False}
},
)
print("\nMitra training completed!")
Training Mitra classifier on the Wine dataset...
Mitra training completed!
No path specified. Models will be saved in: "AutogluonModels/ag-20251219_225049"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.5.0b20251219
Python Version: 3.12.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Pytorch Version: 2.9.1+cu128
CUDA Version: 12.8
GPU Memory: GPU 0: 14.57/14.57 GB
Total GPU Memory: Free: 14.57 GB, Allocated: 0.00 GB, Total: 14.57 GB
GPU Count: 1
Memory Avail: 28.49 GB / 30.95 GB (92.1%)
Disk Space Avail: 204.15 GB / 255.99 GB (79.8%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='extreme' : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Requires a GPU and `pip install autogluon.tabular[tabarena]` to install TabPFN, TabICL, and TabDPT.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
presets='high' : Strong accuracy with fast inference speed.
presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_225049"
Train Data Rows: 142
Train Data Columns: 13
Label Column: target
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
3 unique label values: [np.int64(0), np.int64(2), np.int64(1)]
If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 29153.93 MB
Train Data (Original) Memory Usage: 0.01 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
Types of features in processed data (raw dtype, special dtypes):
('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
0.0s = Fit runtime
13 features in original data used to generate 13 features in processed data.
Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.03s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 113, Val Rows: 29
User-specified model hyperparameters to be fit:
{
'MITRA': [{'fine_tune': False}],
}
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: Mitra ...
Fitting with cpus=4, gpus=1, mem=7.0/28.5 GB
1.0 = Validation score (accuracy)
5.29s = Training runtime
0.13s = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/27.3 GB
Ensemble Weights: {'Mitra': 1.0}
1.0 = Validation score (accuracy)
0.0s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 5.85s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 222.7 rows/s (29 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_225049")
Evaluate Mitra Performance¶
# Make predictions
mitra_predictions = mitra_predictor.predict(wine_test_data)
print("Sample Mitra predictions:")
print(mitra_predictions.head(10))
# Show prediction probabilities for the first few samples
mitra_proba = mitra_predictor.predict_proba(wine_test_data)
print(mitra_proba.head())
# Show model leaderboard
print("\nMitra Model Leaderboard:")
mitra_predictor.leaderboard(wine_test_data)
Sample Mitra predictions:
10 0
134 2
28 0
121 0
62 1
51 0
7 0
66 1
129 1
166 2
Name: target, dtype: int64
0 1 2
10 0.995763 0.004069 0.000168
134 0.001200 0.122376 0.876424
28 0.961425 0.038462 0.000113
121 0.543636 0.450690 0.005673
62 0.140104 0.858239 0.001657
Mitra Model Leaderboard:
| | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mitra | 0.972222 | 1.0 | accuracy | 0.309376 | 0.129480 | 5.286417 | 0.309376 | 0.129480 | 5.286417 | 1 | True | 1 |
| 1 | WeightedEnsemble_L2 | 0.972222 | 1.0 | accuracy | 0.311730 | 0.130235 | 5.289913 | 0.002354 | 0.000755 | 0.003496 | 2 | True | 2 |
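Beyond the leaderboard, aggregate test metrics can be computed directly; a minimal sketch using predictor.evaluate(), which returns a dict of metric scores:
# Evaluate the best model on the held-out test set
metrics = mitra_predictor.evaluate(wine_test_data)
print(metrics)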
Fine-tuning with Mitra¶
mitra_predictor_ft = TabularPredictor(label='target')
mitra_predictor_ft.fit(
wine_train_data,
hyperparameters={
'MITRA': {'fine_tune': True, 'fine_tune_steps': 10}
},
time_limit=120, # 2 minutes
)
print("\nMitra fine-tuning completed!")
Mitra fine-tuning completed!
No path specified. Models will be saved in: "AutogluonModels/ag-20251219_225058"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.5.0b20251219
Python Version: 3.12.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Pytorch Version: 2.9.1+cu128
CUDA Version: 12.8
GPU Memory: GPU 0: 14.56/14.57 GB
Total GPU Memory: Free: 14.56 GB, Allocated: 0.01 GB, Total: 14.57 GB
GPU Count: 1
Memory Avail: 27.28 GB / 30.95 GB (88.2%)
Disk Space Avail: 203.59 GB / 255.99 GB (79.5%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='extreme' : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Requires a GPU and `pip install autogluon.tabular[tabarena]` to install TabPFN, TabICL, and TabDPT.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
presets='high' : Strong accuracy with fast inference speed.
presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ... Time limit = 120s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_225058"
Train Data Rows: 142
Train Data Columns: 13
Label Column: target
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
3 unique label values: [np.int64(0), np.int64(2), np.int64(1)]
If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 27949.44 MB
Train Data (Original) Memory Usage: 0.01 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
Types of features in processed data (raw dtype, special dtypes):
('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
0.0s = Fit runtime
13 features in original data used to generate 13 features in processed data.
Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.03s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 113, Val Rows: 29
User-specified model hyperparameters to be fit:
{
'MITRA': [{'fine_tune': True, 'fine_tune_steps': 10}],
}
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: Mitra ... Training model for up to 119.97s of the 119.97s of remaining time.
Fitting with cpus=4, gpus=1, mem=7.0/27.3 GB
0.9655 = Validation score (accuracy)
8.08s = Training runtime
0.13s = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 119.97s of the 111.23s of remaining time.
Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/26.6 GB
Ensemble Weights: {'Mitra': 1.0}
0.9655 = Validation score (accuracy)
0.0s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 8.79s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 222.4 rows/s (29 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20251219_225058")
Evaluating Fine-tuned Mitra Performance¶
# Show model leaderboard
print("\nMitra Model Leaderboard:")
mitra_predictor_ft.leaderboard(wine_test_data)
Mitra Model Leaderboard:
| | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mitra | 1.0 | 0.965517 | accuracy | 0.316723 | 0.129628 | 8.080158 | 0.316723 | 0.129628 | 8.080158 | 1 | True | 1 |
| 1 | WeightedEnsemble_L2 | 1.0 | 0.965517 | accuracy | 0.319431 | 0.130421 | 8.083243 | 0.002708 | 0.000793 | 0.003085 | 2 | True | 2 |
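Both predictors above were saved to disk automatically (see the save paths in the training logs), so they can be reloaded later without retraining. A minimal sketch; the path is run-specific, so substitute the one printed in your own logs:
# Reload a saved predictor from disk and predict without retraining
loaded_predictor = TabularPredictor.load("AutogluonModels/ag-20251219_225058")  # run-specific path
print(loaded_predictor.predict(wine_test_data).head())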
Using Mitra for Regression¶
# Create predictor with Mitra for regression
print("Training Mitra regressor on California Housing dataset...")
mitra_reg_predictor = TabularPredictor(
label='target',
path='./mitra_regressor_model',
problem_type='regression'
)
mitra_reg_predictor.fit(
housing_train_data.sample(1000, random_state=42), # subsample 1,000 rows to keep the demo fast
hyperparameters={
'MITRA': {'fine_tune': False}
},
)
# Evaluate regression performance
mitra_reg_predictor.leaderboard(housing_test_data)
Training Mitra regressor on the California Housing dataset...
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.5.0b20251219
Python Version: 3.12.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Pytorch Version: 2.9.1+cu128
CUDA Version: 12.8
GPU Memory: GPU 0: 14.55/14.57 GB
Total GPU Memory: Free: 14.55 GB, Allocated: 0.02 GB, Total: 14.57 GB
GPU Count: 1
Memory Avail: 26.61 GB / 30.95 GB (86.0%)
Disk Space Avail: 203.30 GB / 255.99 GB (79.4%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='extreme' : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Requires a GPU and `pip install autogluon.tabular[tabarena]` to install TabPFN, TabICL, and TabDPT.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
presets='high' : Strong accuracy with fast inference speed.
presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/mitra_regressor_model"
Train Data Rows: 1000
Train Data Columns: 8
Label Column: target
Problem Type: regression
Preprocessing data ...
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 27248.23 MB
Train Data (Original) Memory Usage: 0.06 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('float', []) : 8 | ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', ...]
Types of features in processed data (raw dtype, special dtypes):
('float', []) : 8 | ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', ...]
0.0s = Fit runtime
8 features in original data used to generate 8 features in processed data.
Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.03s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
'MITRA': [{'fine_tune': False}],
}
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: Mitra ...
Fitting with cpus=4, gpus=1, mem=7.1/26.6 GB
-0.5142 = Validation score (-root_mean_squared_error)
3.27s = Training runtime
0.6s = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/26.5 GB
Ensemble Weights: {'Mitra': 1.0}
-0.5142 = Validation score (-root_mean_squared_error)
0.0s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 4.18s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 333.6 rows/s (200 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/mitra_regressor_model")
| | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mitra | -0.558119 | -0.51416 | root_mean_squared_error | 4.835476 | 0.599180 | 3.268258 | 4.835476 | 0.599180 | 3.268258 | 1 | True | 1 |
| 1 | WeightedEnsemble_L2 | -0.558119 | -0.51416 | root_mean_squared_error | 4.838484 | 0.599504 | 3.270775 | 0.003008 | 0.000324 | 0.002517 | 2 | True | 2 |
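To inspect the raw predictions behind these scores, call predict() on the test set. A minimal sketch that recomputes RMSE by hand; its magnitude should match score_test above, which AutoGluon reports sign-flipped so that higher is better:
import numpy as np

# Raw regression predictions for the held-out housing data
housing_preds = mitra_reg_predictor.predict(housing_test_data)

# Recompute RMSE manually as a sanity check
rmse = np.sqrt(((housing_test_data['target'] - housing_preds) ** 2).mean())
print(f"Test RMSE: {rmse:.4f}")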
2. TabICL: In-Context Learning for Tabular Data¶
TabICL ("Tabular In-Context Learning") is a foundational model designed specifically for in-context learning on large tabular datasets.
Paper: “TabICL: A Tabular Foundation Model for In-Context Learning on Large Data”
Authors: Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan
GitHub: https://github.com/soda-inria/tabicl
TabICL leverages a transformer architecture with in-context learning capabilities, making it particularly effective when you have limited training data but access to related examples.
# Train TabICL on the Wine dataset
print("Training TabICL on the Wine dataset...")
tabicl_predictor = TabularPredictor(
label='target',
path='./tabicl_model'
)
tabicl_predictor.fit(
wine_train_data,
hyperparameters={
'TABICL': {},
},
)
# Show prediction probabilities for first few samples
tabicl_predictions = tabicl_predictor.predict_proba(wine_test_data)
print(tabicl_predictions.head())
# Show TabICL leaderboard
print("\nTabICL Model Details:")
tabicl_predictor.leaderboard(wine_test_data)
Training TabICL on the Wine dataset...
INFO: You are downloading 'tabicl-classifier-v1.1-0506.ckpt', the latest best-performing version of TabICL.
To reproduce results from the original paper, please use 'tabicl-classifier-v1-0208.ckpt'.
Checkpoint 'tabicl-classifier-v1.1-0506.ckpt' not cached.
Downloading from Hugging Face Hub (jingang/TabICL-clf).
0 1 2
10 0.998975 0.000932 0.000093
134 0.001462 0.256886 0.741652
28 0.990519 0.009300 0.000181
121 0.567253 0.423800 0.008948
62 0.009253 0.986019 0.004729
TabICL Model Details:
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.5.0b20251219
Python Version: 3.12.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Pytorch Version: 2.9.1+cu128
CUDA Version: 12.8
GPU Memory: GPU 0: 14.55/14.57 GB
Total GPU Memory: Free: 14.55 GB, Allocated: 0.02 GB, Total: 14.57 GB
GPU Count: 1
Memory Avail: 26.49 GB / 30.95 GB (85.6%)
Disk Space Avail: 202.74 GB / 255.99 GB (79.2%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='extreme' : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Requires a GPU and `pip install autogluon.tabular[tabarena]` to install TabPFN, TabICL, and TabDPT.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
presets='high' : Strong accuracy with fast inference speed.
presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/tabicl_model"
Train Data Rows: 142
Train Data Columns: 13
Label Column: target
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
3 unique label values: [np.int64(0), np.int64(2), np.int64(1)]
If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 27128.20 MB
Train Data (Original) Memory Usage: 0.01 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
Types of features in processed data (raw dtype, special dtypes):
('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
0.0s = Fit runtime
13 features in original data used to generate 13 features in processed data.
Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.04s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 113, Val Rows: 29
User-specified model hyperparameters to be fit:
{
'TABICL': [{}],
}
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: TabICL ...
Fitting with cpus=4, gpus=1, mem=1.0/26.5 GB
1.0 = Validation score (accuracy)
1.07s = Training runtime
0.27s = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/26.5 GB
Ensemble Weights: {'TabICL': 1.0}
1.0 = Validation score (accuracy)
0.0s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 1.61s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 107.1 rows/s (29 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/tabicl_model")
| | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TabICL | 0.972222 | 1.0 | accuracy | 0.329288 | 0.269997 | 1.073370 | 0.329288 | 0.269997 | 1.07337 | 1 | True | 1 |
| 1 | WeightedEnsemble_L2 | 0.972222 | 1.0 | accuracy | 0.333708 | 0.270772 | 1.076389 | 0.004420 | 0.000776 | 0.00302 | 2 | True | 2 |
3. TabPFNv2: Prior-Fitted Networks¶
TabPFNv2 ("Tabular Prior-Data Fitted Networks v2") is designed for accurate predictions on small tabular datasets using a prior-data fitted network architecture.
Paper: “Accurate predictions on small data with a tabular foundation model”
Authors: Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister & Frank Hutter
GitHub: https://github.com/PriorLabs/TabPFN
TabPFNv2 excels on small datasets (< 10,000 samples) by leveraging prior knowledge encoded in the network architecture.
# Train TabPFNv2 on the Wine dataset (an ideal size for TabPFNv2)
print("Training TabPFNv2 on the Wine dataset...")
tabpfnv2_predictor = TabularPredictor(
label='target',
path='./tabpfnv2_model'
)
tabpfnv2_predictor.fit(
wine_train_data,
hyperparameters={
'REALTABPFN-V2': {
# TabPFNv2 is registered as 'REALTABPFN-V2' in this AutoGluon version; it works best with default parameters on small datasets
},
},
)
# Show prediction probabilities for first few samples
tabpfnv2_predictions = tabpfnv2_predictor.predict_proba(wine_test_data)
print(tabpfnv2_predictions.head())
tabpfnv2_predictor.leaderboard(wine_test_data)
Training TabPFNv2 on the Wine dataset...
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.5.0b20251219
Python Version: 3.12.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Pytorch Version: 2.9.1+cu128
CUDA Version: 12.8
GPU Memory: GPU 0: 14.55/14.57 GB
Total GPU Memory: Free: 14.55 GB, Allocated: 0.02 GB, Total: 14.57 GB
GPU Count: 1
Memory Avail: 26.49 GB / 30.95 GB (85.6%)
Disk Space Avail: 202.54 GB / 255.99 GB (79.1%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='extreme' : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Requires a GPU and `pip install autogluon.tabular[tabarena]` to install TabPFN, TabICL, and TabDPT.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='best_v150': New in v1.5: Better quality than 'best' and 5x+ faster to train. Give it a try!
presets='high' : Strong accuracy with fast inference speed.
presets='high_v150': New in v1.5: Better quality than 'high' and 5x+ faster to train. Give it a try!
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/tabpfnv2_model"
Train Data Rows: 142
Train Data Columns: 13
Label Column: target
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
3 unique label values: [np.int64(0), np.int64(2), np.int64(1)]
If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 27130.89 MB
Train Data (Original) Memory Usage: 0.01 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
Types of features in processed data (raw dtype, special dtypes):
('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
0.0s = Fit runtime
13 features in original data used to generate 13 features in processed data.
Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.03s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 113, Val Rows: 29
User-specified model hyperparameters to be fit:
{
'REALTABPFN-V2': [{}],
}
Advanced Usage: Combining Multiple Foundational Models¶
AutoGluon allows you to combine multiple foundational models in a single predictor for enhanced performance through model stacking and ensembling:
# Configure multiple foundational models together
multi_foundation_config = {
'MITRA': {
'fine_tune': True,
'fine_tune_steps': 10
},
'REALTABPFN-V2': {},
'TABICL': {},
}
print("Training ensemble of foundational models...")
ensemble_predictor = TabularPredictor(
label='target',
path='./ensemble_foundation_model'
).fit(
wine_train_data,
hyperparameters=multi_foundation_config,
time_limit=300, # More time for multiple models
)
# Evaluate ensemble performance
ensemble_predictor.leaderboard(wine_test_data)
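After fitting, you can check which model AutoGluon selected as best and query individual ensemble members; a minimal sketch (the member name 'Mitra' is assumed from the config above; the leaderboard shows the exact registered names):
# Inspect the best model and compare a single member's predictions
print("Best model:", ensemble_predictor.model_best)
print(ensemble_predictor.predict(wine_test_data, model='Mitra').head())  # member name per leaderboard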