AutoGluon Tabular - Foundational Models

In this tutorial, we introduce support for cutting-edge foundational tabular models that leverage pre-training and in-context learning to achieve state-of-the-art performance on tabular datasets. These models represent a significant advancement in automated machine learning for structured data.

We'll explore three foundational tabular models:

  1. Mitra - AutoGluon’s new state-of-the-art tabular foundation model

  2. TabICL - In-context learning for large tabular datasets

  3. TabPFNv2 - Prior-fitted networks for accurate predictions on small data

These models excel particularly on small to medium-sized datasets and can run in both zero-shot and fine-tuning modes.
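Both modes are selected through model hyperparameters passed to predictor.fit(). As a quick preview of the pattern used throughout this tutorial (full examples follow below):

# Zero-shot: use the pretrained weights as-is, with no gradient updates
zero_shot_config = {'MITRA': {'fine_tune': False}}

# Fine-tuning: adapt the pretrained weights to your data for a few steps
fine_tune_config = {'MITRA': {'fine_tune': True, 'fine_tune_steps': 10}}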

Installation

First, let’s install AutoGluon with support for foundational models:

# Individual model installations:
!pip install uv
!uv pip install autogluon.tabular[mitra]   # For Mitra
!uv pip install autogluon.tabular[tabicl]   # For TabICL
!uv pip install autogluon.tabular[tabpfn]   # For TabPFNv2

Collecting uv
  Downloading uv-0.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.7/18.7 MB 235.6 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.8.0
Using Python 3.12.10 environment at: /home/ci/opt/venv
Audited 1 package in 71ms
Using Python 3.12.10 environment at: /home/ci/opt/venv
Audited 1 package in 12ms
Using Python 3.12.10 environment at: /home/ci/opt/venv
Audited 1 package in 10ms
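If you plan to try all three models, the extras can also be combined into a single command (a sketch assuming the same extras names as above):

!uv pip install autogluon.tabular[mitra,tabicl,tabpfn]   # All three models at once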
import pandas as pd
from autogluon.tabular import TabularDataset, TabularPredictor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine, fetch_california_housing

Example Data

For this tutorial, we'll demonstrate the foundational models on two datasets that showcase their versatility:

  1. Wine Dataset (Multi-class Classification) - Small dataset (178 samples) for comparing model performance

  2. California Housing (Regression) - Larger dataset (20,640 samples) for demonstrating regression

Let’s load and prepare these datasets:

# Load datasets

# 1. Wine (Multi-class Classification)
wine_data = load_wine()
wine_df = pd.DataFrame(wine_data.data, columns=wine_data.feature_names)
wine_df['target'] = wine_data.target

# 2. California Housing (Regression)
housing_data = fetch_california_housing()
housing_df = pd.DataFrame(housing_data.data, columns=housing_data.feature_names)
housing_df['target'] = housing_data.target

print("Dataset shapes:")
print(f"Wine: {wine_df.shape}")
print(f"California Housing: {housing_df.shape}")
Dataset shapes:
Wine: (178, 14)
California Housing: (20640, 9)

Create Train/Test Splits

Let’s create train/test splits for our datasets:

# Create train/test splits (80/20)
wine_train, wine_test = train_test_split(wine_df, test_size=0.2, random_state=42, stratify=wine_df['target'])
housing_train, housing_test = train_test_split(housing_df, test_size=0.2, random_state=42)

print("Training set sizes:")
print(f"Wine: {len(wine_train)} samples")
print(f"Housing: {len(housing_train)} samples")

# Convert to TabularDataset
wine_train_data = TabularDataset(wine_train)
wine_test_data = TabularDataset(wine_test)
housing_train_data = TabularDataset(housing_train)
housing_test_data = TabularDataset(housing_test)
Training set sizes:
Wine: 142 samples
Housing: 16512 samples

1. Mitra: AutoGluon’s Tabular Foundation Model

Mitra is a new state-of-the-art tabular foundation model developed by the AutoGluon team, natively supported in AutoGluon with just three lines of code via predictor.fit(). Built on the in-context learning paradigm and pretrained exclusively on synthetic data, Mitra introduces a principled pretraining approach: it carefully selects and mixes diverse synthetic priors to promote robust generalization across a wide range of real-world tabular datasets.

📊 Mitra achieves state-of-the-art performance on major benchmarks including TabRepo, TabZilla, AMLB, and TabArena, excelling especially on small tabular datasets with fewer than 5,000 samples and fewer than 100 features, for both classification and regression tasks.

🧠 Mitra supports both zero-shot and fine-tuning modes and runs seamlessly on both GPU and CPU. Its weights are fully open-sourced under the Apache-2.0 license, making it a privacy-conscious and production-ready solution for enterprises concerned about data sharing and hosting.

🔗 Learn more on Hugging Face.

Using Mitra for Classification

# Create predictor with Mitra
print("Training Mitra classifier on classification dataset...")
mitra_predictor = TabularPredictor(label='target')
mitra_predictor.fit(
    wine_train_data,
    hyperparameters={
        'MITRA': {'fine_tune': False}
    },
)

print("\nMitra training completed!")
Training Mitra classifier on classification dataset...

Mitra training completed!
No path specified. Models will be saved in: "AutogluonModels/ag-20250718_212934"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.2b20250718
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       28.59 GB / 30.95 GB (92.4%)
Disk Space Avail:   206.61 GB / 255.99 GB (80.7%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
/home/ci/autogluon/common/src/autogluon/common/utils/utils.py:97: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250718_212934"
Train Data Rows:    142
Train Data Columns: 13
Label Column:       target
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	3 unique label values:  [np.int64(0), np.int64(2), np.int64(1)]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    29274.65 MB
	Train Data (Original)  Memory Usage: 0.01 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	0.0s = Fit runtime
	13 features in original data used to generate 13 features in processed data.
	Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.04s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 113, Val Rows: 29
User-specified model hyperparameters to be fit:
{
	'MITRA': [{'fine_tune': False}],
}
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: Mitra ...
Using CUDA GPU
2025-07-18 21:29:39.916 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_start_metrics:334 - Epoch 000 | Train CE: -.---- | Train acc: -.---- | Val CE: 0.0884 | Val acc: 0.9655
	0.9655	 = Validation score   (accuracy)
	5.21s	 = Training   runtime
	0.14s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'Mitra': 1.0}
	0.9655	 = Validation score   (accuracy)
	0.0s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 5.75s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 208.4 rows/s (29 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250718_212934")

Evaluate Mitra Performance

# Make predictions
mitra_predictions = mitra_predictor.predict(wine_test_data)
print("Sample Mitra predictions:")
print(mitra_predictions.head(10))

# Show prediction probabilities for first few samples
mitra_pred_proba = mitra_predictor.predict_proba(wine_test_data)
print(mitra_pred_proba.head())

# Show model leaderboard
print("\nMitra Model Leaderboard:")
mitra_predictor.leaderboard(wine_test_data, silent=True)
Sample Mitra predictions:
10     0
134    2
28     0
121    0
62     1
51     0
7      0
66     0
129    1
166    2
Name: target, dtype: int64
            0         1         2
10   0.995547  0.004069  0.000384
134  0.001260  0.092571  0.906169
28   0.968737  0.031140  0.000123
121  0.495463  0.495463  0.009075
62   0.132688  0.865233  0.002079

Mitra Model Leaderboard:
                 model  score_test  score_val eval_metric  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0                Mitra    0.944444   0.965517    accuracy        0.344352        0.13834  5.212310                 0.344352                0.138340           5.212310            1       True          1
1  WeightedEnsemble_L2    0.944444   0.965517    accuracy        0.347273        0.13914  5.215623                 0.002921                0.000799           0.003313            2       True          2
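Besides the leaderboard, predictor.evaluate() returns the test metrics directly as a dictionary; a minimal sketch using the predictor trained above:

# Compute test-set metrics for the best model
mitra_scores = mitra_predictor.evaluate(wine_test_data)
print(mitra_scores)  # e.g. {'accuracy': 0.944, ...}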

Fine-tuning with Mitra

mitra_predictor_ft = TabularPredictor(label='target')
mitra_predictor_ft.fit(
    wine_train_data,
    hyperparameters={
        'MITRA': {'fine_tune': True, 'fine_tune_steps': 10}
    },
    time_limit=120,  # 2 minutes
)

print("\nMitra fine-tuning completed!")
Mitra fine-tuning completed!
No path specified. Models will be saved in: "AutogluonModels/ag-20250718_212941"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.2b20250718
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       27.66 GB / 30.95 GB (89.4%)
Disk Space Avail:   206.04 GB / 255.99 GB (80.5%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ... Time limit = 120s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250718_212941"
Train Data Rows:    142
Train Data Columns: 13
Label Column:       target
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	3 unique label values:  [np.int64(0), np.int64(2), np.int64(1)]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28321.89 MB
	Train Data (Original)  Memory Usage: 0.01 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	0.0s = Fit runtime
	13 features in original data used to generate 13 features in processed data.
	Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.03s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 113, Val Rows: 29
User-specified model hyperparameters to be fit:
{
	'MITRA': [{'fine_tune': True, 'fine_tune_steps': 10}],
}
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: Mitra ... Training model for up to 119.97s of the 119.96s of remaining time.
Using CUDA GPU
2025-07-18 21:29:42.935 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_start_metrics:334 - Epoch 000 | Train CE: -.---- | Train acc: -.---- | Val CE: 0.0725 | Val acc: 1.0000
2025-07-18 21:29:43.791 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 001 | Train CE: 0.1051 | Train acc: 0.9565 | Val CE: 0.0714 | Val acc: 1.0000
2025-07-18 21:29:44.478 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 002 | Train CE: 0.0499 | Train acc: 1.0000 | Val CE: 0.0707 | Val acc: 1.0000
2025-07-18 21:29:45.169 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 003 | Train CE: 0.0781 | Train acc: 1.0000 | Val CE: 0.0692 | Val acc: 1.0000
2025-07-18 21:29:45.857 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 004 | Train CE: 0.1511 | Train acc: 0.9565 | Val CE: 0.0672 | Val acc: 1.0000
2025-07-18 21:29:46.550 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 005 | Train CE: 0.0603 | Train acc: 1.0000 | Val CE: 0.0657 | Val acc: 1.0000
2025-07-18 21:29:47.242 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 006 | Train CE: 0.0720 | Train acc: 0.9565 | Val CE: 0.0645 | Val acc: 1.0000
2025-07-18 21:29:47.932 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 007 | Train CE: 0.1425 | Train acc: 0.9565 | Val CE: 0.0630 | Val acc: 1.0000
2025-07-18 21:29:48.624 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 008 | Train CE: 0.0966 | Train acc: 0.9565 | Val CE: 0.0614 | Val acc: 1.0000
2025-07-18 21:29:49.314 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 009 | Train CE: 0.0364 | Train acc: 1.0000 | Val CE: 0.0595 | Val acc: 1.0000
2025-07-18 21:29:50.003 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 010 | Train CE: 0.1724 | Train acc: 0.9565 | Val CE: 0.0595 | Val acc: 1.0000
	1.0	 = Validation score   (accuracy)
	8.42s	 = Training   runtime
	0.14s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 119.97s of the 111.08s of remaining time.
	Ensemble Weights: {'Mitra': 1.0}
	1.0	 = Validation score   (accuracy)
	0.0s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 8.94s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 207.5 rows/s (29 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250718_212941")

Evaluating Fine-tuned Mitra Performance

# Show model leaderboard
print("\nMitra Model Leaderboard:")
mitra_predictor_ft.leaderboard(wine_test_data, silent=True)
Mitra Model Leaderboard:
                 model  score_test  score_val eval_metric  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0                Mitra         1.0        1.0    accuracy        0.345114       0.139007  8.424691                 0.345114                0.139007           8.424691            1       True          1
1  WeightedEnsemble_L2         1.0        1.0    accuracy        0.348341       0.139748  8.427467                 0.003227                0.000741           0.002776            2       True          2
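To quantify the gain from fine-tuning, we can pull the test scores of both predictors out of their leaderboards; a small sketch reusing the objects trained above:

# Compare zero-shot vs. fine-tuned Mitra on the held-out test set
zs_lb = mitra_predictor.leaderboard(wine_test_data, silent=True)
ft_lb = mitra_predictor_ft.leaderboard(wine_test_data, silent=True)
zs_acc = zs_lb.loc[zs_lb['model'] == 'Mitra', 'score_test'].item()
ft_acc = ft_lb.loc[ft_lb['model'] == 'Mitra', 'score_test'].item()
print(f"Zero-shot accuracy:  {zs_acc:.4f}")
print(f"Fine-tuned accuracy: {ft_acc:.4f}")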

Using Mitra for Regression

# Create predictor with Mitra for regression
print("Training Mitra regressor on California Housing dataset...")
mitra_reg_predictor = TabularPredictor(
    label='target',
    path='./mitra_regressor_model',
    problem_type='regression'
)
mitra_reg_predictor.fit(
    housing_train_data.sample(1000), # sample 1000 rows
    hyperparameters={
        'MITRA': {'fine_tune': False}
    },
)

# Evaluate regression performance
mitra_reg_predictor.leaderboard(housing_test_data)
Training Mitra regressor on California Housing dataset...
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.2b20250718
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       27.45 GB / 30.95 GB (88.7%)
Disk Space Avail:   205.76 GB / 255.99 GB (80.4%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/mitra_regressor_model"
Train Data Rows:    1000
Train Data Columns: 8
Label Column:       target
Problem Type:       regression
Preprocessing data ...
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28109.04 MB
	Train Data (Original)  Memory Usage: 0.06 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', []) : 8 | ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('float', []) : 8 | ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', ...]
	0.0s = Fit runtime
	8 features in original data used to generate 8 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.03s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
	'MITRA': [{'fine_tune': False}],
}
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: Mitra ...
Using CUDA GPU
2025-07-18 21:29:54.631 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_start_metrics:323 - Epoch 000 | Train MSE: -.---- | Train MAE: -.---- | Train r2: -.---- | Val MSE: 0.3169 | Val MAE: 0.3845 | Val r2: 0.7613
	-0.5632	 = Validation score   (-root_mean_squared_error)
	3.73s	 = Training   runtime
	0.64s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'Mitra': 1.0}
	-0.5632	 = Validation score   (-root_mean_squared_error)
	0.0s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 4.73s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 314.2 rows/s (200 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/mitra_regressor_model")
                 model  score_test  score_val              eval_metric  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0                Mitra   -0.559287  -0.563234  root_mean_squared_error        5.109980       0.636237  3.732873                 5.109980                0.636237           3.732873            1       True          1
1  WeightedEnsemble_L2   -0.559287  -0.563234  root_mean_squared_error        5.112844       0.636572  3.735013                 0.002865                0.000335           0.002140            2       True          2
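As with classification, evaluate() reports the regression metrics; note that AutoGluon flips the sign of error metrics so that higher is always better (as the log above explains):

# Test-set metrics for the Mitra regressor (RMSE is reported sign-flipped)
reg_scores = mitra_reg_predictor.evaluate(housing_test_data)
print(reg_scores)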

2. TabICL: In-Context Learning for Tabular Data

TabICL ("Tabular In-Context Learning") is a foundational model designed specifically for in-context learning on large tabular datasets.

Paper: “TabICL: A Tabular Foundation Model for In-Context Learning on Large Data”
Authors: Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan
GitHub: https://github.com/soda-inria/tabicl

TabICL leverages a transformer architecture with in-context learning capabilities and, unlike earlier tabular foundation models, is designed to scale in-context learning to large datasets.

# Train TabICL on dataset
print("Training TabICL on wine dataset...")
tabicl_predictor = TabularPredictor(
    label='target',
    path='./tabicl_model'
)
tabicl_predictor.fit(
    wine_train_data,
    hyperparameters={
        'TABICL': {},
    },
)

# Show prediction probabilities for first few samples
tabicl_predictions = tabicl_predictor.predict_proba(wine_test_data)
print(tabicl_predictions.head())

# Show TabICL leaderboard
print("\nTabICL Model Details:")
tabicl_predictor.leaderboard(wine_test_data, silent=True)
Training TabICL on wine dataset...
INFO: You are downloading 'tabicl-classifier-v1.1-0506.ckpt', the latest best-performing version of TabICL.
To reproduce results from the original paper, please use 'tabicl-classifier-v1-0208.ckpt'.

Checkpoint 'tabicl-classifier-v1.1-0506.ckpt' not cached.
 Downloading from Hugging Face Hub (jingang/TabICL-clf).
            0         1         2
10   0.999185  0.000721  0.000094
134  0.002010  0.255196  0.742794
28   0.992078  0.007737  0.000185
121  0.585142  0.405107  0.009750
62   0.009691  0.985739  0.004570

TabICL Model Details:
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.2b20250718
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       27.40 GB / 30.95 GB (88.5%)
Disk Space Avail:   205.19 GB / 255.99 GB (80.2%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/tabicl_model"
Train Data Rows:    142
Train Data Columns: 13
Label Column:       target
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	3 unique label values:  [np.int64(0), np.int64(2), np.int64(1)]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28060.25 MB
	Train Data (Original)  Memory Usage: 0.01 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	0.0s = Fit runtime
	13 features in original data used to generate 13 features in processed data.
	Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.03s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 113, Val Rows: 29
User-specified model hyperparameters to be fit:
{
	'TABICL': [{}],
}
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: TabICL ...
	1.0	 = Validation score   (accuracy)
	2.9s	 = Training   runtime
	0.37s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'TabICL': 1.0}
	1.0	 = Validation score   (accuracy)
	0.0s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 3.48s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 77.4 rows/s (29 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/tabicl_model")
                 model  score_test  score_val eval_metric  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0               TabICL    0.972222        1.0    accuracy        0.383188       0.374056  2.904715                 0.383188                0.374056           2.904715            1       True          1
1  WeightedEnsemble_L2    0.972222        1.0    accuracy        0.385822       0.374793  2.907484                 0.002634                0.000737           0.002769            2       True          2

3. TabPFNv2: Prior-Fitted Networks

TabPFNv2 ("Tabular Prior-Fitted Networks v2") is designed for accurate predictions on small tabular datasets: it uses a prior-fitted network, i.e. a transformer pre-trained on synthetic datasets drawn from a prior over tabular problems.

Paper: “Accurate predictions on small data with a tabular foundation model”
Authors: Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister & Frank Hutter
GitHub: https://github.com/PriorLabs/TabPFN

TabPFNv2 excels on small datasets (< 10,000 samples) by leveraging the prior knowledge encoded in its pre-trained weights.
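Since each model has a sweet spot, one practical pattern is to gate the model choice on training-set size; a minimal sketch using the thresholds quoted above:

# Pick a foundational model based on dataset size (thresholds from the text above)
n_rows = len(wine_train_data)
if n_rows < 10_000:
    hyperparameters = {'TABPFNV2': {}}   # TabPFNv2 targets small datasets
else:
    hyperparameters = {'TABICL': {}}     # TabICL is built for large tables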

# Train TabPFNv2 on Wine dataset (perfect size for TabPFNv2)
print("Training TabPFNv2 on Wine dataset...")
tabpfnv2_predictor = TabularPredictor(
    label='target',
    path='./tabpfnv2_model'
)
tabpfnv2_predictor.fit(
    wine_train_data,
    hyperparameters={
        'TABPFNV2': {
            # TabPFNv2 works best with default parameters on small datasets
        },
    },
)

# Show prediction probabilities for first few samples
tabpfnv2_predictions = tabpfnv2_predictor.predict_proba(wine_test_data)
print(tabpfnv2_predictions.head())


# Show TabPFNv2 leaderboard
tabpfnv2_predictor.leaderboard(wine_test_data)
Training TabPFNv2 on Wine dataset...
            0         1         2
10   0.999924  0.000071  0.000005
134  0.000063  0.022457  0.977479
28   0.999046  0.000951  0.000003
121  0.133638  0.831147  0.035215
62   0.007277  0.992466  0.000258
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.2b20250718
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       27.41 GB / 30.95 GB (88.6%)
Disk Space Avail:   204.99 GB / 255.99 GB (80.1%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/tabpfnv2_model"
Train Data Rows:    142
Train Data Columns: 13
Label Column:       target
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	3 unique label values:  [np.int64(0), np.int64(2), np.int64(1)]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28064.13 MB
	Train Data (Original)  Memory Usage: 0.01 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	0.0s = Fit runtime
	13 features in original data used to generate 13 features in processed data.
	Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.03s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 113, Val Rows: 29
User-specified model hyperparameters to be fit:
{
	'TABPFNV2': [{}],
}
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: TabPFNv2 ...
	Built with PriorLabs-TabPFN
	1.0	 = Validation score   (accuracy)
	0.55s	 = Training   runtime
	0.24s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'TabPFNv2': 1.0}
	1.0	 = Validation score   (accuracy)
	0.0s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 0.88s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 121.9 rows/s (29 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/tabpfnv2_model")
                 model  score_test  score_val eval_metric  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0             TabPFNv2    0.972222        1.0    accuracy        0.247503       0.237173  0.546717                 0.247503                0.237173           0.546717            1       True          1
1  WeightedEnsemble_L2    0.972222        1.0    accuracy        0.249337       0.237991  0.549950                 0.001834                0.000818           0.003233            2       True          2

Advanced Usage: Combining Multiple Foundational Models

AutoGluon allows you to combine multiple foundational models in a single predictor for enhanced performance through model stacking and ensembling:

# Configure multiple foundational models together
multi_foundation_config = {
    'MITRA': {
        'fine_tune': True,
        'fine_tune_steps': 10
    },
    'TABPFNV2': {},
    'TABICL': {},
}

print("Training ensemble of foundational models...")
ensemble_predictor = TabularPredictor(
    label='target',
    path='./ensemble_foundation_model'
).fit(
    wine_train_data,
    hyperparameters=multi_foundation_config,
    time_limit=300,  # More time for multiple models
)

# Evaluate ensemble performance
ensemble_predictor.leaderboard(wine_test_data)
Training ensemble of foundational models...
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.2b20250718
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       27.38 GB / 30.95 GB (88.5%)
Disk Space Avail:   204.94 GB / 255.99 GB (80.1%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ... Time limit = 300s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/ensemble_foundation_model"
Train Data Rows:    142
Train Data Columns: 13
Label Column:       target
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	3 unique label values:  [np.int64(0), np.int64(2), np.int64(1)]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28034.31 MB
	Train Data (Original)  Memory Usage: 0.01 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('float', []) : 13 | ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', ...]
	0.0s = Fit runtime
	13 features in original data used to generate 13 features in processed data.
	Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.03s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 113, Val Rows: 29
User-specified model hyperparameters to be fit:
{
	'MITRA': [{'fine_tune': True, 'fine_tune_steps': 10}],
	'TABPFNV2': [{}],
	'TABICL': [{}],
}
Fitting 3 L1 models, fit_strategy="sequential" ...
Fitting model: TabPFNv2 ... Training model for up to 299.97s of the 299.97s of remaining time.
	1.0	 = Validation score   (accuracy)
	0.12s	 = Training   runtime
	0.2s	 = Validation runtime
Fitting model: TabICL ... Training model for up to 299.61s of the 299.61s of remaining time.
	1.0	 = Validation score   (accuracy)
	0.45s	 = Training   runtime
	0.24s	 = Validation runtime
Fitting model: Mitra ... Training model for up to 298.72s of the 298.72s of remaining time.
Using CUDA GPU
2025-07-18 21:30:09.145 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_start_metrics:334 - Epoch 000 | Train CE: -.---- | Train acc: -.---- | Val CE: 0.0785 | Val acc: 1.0000
2025-07-18 21:30:09.876 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 001 | Train CE: 0.1475 | Train acc: 1.0000 | Val CE: 0.0777 | Val acc: 1.0000
2025-07-18 21:30:10.565 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 002 | Train CE: 0.0997 | Train acc: 0.9565 | Val CE: 0.0760 | Val acc: 1.0000
2025-07-18 21:30:11.268 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 003 | Train CE: 0.1280 | Train acc: 0.9565 | Val CE: 0.0742 | Val acc: 1.0000
2025-07-18 21:30:11.960 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 004 | Train CE: 0.0815 | Train acc: 1.0000 | Val CE: 0.0729 | Val acc: 1.0000
2025-07-18 21:30:12.649 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 005 | Train CE: 0.0544 | Train acc: 1.0000 | Val CE: 0.0703 | Val acc: 1.0000
2025-07-18 21:30:13.335 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 006 | Train CE: 0.0889 | Train acc: 1.0000 | Val CE: 0.0678 | Val acc: 1.0000
2025-07-18 21:30:14.023 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 007 | Train CE: 0.0872 | Train acc: 1.0000 | Val CE: 0.0665 | Val acc: 1.0000
2025-07-18 21:30:14.710 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 008 | Train CE: 0.0681 | Train acc: 1.0000 | Val CE: 0.0645 | Val acc: 1.0000
2025-07-18 21:30:15.399 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 009 | Train CE: 0.0470 | Train acc: 1.0000 | Val CE: 0.0619 | Val acc: 1.0000
2025-07-18 21:30:16.103 | INFO     | autogluon.tabular.models.mitra._internal.core.trainer_finetune:log_metrics:355 - Epoch 010 | Train CE: 0.0549 | Train acc: 1.0000 | Val CE: 0.0590 | Val acc: 1.0000
	1.0	 = Validation score   (accuracy)
	8.37s	 = Training   runtime
	0.14s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 299.97s of the 289.88s of remaining time.
	Ensemble Weights: {'TabPFNv2': 1.0}
	1.0	 = Validation score   (accuracy)
	0.03s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 10.17s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 146.3 rows/s (29 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/ensemble_foundation_model")
                 model  score_test  score_val eval_metric  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0                Mitra    1.000000        1.0    accuracy        0.384026       0.139155  8.371802                 0.384026                0.139155           8.371802            1       True          3
1             TabPFNv2    0.972222        1.0    accuracy        0.240204       0.197590  0.121074                 0.240204                0.197590           0.121074            1       True          1
2  WeightedEnsemble_L2    0.972222        1.0    accuracy        0.243297       0.198277  0.154407                 0.003092                0.000687           0.033333            2       True          4
3               TabICL    0.972222        1.0    accuracy        0.406164       0.244113  0.454231                 0.406164                0.244113           0.454231            1       True          2
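Every predictor in this tutorial was persisted to disk, so it can be reloaded later for inference without retraining, using the path printed during training (here, the ensemble predictor's path):

# Reload the saved ensemble predictor and make predictions
loaded_predictor = TabularPredictor.load('./ensemble_foundation_model')
print(loaded_predictor.predict(wine_test_data).head())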