AutoMM for Text + Tabular - Quick Start

In many applications, text data may be mixed with numeric/categorical data. AutoGluon’s MultiModalPredictor can train a single neural network that jointly operates on multiple feature types, including text, categorical, and numerical columns. The general idea is to embed the text, categorical and numeric fields separately and fuse these features across modalities. This tutorial demonstrates such an application.
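As a rough illustration of the embed-then-fuse idea (a toy sketch, not AutoGluon's actual architecture), the snippet below projects three stand-in feature groups to a shared width, concatenates them, and applies a single linear head. The sizes, the random weights, and the `embed` helper are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, hidden = 4, 8  # illustrative sizes only

# Stand-ins for per-modality features (in a real model these would come from
# a text encoder, categorical embedding tables, and a numeric network).
text_feat = rng.normal(size=(batch, 32))
cat_feat = rng.normal(size=(batch, 6))
num_feat = rng.normal(size=(batch, 2))

def embed(x, out_dim):
    """Hypothetical helper: project one modality to a shared width."""
    w = rng.normal(size=(x.shape[1], out_dim)) / np.sqrt(x.shape[1])
    return np.maximum(x @ w, 0.0)  # linear layer followed by ReLU

# Embed each modality separately, then fuse by concatenation.
fused = np.concatenate(
    [embed(text_feat, hidden), embed(cat_feat, hidden), embed(num_feat, hidden)],
    axis=1,
)

# A single linear head maps the fused vector to one regression output per row.
w_head = rng.normal(size=(fused.shape[1], 1))
prediction = fused @ w_head

print(fused.shape, prediction.shape)
```

Concatenation is only one possible fusion strategy; it is used here because it is the simplest to read.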

import numpy as np
import pandas as pd
import warnings
import os

warnings.filterwarnings('ignore')
np.random.seed(123)

!python3 -m pip install openpyxl

Collecting openpyxl
Downloading openpyxl-3.1.5-py2.py3-none-any.whl.metadata (2.5 kB)
Collecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-2.0.0-py3-none-any.whl.metadata (2.7 kB)
Downloading openpyxl-3.1.5-py2.py3-none-any.whl (250 kB)
Downloading et_xmlfile-2.0.0-py3-none-any.whl (18 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-2.0.0 openpyxl-3.1.5

Book Price Prediction Data

For demonstration, we use the book price prediction dataset from the MachineHack Book Price Prediction Hackathon. Our goal is to predict a book’s price given various features like its author, the abstract, the book’s rating, etc.

!mkdir -p price_of_books
!wget https://automl-mm-bench.s3.amazonaws.com/machine_hack_competitions/predict_the_price_of_books/Data.zip -O price_of_books/Data.zip
!cd price_of_books && unzip -o Data.zip
!ls price_of_books/Participants_Data

--2025-01-07 02:43:20--  https://automl-mm-bench.s3.amazonaws.com/machine_hack_competitions/predict_the_price_of_books/Data.zip
Resolving automl-mm-bench.s3.amazonaws.com (automl-mm-bench.s3.amazonaws.com)... 52.217.120.233, 52.217.134.201, 16.182.33.153, ...
Connecting to automl-mm-bench.s3.amazonaws.com (automl-mm-bench.s3.amazonaws.com)|52.217.120.233|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3521673 (3.4M) [application/zip]
Saving to: ‘price_of_books/Data.zip’

price_of_books/Data   0%[                    ]       0  --.-KB/s
price_of_books/Data 100%[===================>]   3.36M  --.-KB/s    in 0.03s   

2025-01-07 02:43:20 (124 MB/s) - ‘price_of_books/Data.zip’ saved [3521673/3521673]
Archive:  Data.zip
  inflating: Participants_Data/Data_Test.xlsx  
  inflating: Participants_Data/Data_Train.xlsx  
  inflating: Participants_Data/Sample_Submission.xlsx
Data_Test.xlsx	Data_Train.xlsx  Sample_Submission.xlsx

train_df = pd.read_excel(os.path.join('price_of_books', 'Participants_Data', 'Data_Train.xlsx'), engine='openpyxl')
train_df.head()

	Title	Author	Edition	Reviews	Ratings	Synopsis	Genre	BookCategory	Price
0	The Prisoner's Gold (The Hunters 3)	Chris Kuzneski	Paperback,– 10 Mar 2016	4.0 out of 5 stars	8 customer reviews	THE HUNTERS return in their third brilliant no...	Action & Adventure (Books)	Action & Adventure	220.00
1	Guru Dutt: A Tragedy in Three Acts	Arun Khopkar	Paperback,– 7 Nov 2012	3.9 out of 5 stars	14 customer reviews	A layered portrait of a troubled genius for wh...	Cinema & Broadcast (Books)	Biographies, Diaries & True Accounts	202.93
2	Leviathan (Penguin Classics)	Thomas Hobbes	Paperback,– 25 Feb 1982	4.8 out of 5 stars	6 customer reviews	"During the time men live without a common Pow...	International Relations	Humour	299.00
3	A Pocket Full of Rye (Miss Marple)	Agatha Christie	Paperback,– 5 Oct 2017	4.1 out of 5 stars	13 customer reviews	A handful of grain is found in the pocket of a...	Contemporary Fiction (Books)	Crime, Thriller & Mystery	180.00
4	LIFE 70 Years of Extraordinary Photography	Editors of Life	Hardcover,– 10 Oct 2006	5.0 out of 5 stars	1 customer review	For seven decades, "Life" has been thrilling t...	Photography Textbooks	Arts, Film & Photography	965.62

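We do some basic preprocessing. The Reviews column (star ratings such as “4.0 out of 5 stars”) and the Ratings column (review counts such as “8 customer reviews”) are converted to numeric values, and prices are transformed to a log scale via log(Price + 1).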

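def preprocess(df):
    df = df.copy(deep=True)
    # 'Reviews' holds strings like '4.0 out of 5 stars'; keep only the leading number.
    df.loc[:, 'Reviews'] = pd.to_numeric(df['Reviews'].apply(lambda ele: ele[:-len(' out of 5 stars')]))
    # 'Ratings' holds strings like '8 customer reviews'; drop thousands separators and the suffix.
    # Note: the singular '1 customer review' is cut down to an empty string and ends up as NaN.
    df.loc[:, 'Ratings'] = pd.to_numeric(df['Ratings'].apply(lambda ele: ele.replace(',', '')[:-len(' customer reviews')]))
    # Log-transform the label; invert later with np.exp(x) - 1.
    df.loc[:, 'Price'] = np.log(df['Price'] + 1)
    return df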

train_subsample_size = 1500  # subsample for faster demo, you can try setting to larger values
test_subsample_size = 5
train_df = preprocess(train_df)
train_data = train_df.iloc[100:].sample(train_subsample_size, random_state=123)
test_data = train_df.iloc[:100].sample(test_subsample_size, random_state=245)
train_data.head()

	Title	Author	Edition	Reviews	Ratings	Synopsis	Genre	BookCategory	Price
949	Furious Hours	Casey Cep	Paperback,– 1 Jun 2019	4.0	NaN	‘It’s been a long time since I picked up a boo...	True Accounts (Books)	Biographies, Diaries & True Accounts	5.743003
5504	REST API Design Rulebook	Mark Masse	Paperback,– 7 Nov 2011	5.0	NaN	In todays market, where rival web services com...	Computing, Internet & Digital Media (Books)	Computing, Internet & Digital Media	5.786897
5856	The Atlantropa Articles: A Novel	Cody Franklin	Paperback,– Import, 1 Nov 2018	4.5	2.0	#1 Amazon Best Seller! Dystopian Alternate His...	Action & Adventure (Books)	Romance	6.893656
4137	Hickory Dickory Dock (Poirot)	Agatha Christie	Paperback,– 5 Oct 2017	4.3	21.0	There’s more than petty theft going on in a Lo...	Action & Adventure (Books)	Crime, Thriller & Mystery	5.192957
3205	The Stanley Kubrick Archives (Bibliotheca Univ...	Alison Castle	Hardcover,– 21 Aug 2016	4.6	3.0	In 1968, when Stanley Kubrick was asked to com...	Cinema & Broadcast (Books)	Humour	6.889591
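The slicing in `preprocess` assumes a fixed suffix, so a singular entry such as “1 customer review” is reduced to an empty string, which is the likely source of the NaN values in the Ratings column above. If you would rather keep those counts, one option is to extract the leading number with a regular expression. The helper name `leading_number` is hypothetical and not part of the tutorial.

```python
import re

def leading_number(text):
    """Return the first number in `text` as a float, or None if absent."""
    match = re.match(r"\s*([\d,]*\.?\d+)", text)
    if match is None:
        return None
    # Remove thousands separators before converting.
    return float(match.group(1).replace(",", ""))

print(leading_number("4.0 out of 5 stars"))
print(leading_number("1 customer review"))
print(leading_number("1,234 customer reviews"))
```

Swapping this into `preprocess` would change the training data, so the numbers shown in the rest of this tutorial would no longer match exactly.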

Training

We can simply create a MultiModalPredictor and call predictor.fit() to train a model that operates across all types of features. Internally, the neural network is generated automatically based on the inferred data type of each feature column. To save time, we subsample the data and train for only three minutes.

from autogluon.multimodal import MultiModalPredictor
import uuid

time_limit = 3 * 60  # set to larger value in your applications
model_path = f"./tmp/{uuid.uuid4().hex}-automm_text_book_price_prediction"
predictor = MultiModalPredictor(label='Price', path=model_path)
predictor.fit(train_data, time_limit=time_limit)

/home/ci/opt/venv/lib/python3.11/site-packages/mmengine/optim/optimizer/zero_optimizer.py:11: DeprecationWarning: `TorchScript` support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the `torch.compile` optimizer instead.
  from torch.distributed.optim import \
=================== System Info ===================
AutoGluon Version:  1.2b20250107
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Tue Sep 24 10:00:37 UTC 2024
CPU Count:          8
Pytorch Version:    2.5.1+cu124
CUDA Version:       12.4
Memory Avail:       28.48 GB / 30.95 GB (92.0%)
Disk Space Avail:   180.26 GB / 255.99 GB (70.4%)
===================================================
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == float and many unique label-values observed).
Label info (max, min, mean, stddev): (9.115699967822062, 3.6109179126442243, 6.02567, 0.7694)
If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction
    ```
Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
GPU 0 Name: Tesla T4
GPU 0 Memory: 0.43GB/15.0GB (Used/Total)
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name              | Type                | Params | Mode 
------------------------------------------------------------------
0 | model             | MultimodalFusionMLP | 110 M  | train
1 | validation_metric | MeanSquaredError    | 0      | train
2 | loss_func         | MSELoss             | 0      | train
------------------------------------------------------------------
110 M     Trainable params
0         Non-trainable params
110 M     Total params
442.755   Total estimated model params size (MB)
84        Modules in train mode
225       Modules in eval mode
Epoch 0, global step 4: 'val_rmse' reached 1.17556 (best 1.17556), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction/epoch=0-step=4.ckpt' as top 3
Epoch 0, global step 10: 'val_rmse' reached 0.99237 (best 0.99237), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction/epoch=0-step=10.ckpt' as top 3
Epoch 1, global step 14: 'val_rmse' reached 1.44574 (best 0.99237), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction/epoch=1-step=14.ckpt' as top 3
Epoch 1, global step 20: 'val_rmse' reached 0.97314 (best 0.97314), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction/epoch=1-step=20.ckpt' as top 3
Epoch 2, global step 24: 'val_rmse' reached 0.98176 (best 0.97314), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction/epoch=2-step=24.ckpt' as top 3
Epoch 2, global step 30: 'val_rmse' reached 0.83199 (best 0.83199), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction/epoch=2-step=30.ckpt' as top 3
Epoch 3, global step 34: 'val_rmse' reached 0.87795 (best 0.83199), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction/epoch=3-step=34.ckpt' as top 3
Time limit reached. Elapsed time is 0:03:00. Signaling Trainer to stop.
Epoch 3, global step 36: 'val_rmse' reached 0.87481 (best 0.83199), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction/epoch=3-step=36.ckpt' as top 3
Start to fuse 3 checkpoints via the greedy soup algorithm.
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/0c451c3848fc45bea986793feb21baa9-automm_text_book_price_prediction")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).

<autogluon.multimodal.predictor.MultiModalPredictor at 0x7fee472d1290>
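To give a feel for what inferring the data type of each feature column can mean, here is a deliberately simple heuristic. It is an assumption for illustration only and does not reproduce AutoMM's actual rules; the function name, the threshold, and the toy column values are all hypothetical.

```python
def infer_column_type(values, max_categories=3):
    """Toy heuristic (an assumption, not AutoMM's rules) for one column."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        return "numerical"
    if len(set(present)) <= max_categories:
        return "categorical"
    return "text"

# Toy columns loosely modeled on the book dataset.
columns = {
    "Reviews": [4.0, 3.9, 4.8, 4.1, 5.0],
    "BookCategory": ["Humour", "Romance", "Humour", "Romance", "Humour"],
    "Synopsis": ["THE HUNTERS return ...", "A layered portrait ...",
                 "During the time ...", "A handful of grain ...",
                 "For seven decades ..."],
}
for name, values in columns.items():
    print(name, "->", infer_column_type(values))
```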

Prediction

We can easily obtain predictions and extract data embeddings using the MultiModalPredictor.

predictions = predictor.predict(test_data)
print('Predictions:')
print('------------')
print(np.exp(predictions) - 1)
print()
print('True Value:')
print('------------')
print(np.exp(test_data['Price']) - 1)

Predictions:
------------
1     494.587433
31    485.123138
19    982.370972
45    566.517639
82    838.060669
Name: Price, dtype: float32

True Value:
------------
1     202.93
31    799.00
19    352.00
45    395.10
82    409.00
Name: Price, dtype: float64
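The model predicts on the log(Price + 1) scale used during preprocessing, which is why the cell above applies `np.exp(...) - 1` before printing. A minimal round-trip check with the standard library, using one price from the table as an example value:

```python
import math

price = 202.93
log_price = math.log(price + 1)      # same transform as in preprocess()
recovered = math.exp(log_price) - 1  # inverse used when printing predictions

# math.log1p / math.expm1 compute the same pair of functions directly.
assert math.isclose(log_price, math.log1p(price))
assert math.isclose(recovered, math.expm1(log_price))
print(round(recovered, 2))
```

NumPy offers the same pair as `np.log1p` and `np.expm1`, which can be slightly more accurate for values close to zero.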

performance = predictor.evaluate(test_data)
print(performance)

{'rmse': 0.7387054611052827}
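Because the label was log-transformed, this RMSE is measured on log(Price + 1) rather than in currency units. As a sanity check, the sketch below recomputes it from the five predictions and true prices printed earlier; the result should be close to the reported value, up to rounding of the printed numbers.

```python
import math

# Values copied from the printed output above (price scale).
predicted = [494.587433, 485.123138, 982.370972, 566.517639, 838.060669]
actual = [202.93, 799.00, 352.00, 395.10, 409.00]

# Map both back to the log(Price + 1) scale the model was trained on.
squared_errors = [
    (math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)
]
rmse_log = math.sqrt(sum(squared_errors) / len(squared_errors))
print(round(rmse_log, 4))
```

An RMSE of roughly 0.74 in log space means predictions are off by a multiplicative factor of about 2 on the price scale, which is consistent with the short training time and the small subsample.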

embeddings = predictor.extract_embedding(test_data)
embeddings.shape

(5, 128)
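Each row is a 128-dimensional vector for one test example. One common use of such vectors is nearest-neighbor lookup by cosine similarity. The sketch below uses a random array of the same shape as a stand-in, so the neighbors it reports carry no meaning; with real embeddings you would pass the array returned by `extract_embedding` instead.

```python
import numpy as np

# Random stand-in with the same shape as the embeddings above.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 128))

# Normalize each row to unit length; dot products are then cosine similarities.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = unit @ unit.T

# For each row, find the most similar *other* row.
np.fill_diagonal(similarity, -np.inf)
nearest = similarity.argmax(axis=1)
print(nearest)
```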

Other Examples

See AutoMM Examples for more examples of AutoMM.

Customization

To learn how to customize AutoMM, please refer to Customize AutoMM.