Multimodal Data Tables: Combining BERT/Transformers and Classical Tabular Models¶
Tip: If your data contains images, consider also checking out Multimodal Data Tables: Tabular, Text, and Image, which handles images in addition to text and tabular features.
Here we introduce how to use AutoGluon Tabular to deal with multimodal tabular data that contains text, numeric, and categorical columns. In AutoGluon, raw text is treated as a first-class citizen of data tables. AutoGluon Tabular can help you train and combine a diverse set of models, including classical tabular models like LightGBM/RF/CatBoost as well as our multimodal network based on a pretrained NLP model, which is introduced in the section “What’s happening inside?” of Text Prediction - Multimodal Table with Text (and used by AutoGluon’s TextPredictor).
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pprint
import random
from autogluon.tabular import TabularPredictor

# Fix the random seeds so the subsampling below is reproducible
np.random.seed(123)
random.seed(123)
Product Sentiment Analysis Dataset¶
We consider the product sentiment analysis dataset from a MachineHack hackathon. The goal is to predict a user’s sentiment towards a product given their review (raw text) and a categorical feature indicating the product’s type (e.g., Tablet, Mobile, etc.). We have already split the original dataset into 90% for training and 10% for development/testing (if submitting your models to the hackathon, we recommend training them on 100% of the dataset).
!mkdir -p product_sentiment_machine_hack
!wget https://autogluon-text-data.s3.amazonaws.com/multimodal_text/machine_hack_product_sentiment/train.csv -O product_sentiment_machine_hack/train.csv
!wget https://autogluon-text-data.s3.amazonaws.com/multimodal_text/machine_hack_product_sentiment/dev.csv -O product_sentiment_machine_hack/dev.csv
!wget https://autogluon-text-data.s3.amazonaws.com/multimodal_text/machine_hack_product_sentiment/test.csv -O product_sentiment_machine_hack/test.csv
2022-05-21 05:32:16 (1.73 MB/s) - ‘product_sentiment_machine_hack/train.csv’ saved [689486/689486]
2022-05-21 05:32:17 (608 KB/s) - ‘product_sentiment_machine_hack/dev.csv’ saved [75517/75517]
2022-05-21 05:32:18 (1.23 MB/s) - ‘product_sentiment_machine_hack/test.csv’ saved [312194/312194]
subsample_size = 2000  # for quick demo, try setting to larger values
feature_columns = ['Product_Description', 'Product_Type']
label = 'Sentiment'

# Subsample the training data for a quick demo; the dev/test splits are used in full
train_df = pd.read_csv('product_sentiment_machine_hack/train.csv', index_col=0).sample(subsample_size, random_state=123)
dev_df = pd.read_csv('product_sentiment_machine_hack/dev.csv', index_col=0)
test_df = pd.read_csv('product_sentiment_machine_hack/test.csv', index_col=0)

# Keep only the feature columns (plus the label for the labeled splits)
train_df = train_df[feature_columns + [label]]
dev_df = dev_df[feature_columns + [label]]
test_df = test_df[feature_columns]
print('Number of training samples:', len(train_df))
print('Number of dev samples:', len(dev_df))
print('Number of test samples:', len(test_df))
Number of training samples: 2000
Number of dev samples: 637
Number of test samples: 2728
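As noted earlier, hackathon submissions are best trained on 100% of the labeled data. A minimal sketch of recombining the two splits for such a final fit (plain pandas; the full_train_df name is introduced here just for illustration):

# Recombine the training and development splits so a final model can be
# trained on all labeled data before generating hackathon predictions.
full_train_df = pd.concat([train_df, dev_df])
print('Number of combined training samples:', len(full_train_df))

In the rest of this tutorial we keep the splits separate so that dev_df can serve as a held-out test set.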
There are two features in the dataset: the user’s review of the product (raw text) and the product’s type (categorical), with four possible sentiment classes to predict.
train_df.head()
| | Product_Description | Product_Type | Sentiment |
|---|---|---|---|
| 4532 | they took away the lego pit but replaced it wi... | 0 | 1 |
| 1831 | #Apple to Open Pop-Up Shop at #SXSW [REPORT]: ... | 9 | 2 |
| 3536 | RT @mention False Alarm: Google Circles Not Co... | 5 | 1 |
| 5157 | Will Google reveal a new social network called... | 9 | 2 |
| 4643 | Niceness RT @mention Less than 2 hours until w... | 6 | 3 |
dev_df.head()
| | Product_Description | Product_Type | Sentiment |
|---|---|---|---|
| 3170 | Do it. RT @mention Come party w/ Google tonigh... | 3 | 3 |
| 6301 | Line for iPads at #SXSW. Doesn't look too bad!... | 6 | 3 |
| 5643 | First up: iPad Design Headaches (2 Tablets, Ca... | 6 | 2 |
| 1953 | #SXSW: Mint Talks Mobile App Development Chall... | 9 | 2 |
| 2658 | “@mention Apple store downtown Austin open t... | 9 | 2 |
test_df.head()
| Text_ID | Product_Description | Product_Type |
|---|---|---|
| 5786 | RT @mention Going to #SXSW? The new iPhone gui... | 7 |
| 5363 | RT @mention 95% of iPhone and Droid apps have ... | 9 |
| 6716 | RT @mention Thank you to @mention for letting ... | 9 |
| 4339 | #Thanks @mention we're lovin' the @mention app... | 7 |
| 66 | At #sxsw? @mention / @mention wanna buy you a ... | 9 |
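Before fitting any models, it can also help to check how balanced the four sentiment classes are, since plain accuracy (the metric AutoGluon selects below) is most meaningful when classes are not heavily skewed. An illustrative check:

# Count how many training rows fall into each of the four sentiment classes
print(train_df[label].value_counts())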
AutoGluon Tabular with Multimodal Support¶
To utilize the TextPredictor model inside of TabularPredictor, we must specify hyperparameters='multimodal' in AutoGluon Tabular. Internally, this trains multiple tabular models as well as the TextPredictor model, and then combines them via either a weighted ensemble or a stack ensemble, as explained in the AutoGluon Tabular paper. If you do not specify hyperparameters='multimodal', AutoGluon Tabular will simply featurize text fields using N-grams and train only tabular models (which may work better if your text consists mostly of uncommon strings/vocabulary).
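For comparison, here is a minimal sketch of that default (N-gram-only) path; the ngram_predictor variable and its path name are illustrative, not part of this tutorial's actual run:

# Without hyperparameters='multimodal', Product_Description is featurized with
# N-gram counts and only classical tabular models are trained.
ngram_predictor = TabularPredictor(label='Sentiment', path='ag_tabular_product_sentiment_ngram')
ngram_predictor.fit(train_df)
ngram_predictor.leaderboard(dev_df)

The multimodal configuration used in the rest of this tutorial follows.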
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label='Sentiment', path='ag_tabular_product_sentiment_multimodal')
predictor.fit(train_df, hyperparameters='multimodal')
Beginning AutoGluon training ...
AutoGluon will save models to "ag_tabular_product_sentiment_multimodal/"
AutoGluon Version:  0.4.1b20220521
Python Version:     3.9.12
Operating System:   Linux
Train Data Rows:    2000
Train Data Columns: 2
Label Column: Sentiment
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	4 unique label values:  [1, 2, 3, 0]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Train Data Class Count: 4
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    22340.45 MB
	Train Data (Original)  Memory Usage: 0.34 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting IdentityFeatureGenerator...
			Fitting RenameFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
		Fitting TextSpecialFeatureGenerator...
			Fitting BinnedFeatureGenerator...
			Fitting DropDuplicatesFeatureGenerator...
		Fitting TextNgramFeatureGenerator...
			Fitting CountVectorizer for text features: ['Product_Description']
			CountVectorizer fit with vocabulary size = 230
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])          : 1 | ['Product_Type']
		('object', ['text']) : 1 | ['Product_Description']
	Types of features in processed data (raw dtype, special dtypes):
		('category', ['text_as_category'])  :   1 | ['Product_Description']
		('int', [])                         :   1 | ['Product_Type']
		('int', ['binned', 'text_special']) :  30 | ['Product_Description.char_count', 'Product_Description.word_count', 'Product_Description.capital_ratio', 'Product_Description.lower_ratio', 'Product_Description.digit_ratio', ...]
		('int', ['text_ngram'])             : 231 | ['__nlp__.about', '__nlp__.all', '__nlp__.amp', '__nlp__.an', '__nlp__.an ipad', ...]
		('object', ['text'])                :   1 | ['Product_Description_raw_text']
	0.5s = Fit runtime
	2 features in original data used to generate 264 features in processed data.
	Train Data (Processed) Memory Usage: 1.34 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.58s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 1600, Val Rows: 400
Fitting 9 L1 models ...
Fitting model: LightGBM ...
	0.8925 = Validation score (accuracy)
	2.28s  = Training runtime
	0.01s  = Validation runtime
Fitting model: LightGBMXT ...
	0.8625 = Validation score (accuracy)
	1.1s   = Training runtime
	0.01s  = Validation runtime
Fitting model: CatBoost ...
	0.8825 = Validation score (accuracy)
	1.12s  = Training runtime
	0.01s  = Validation runtime
Fitting model: XGBoost ...
	0.8925 = Validation score (accuracy)
	1.71s  = Training runtime
	0.01s  = Validation runtime
Fitting model: NeuralNetTorch ...
	0.8825 = Validation score (accuracy)
	1.91s  = Training runtime
	0.02s  = Validation runtime
Fitting model: VowpalWabbit ...
	0.675  = Validation score (accuracy)
	0.69s  = Training runtime
	0.04s  = Validation runtime
Fitting model: LightGBMLarge ...
	0.88   = Validation score (accuracy)
	3.39s  = Training runtime
	0.02s  = Validation runtime
Fitting model: TextPredictor ...
Global seed set to 0
Auto select gpus: [0]
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                         | Params
-------------------------------------------------------------------
0 | model             | HFAutoModelForTextPrediction | 13.5 M
1 | validation_metric | Accuracy                     | 0
2 | loss_func         | CrossEntropyLoss             | 0
-------------------------------------------------------------------
13.5 M    Trainable params
0         Non-trainable params
13.5 M    Total params
26.968    Total estimated model params size (MB)
Epoch 0, global step 6: 'val_accuracy' reached 0.59000 (best 0.59000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=0-step=6.ckpt' as top 3
Epoch 0, global step 13: 'val_accuracy' reached 0.45500 (best 0.59000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=0-step=13.ckpt' as top 3
Epoch 1, global step 19: 'val_accuracy' reached 0.56500 (best 0.59000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=1-step=19.ckpt' as top 3
Epoch 1, global step 26: 'val_accuracy' reached 0.86250 (best 0.86250), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=1-step=26.ckpt' as top 3
Epoch 2, global step 32: 'val_accuracy' reached 0.88000 (best 0.88000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=2-step=32.ckpt' as top 3
Epoch 2, global step 39: 'val_accuracy' reached 0.88000 (best 0.88000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=2-step=39.ckpt' as top 3
Epoch 3, global step 45: 'val_accuracy' reached 0.88000 (best 0.88000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=3-step=45.ckpt' as top 3
Epoch 3, global step 52: 'val_accuracy' was not in top 3
Epoch 4, global step 58: 'val_accuracy' reached 0.88500 (best 0.88500), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=4-step=58.ckpt' as top 3
Epoch 4, global step 65: 'val_accuracy' was not in top 3
Epoch 5, global step 71: 'val_accuracy' was not in top 3
Epoch 5, global step 78: 'val_accuracy' was not in top 3
Epoch 6, global step 84: 'val_accuracy' reached 0.88750 (best 0.88750), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=6-step=84.ckpt' as top 3
Epoch 6, global step 91: 'val_accuracy' was not in top 3
Epoch 7, global step 97: 'val_accuracy' was not in top 3
Epoch 7, global step 104: 'val_accuracy' reached 0.88250 (best 0.88750), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_tabular_product_sentiment_multimodal/models/TextPredictor/epoch=7-step=104.ckpt' as top 3
Epoch 8, global step 110: 'val_accuracy' was not in top 3
Epoch 8, global step 117: 'val_accuracy' was not in top 3
Epoch 9, global step 123: 'val_accuracy' was not in top 3
Epoch 9, global step 130: 'val_accuracy' was not in top 3
Auto select gpus: [0]
/var/lib/jenkins/miniconda3/envs/autogluon-tutorial-tabular-v3/lib/python3.9/site-packages/pytorch_lightning/loops/utilities.py:91: PossibleUserWarning: max_epochs was not set. Setting it to 1000 epochs. To train without an epoch limit, set max_epochs=-1.
  rank_zero_warn(
HPU available: False, using: 0 HPUs
	0.8875  = Validation score (accuracy)
	122.26s = Training runtime
	0.46s   = Validation runtime
Fitting model: ImagePredictor ...
	No valid features to train ImagePredictor... Skipping this model.
Fitting model: WeightedEnsemble_L2 ...
	0.9025 = Validation score (accuracy)
	0.22s  = Training runtime
	0.0s   = Validation runtime
AutoGluon training complete, total runtime = 138.18s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("ag_tabular_product_sentiment_multimodal/")
<autogluon.tabular.predictor.predictor.TabularPredictor at 0x7fc12c4f9910>
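As the final log line notes, the fitted predictor is saved to disk, so it can be reloaded later without retraining:

# Reload the fitted predictor from the save path reported during training
predictor = TabularPredictor.load("ag_tabular_product_sentiment_multimodal/")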
predictor.leaderboard(dev_df)
| | model | score_test | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TextPredictor | 0.894819 | 0.8875 | 1.403968 | 0.456156 | 122.263481 | 1.403968 | 0.456156 | 122.263481 | 1 | True | 8 |
| 1 | WeightedEnsemble_L2 | 0.888540 | 0.9025 | 1.621594 | 0.532118 | 129.074454 | 0.004630 | 0.000572 | 0.223867 | 2 | True | 9 |
| 2 | NeuralNetTorch | 0.886970 | 0.8825 | 0.026558 | 0.018645 | 1.911242 | 0.026558 | 0.018645 | 1.911242 | 1 | True | 5 |
| 3 | LightGBMLarge | 0.886970 | 0.8800 | 0.068431 | 0.015841 | 3.391008 | 0.068431 | 0.015841 | 3.391008 | 1 | True | 7 |
| 4 | CatBoost | 0.885400 | 0.8825 | 0.017163 | 0.014147 | 1.118952 | 0.017163 | 0.014147 | 1.118952 | 1 | True | 3 |
| 5 | LightGBM | 0.885400 | 0.8925 | 0.036347 | 0.012755 | 2.280776 | 0.036347 | 0.012755 | 2.280776 | 1 | True | 1 |
| 6 | XGBoost | 0.882261 | 0.8925 | 0.047641 | 0.007398 | 1.708595 | 0.047641 | 0.007398 | 1.708595 | 1 | True | 4 |
| 7 | LightGBMXT | 0.869702 | 0.8625 | 0.012671 | 0.006666 | 1.098778 | 0.012671 | 0.006666 | 1.098778 | 1 | True | 2 |
| 8 | VowpalWabbit | 0.714286 | 0.6750 | 0.102450 | 0.036592 | 0.686492 | 0.102450 | 0.036592 | 0.686492 | 1 | True | 6 |
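The leaderboard reports each model's accuracy on the labeled dev set (score_test, since dev_df is the data we passed in) alongside its internal validation score (score_val). To generate predictions for the unlabeled test set, e.g. for a hackathon submission, a sketch like the following should work; the submission filename is just an example:

# Predict a sentiment class for every row of the unlabeled test set;
# the best model (here the weighted ensemble) is used by default.
test_pred = predictor.predict(test_df)

# Per-class probabilities are also available if the submission format needs them
test_pred_proba = predictor.predict_proba(test_df)
test_pred_proba.to_csv('submission.csv')  # illustrative filename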
Improve the Performance with Stack Ensemble¶
You can improve predictive performance by using stack ensembling. One way to turn it on is as follows:
predictor.fit(train_df, hyperparameters='multimodal', num_bag_folds=5, num_stack_levels=1)
or using:
predictor.fit(train_df, hyperparameters='multimodal', presets='best_quality')
which will automatically select values for num_stack_levels (how many stacking layers to use) and num_bag_folds (how many folds to split the data into during bagging). Stack ensembling can take much longer to train, so we won't run with this configuration here. You may explore more examples in https://github.com/awslabs/autogluon/tree/master/examples/text_prediction, which demonstrate how to achieve top performance in competitions using stack-ensembling-based solutions.
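If you do want to try it yourself, here is a hedged sketch of a stack-ensemble run with a runtime cap; the path, variable name, and one-hour time_limit are illustrative choices, not part of this tutorial's actual run:

# Bagging + stacking trains each base model multiple times (once per fold),
# so expect a much longer runtime than the weighted-ensemble fit above.
stack_predictor = TabularPredictor(label='Sentiment', path='ag_tabular_product_sentiment_stack')
stack_predictor.fit(train_df, hyperparameters='multimodal', presets='best_quality',
                    time_limit=3600)  # cap total training time at one hour (in seconds)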