AutoMMPredictor for Image, Text, and Tabular¶
Are you tired of switching codebases or hacking code for different data
modalities (image, text, numerical, and categorical data) and tasks
(classification, regression, and more)? AutoMMPredictor provides a
one-stop shop for multimodal/unimodal deep learning models. This
tutorial demonstrates several application scenarios.
Multimodal Prediction
CLIP
TIMM + Huggingface Transformers + More
Image Prediction
Text Prediction
Configuration Customization
APIs
import os
import numpy as np
import warnings
warnings.filterwarnings('ignore')
np.random.seed(123)
Dataset¶
For demonstration, we use the PetFinder dataset. It contains the information that appears on shelter animals’ adoption profiles, and the task is to predict how quickly each animal is adopted (the AdoptionSpeed column). The label is grouped into five categories, so this is a multi-class classification problem.
To get started, let’s download and prepare the dataset.
download_dir = './ag_automm_tutorial'
zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_kaggle.zip'
from autogluon.core.utils.loaders import load_zip
load_zip.unzip(zip_file, unzip_dir=download_dir)
Downloading ./ag_automm_tutorial/file.zip from https://automl-mm-bench.s3.amazonaws.com/petfinder_kaggle.zip...
100%|██████████| 2.00G/2.00G [01:18<00:00, 25.5MiB/s]
Next, we will load the CSV files.
import pandas as pd
dataset_path = download_dir + '/petfinder_processed'
train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
test_data = pd.read_csv(f'{dataset_path}/dev.csv', index_col=0)
label_col = 'AdoptionSpeed'
We need to expand the image paths to load them in training.
image_col = 'Images'
train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0]) # Use the first image for a quick tutorial
test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])
def path_expander(path, base_folder):
    # Expand each ';'-separated relative path into an absolute path.
    path_l = path.split(';')
    return ';'.join([os.path.abspath(os.path.join(base_folder, p)) for p in path_l])
train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
train_data[image_col].iloc[0]
'/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_automm_tutorial/petfinder_processed/train_images/e4b90955c-1.jpg'
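To make the two preprocessing steps above concrete, here is a minimal standalone sketch. The cell value and the base folder are made up for illustration; only the `;` separator and the `path_expander` logic come from the tutorial.

```python
import os

def path_expander(path, base_folder):
    """Turn a ';'-separated string of relative paths into absolute paths."""
    path_l = path.split(';')
    return ';'.join(os.path.abspath(os.path.join(base_folder, p)) for p in path_l)

# Hypothetical cell value: two relative image paths joined by ';'.
raw = 'train_images/a-1.jpg;train_images/a-2.jpg'

# Step 1: keep only the first image, as the tutorial does for speed.
first_only = raw.split(';')[0]

# Step 2: expand to absolute paths ('./data' is an assumed base folder).
expanded_first = path_expander(first_only, base_folder='./data')
expanded_all = path_expander(raw, base_folder='./data')

print(expanded_first)
print(expanded_all.split(';'))
```

Because `path_expander` splits on `;` itself, it works whether a cell holds one path or several.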
Each animal’s adoption profile includes pictures, a text description, and various tabular features such as age, breed, name, color, and more. Let’s look at an example row of data and display the text description and a picture.
example_row = train_data.iloc[47]
example_row
Type 2
Name Money
Age 4
Breed1 266
Breed2 0
Gender 2
Color1 1
Color2 2
Color3 7
MaturitySize 1
FurLength 2
Vaccinated 2
Dewormed 1
Sterilized 2
Health 1
Quantity 1
Fee 0
State 41401
RescuerID ee7445af32acfa1dc8307a9dc7baed21
VideoAmt 0
Description My pet is a pretty beautiful kitty which has a...
PetID 98c08df17
PhotoAmt 2.0
AdoptionSpeed 2
Images /var/lib/jenkins/workspace/workspace/autogluon...
Name: 14845, dtype: object
example_row['Description']
'My pet is a pretty beautiful kitty which has a mixed colour soft fur. She is active and full of life. And one thing about her, she loves to eat.She always turn on me like a tiger when I was preparing the food for her.'
example_image = example_row['Images']
from IPython.display import Image, display
pil_img = Image(filename=example_image)
display(pil_img)

For demonstration purposes, we sample 500 rows for training and 100 rows for testing.
train_data = train_data.sample(500, random_state=0)
test_data = test_data.sample(100, random_state=0)
Multimodal Prediction¶
CLIP¶
AutoMMPredictor allows for finetuning pre-trained vision-language
models, such as CLIP.
from autogluon.text.automm import AutoMMPredictor
predictor = AutoMMPredictor(label=label_col)
predictor.fit(
    train_data=train_data,
    hyperparameters={
        "model.names": ["clip"],
        "env.num_gpus": 1,
    },
    time_limit=120,  # seconds
)
Global seed set to 123
Auto select gpus: [0]
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-------------------------------------------------------
0 | model | CLIPForImageText | 151 M
1 | validation_metric | Accuracy | 0
2 | loss_func | CrossEntropyLoss | 0
-------------------------------------------------------
151 M Trainable params
0 Non-trainable params
151 M Total params
302.560 Total estimated model params size (MB)
Epoch 0, global step 1: 'val_accuracy' reached 0.27000 (best 0.27000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053217/epoch=0-step=1.ckpt' as top 3
Epoch 0, global step 4: 'val_accuracy' reached 0.30000 (best 0.30000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053217/epoch=0-step=4.ckpt' as top 3
Epoch 1, global step 5: 'val_accuracy' reached 0.29000 (best 0.30000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053217/epoch=1-step=5.ckpt' as top 3
Epoch 1, global step 8: 'val_accuracy' was not in top 3
Epoch 2, global step 9: 'val_accuracy' reached 0.29000 (best 0.30000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053217/epoch=2-step=9.ckpt' as top 3
Epoch 2, global step 12: 'val_accuracy' reached 0.33000 (best 0.33000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053217/epoch=2-step=12.ckpt' as top 3
Epoch 3, global step 13: 'val_accuracy' reached 0.33000 (best 0.33000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053217/epoch=3-step=13.ckpt' as top 3
Epoch 3, global step 16: 'val_accuracy' was not in top 3
Epoch 4, global step 17: 'val_accuracy' reached 0.33000 (best 0.33000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053217/epoch=4-step=17.ckpt' as top 3
Time limit reached. Elapsed time is 0:02:06. Signaling Trainer to stop.
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
<autogluon.text.automm.predictor.AutoMMPredictor at 0x7f809f10a940>
scores = predictor.evaluate(test_data, metrics=["accuracy"])
scores
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
{'accuracy': 0.28}
In this example, AutoMMPredictor finetunes CLIP with the image, text,
and categorical (converted to text) data.
TIMM + Huggingface Transformers + More¶
In addition to CLIP, AutoMMPredictor can simultaneously finetune
various timm backbones and huggingface transformers. Moreover,
AutoMMPredictor uses an MLP for numerical data but converts categorical
data to text by default.
Let’s use AutoMMPredictor to train a late fusion model including CLIP,
swin_small_patch4_window7_224, google/electra-small-discriminator,
a numerical MLP, and a fusion MLP.
from autogluon.text.automm import AutoMMPredictor
predictor = AutoMMPredictor(label=label_col)
predictor.fit(
    train_data=train_data,
    hyperparameters={
        "model.names": ["clip", "timm_image", "hf_text", "numerical_mlp", "fusion_mlp"],
        "model.timm_image.checkpoint_name": "swin_small_patch4_window7_224",
        "model.hf_text.checkpoint_name": "google/electra-small-discriminator",
        "env.num_gpus": 1,
    },
    time_limit=120,  # seconds
)
Global seed set to 123
Auto select gpus: [0]
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
----------------------------------------------------------
0 | model | MultimodalFusionMLP | 215 M
1 | validation_metric | Accuracy | 0
2 | loss_func | CrossEntropyLoss | 0
----------------------------------------------------------
215 M Trainable params
0 Non-trainable params
215 M Total params
430.576 Total estimated model params size (MB)
Epoch 0, global step 1: 'val_accuracy' reached 0.16000 (best 0.16000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053455/epoch=0-step=1.ckpt' as top 3
Epoch 0, global step 4: 'val_accuracy' reached 0.20000 (best 0.20000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053455/epoch=0-step=4.ckpt' as top 3
Epoch 1, global step 5: 'val_accuracy' reached 0.32000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053455/epoch=1-step=5.ckpt' as top 3
Epoch 1, global step 8: 'val_accuracy' was not in top 3
Epoch 2, global step 9: 'val_accuracy' reached 0.17000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053455/epoch=2-step=9.ckpt' as top 3
Time limit reached. Elapsed time is 0:02:00. Signaling Trainer to stop.
Epoch 2, global step 10: 'val_accuracy' reached 0.24000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053455/epoch=2-step=10.ckpt' as top 3
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
<autogluon.text.automm.predictor.AutoMMPredictor at 0x7f809f0d4430>
scores = predictor.evaluate(test_data, metrics=["accuracy"])
scores
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
{'accuracy': 0.35}
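The idea behind late fusion can be sketched in a few lines of plain Python: each backbone produces its own feature vector, the vectors are concatenated, and a small head maps the result to class probabilities. This is only a conceptual sketch, not AutoMMPredictor's implementation. The feature values, the single linear layer, and the random weights are all assumptions made for illustration.

```python
import math
import random

random.seed(0)

def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-modality features for ONE sample, as if produced by
# an image backbone, a text backbone, and a numerical MLP.
features = {
    "image": [0.2, -0.1, 0.4],
    "text": [0.5, 0.3],
    "numerical": [1.2],
}

# Late fusion: concatenate the per-modality features ...
fused = [v for name in ("image", "text", "numerical") for v in features[name]]

# ... and feed them to a fusion head (here a single untrained linear layer).
num_classes = 5  # AdoptionSpeed has five categories
weights = [[random.uniform(-0.1, 0.1) for _ in fused] for _ in range(num_classes)]
bias = [0.0] * num_classes
probs = softmax(linear(fused, weights, bias))

print(len(fused), [round(p, 3) for p in probs])
```

In the real model, the fusion head and all backbones are trained jointly, which is why the fused model above reports a single parameter count.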
Image Prediction¶
If you want to use only image data, or your task has only image data,
AutoMMPredictor can help you finetune a wide range of timm backbones,
such as swin_tiny_patch4_window7_224.
from autogluon.text.automm import AutoMMPredictor
predictor = AutoMMPredictor(label=label_col)
predictor.fit(
    train_data=train_data,
    hyperparameters={
        "model.names": ["timm_image"],
        "model.timm_image.checkpoint_name": "swin_tiny_patch4_window7_224",
        "env.num_gpus": 1,
    },
    time_limit=60,  # seconds
)
Global seed set to 123
Auto select gpus: [0]
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
----------------------------------------------------------------------
0 | model | TimmAutoModelForImagePrediction | 27.5 M
1 | validation_metric | Accuracy | 0
2 | loss_func | CrossEntropyLoss | 0
----------------------------------------------------------------------
27.5 M Trainable params
0 Non-trainable params
27.5 M Total params
55.046 Total estimated model params size (MB)
Epoch 0, global step 1: 'val_accuracy' reached 0.23000 (best 0.23000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053807/epoch=0-step=1.ckpt' as top 3
Epoch 0, global step 4: 'val_accuracy' reached 0.31000 (best 0.31000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053807/epoch=0-step=4.ckpt' as top 3
Epoch 1, global step 5: 'val_accuracy' reached 0.32000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053807/epoch=1-step=5.ckpt' as top 3
Epoch 1, global step 8: 'val_accuracy' reached 0.35000 (best 0.35000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053807/epoch=1-step=8.ckpt' as top 3
Epoch 2, global step 9: 'val_accuracy' was not in top 3
Epoch 2, global step 12: 'val_accuracy' was not in top 3
Epoch 3, global step 13: 'val_accuracy' was not in top 3
Epoch 3, global step 16: 'val_accuracy' reached 0.34000 (best 0.35000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053807/epoch=3-step=16.ckpt' as top 3
Epoch 4, global step 17: 'val_accuracy' reached 0.36000 (best 0.36000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053807/epoch=4-step=17.ckpt' as top 3
Epoch 4, global step 20: 'val_accuracy' reached 0.36000 (best 0.36000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053807/epoch=4-step=20.ckpt' as top 3
Epoch 5, global step 21: 'val_accuracy' was not in top 3
Epoch 5, global step 24: 'val_accuracy' was not in top 3
Epoch 6, global step 25: 'val_accuracy' was not in top 3
Epoch 6, global step 28: 'val_accuracy' was not in top 3
Time limit reached. Elapsed time is 0:01:01. Signaling Trainer to stop.
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
<autogluon.text.automm.predictor.AutoMMPredictor at 0x7f803238f580>
Here AutoMMPredictor uses only image data since model.names includes
only timm_image.
scores = predictor.evaluate(test_data, metrics=["accuracy"])
scores
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
{'accuracy': 0.31}
Text Prediction¶
Similarly, you may be interested in only finetuning the text backbones from huggingface transformers, such as google/electra-small-discriminator.
from autogluon.text.automm import AutoMMPredictor
predictor = AutoMMPredictor(label=label_col)
predictor.fit(
    train_data=train_data,
    hyperparameters={
        "model.names": ["hf_text"],
        "model.hf_text.checkpoint_name": "google/electra-small-discriminator",
        "env.num_gpus": 1,
    },
    time_limit=60,  # seconds
)
Global seed set to 123
Auto select gpus: [0]
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-------------------------------------------------------------------
0 | model | HFAutoModelForTextPrediction | 13.5 M
1 | validation_metric | Accuracy | 0
2 | loss_func | CrossEntropyLoss | 0
-------------------------------------------------------------------
13.5 M Trainable params
0 Non-trainable params
13.5 M Total params
26.969 Total estimated model params size (MB)
Epoch 0, global step 1: 'val_accuracy' reached 0.32000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053918/epoch=0-step=1.ckpt' as top 3
Epoch 0, global step 4: 'val_accuracy' reached 0.22000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053918/epoch=0-step=4.ckpt' as top 3
Epoch 1, global step 5: 'val_accuracy' reached 0.23000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053918/epoch=1-step=5.ckpt' as top 3
Epoch 1, global step 8: 'val_accuracy' was not in top 3
Epoch 2, global step 9: 'val_accuracy' was not in top 3
Epoch 2, global step 12: 'val_accuracy' reached 0.28000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053918/epoch=2-step=12.ckpt' as top 3
Epoch 3, global step 13: 'val_accuracy' reached 0.31000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053918/epoch=3-step=13.ckpt' as top 3
Epoch 3, global step 16: 'val_accuracy' reached 0.31000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053918/epoch=3-step=16.ckpt' as top 3
Epoch 4, global step 17: 'val_accuracy' was not in top 3
Epoch 4, global step 20: 'val_accuracy' reached 0.32000 (best 0.32000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053918/epoch=4-step=20.ckpt' as top 3
Epoch 5, global step 21: 'val_accuracy' reached 0.33000 (best 0.33000), saving model to '/var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/AutogluonModels/ag-20220521_053918/epoch=5-step=21.ckpt' as top 3
Epoch 5, global step 24: 'val_accuracy' was not in top 3
Epoch 6, global step 25: 'val_accuracy' was not in top 3
Epoch 6, global step 28: 'val_accuracy' was not in top 3
Epoch 7, global step 29: 'val_accuracy' was not in top 3
Epoch 7, global step 32: 'val_accuracy' was not in top 3
Epoch 8, global step 33: 'val_accuracy' was not in top 3
Epoch 8, global step 36: 'val_accuracy' was not in top 3
Epoch 9, global step 37: 'val_accuracy' was not in top 3
Epoch 9, global step 40: 'val_accuracy' was not in top 3
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
<autogluon.text.automm.predictor.AutoMMPredictor at 0x7f8033517d90>
With only hf_text in model.names, AutoMMPredictor automatically uses
only text and categorical (converted to text) data.
scores = predictor.evaluate(test_data, metrics=["accuracy"])
scores
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
{'accuracy': 0.15}
Configuration Customization¶
The above examples have shown the flexibility of AutoMMPredictor.
You may want to know how to customize configurations for your tasks.
Fortunately, AutoMMPredictor has a user-friendly configuration design.
First, let’s see the available model presets.
from autogluon.text.automm.presets import list_model_presets, get_preset
model_presets = list_model_presets()
model_presets
['fusion_mlp_image_text_tabular']
Currently, AutoMMPredictor has only one model preset, from which we
can construct the predictor’s preset.
preset = get_preset(model_presets[0])
preset
{'model': 'fusion_mlp_image_text_tabular',
'data': 'default',
'optimization': 'adamw',
'environment': 'default'}
AutoMMPredictor configurations consist of four parts: model, data,
optimization, and environment. You can convert the preset to
configurations to see the details.
from omegaconf import OmegaConf
from autogluon.text.automm.utils import get_config
config = get_config(preset)
print(OmegaConf.to_yaml(config))
model:
  names:
  - categorical_mlp
  - numerical_mlp
  - hf_text
  - timm_image
  - clip
  - fusion_mlp
  categorical_mlp:
    hidden_size: 64
    activation: leaky_relu
    num_layers: 1
    drop_rate: 0.1
    normalization: layer_norm
    data_types:
    - categorical
  categorical_transformer:
    out_features: 192
    d_token: 192
    num_trans_blocks: 0
    num_attn_heads: 8
    residual_dropout: 0.0
    attention_dropout: 0.2
    ffn_dropout: 0.1
    normalization: layer_norm
    ffn_activation: reglu
    head_activation: relu
    data_types:
    - categorical
  numerical_mlp:
    hidden_size: 128
    activation: leaky_relu
    num_layers: 1
    drop_rate: 0.1
    normalization: layer_norm
    data_types:
    - numerical
    merge: concat
  numerical_transformer:
    out_features: 192
    d_token: 192
    num_trans_blocks: 0
    num_attn_heads: 8
    residual_dropout: 0.0
    attention_dropout: 0.2
    ffn_dropout: 0.1
    normalization: layer_norm
    ffn_activation: reglu
    head_activation: relu
    data_types:
    - numerical
    merge: concat
  hf_text:
    checkpoint_name: google/electra-base-discriminator
    data_types:
    - text
    tokenizer_name: hf_auto
    max_text_len: 512
    insert_sep: true
    text_segment_num: 2
    stochastic_chunk: false
  timm_image:
    checkpoint_name: swin_base_patch4_window7_224
    mix_choice: all_logits
    data_types:
    - image
    train_transform_types:
    - resize_shorter_side
    - center_crop
    val_transform_types:
    - resize_shorter_side
    - center_crop
    image_norm: imagenet
    image_size: 224
    max_img_num_per_col: 2
  clip:
    checkpoint_name: openai/clip-vit-base-patch32
    data_types:
    - image
    - text
    train_transform_types:
    - resize_shorter_side
    - center_crop
    val_transform_types:
    - resize_shorter_side
    - center_crop
    image_norm: clip
    image_size: 224
    max_img_num_per_col: 2
    tokenizer_name: clip
    max_text_len: 77
    insert_sep: false
    text_segment_num: 1
    stochastic_chunk: false
  fusion_mlp:
    weight: 0.1
    adapt_in_features: max
    hidden_sizes:
    - 128
    activation: leaky_relu
    drop_rate: 0.1
    normalization: layer_norm
    data_types: null
  fusion_transformer:
    hidden_size: 192
    n_blocks: 3
    attention_n_heads: 8
    adapt_in_features: max
    attention_dropout: 0.2
    residual_dropout: 0.0
    ffn_dropout: 0.1
    ffn_d_hidden: 192
    normalization: layer_norm
    ffn_activation: geglu
    head_activation: relu
    data_types: null
data:
  image:
    missing_value_strategy: skip
  text: null
  categorical:
    minimum_cat_count: 100
    maximum_num_cat: 20
    convert_to_text: true
  numerical:
    convert_to_text: false
    scaler_with_mean: true
    scaler_with_std: true
optimization:
  optim_type: adamw
  learning_rate: 0.0001
  weight_decay: 0.001
  lr_choice: layerwise_decay
  lr_decay: 0.8
  lr_schedule: cosine_decay
  max_epochs: 10
  max_steps: -1
  warmup_steps: 0.1
  end_lr: 0
  lr_mult: 1
  patience: 10
  val_check_interval: 0.5
  top_k: 3
  top_k_average_method: greedy_soup
  efficient_finetune: null
env:
  num_gpus: -1
  num_nodes: 1
  batch_size: 128
  per_gpu_batch_size: 8
  eval_batch_size_ratio: 4
  per_gpu_batch_size_evaluation: null
  precision: 16
  num_workers: 2
  num_workers_evaluation: 2
  fast_dev_run: false
  deterministic: false
  auto_select_gpus: true
  strategy: ddp_spawn
The model config provides several model types: an MLP for categorical
data (categorical_mlp), an MLP for numerical data (numerical_mlp),
huggingface transformers for text data (hf_text), timm for image data
(timm_image), clip for image+text data, and an MLP to fuse any
combination of categorical_mlp, numerical_mlp, hf_text, timm_image, and
clip (fusion_mlp). We can specify the model combination by setting
model.names. Moreover, we can use model.hf_text.checkpoint_name and
model.timm_image.checkpoint_name to customize the huggingface and timm
backbones.
The data config defines some model-agnostic rules for preprocessing
data. Note that AutoMMPredictor converts categorical data into text
by default.
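As a rough picture of what "converting categorical data into text" means, the sketch below renders a few categorical values as strings so that they can sit next to the free-text description as extra text fields. The helper `categorical_to_text`, the choice of columns, and the way the fields are combined are hypothetical; the tutorial does not describe AutoMMPredictor's actual conversion.

```python
def categorical_to_text(row, categorical_columns):
    """Hypothetical helper: render categorical values as plain strings."""
    return {col: str(row[col]) for col in categorical_columns}

# A few fields from the example row shown earlier in this tutorial.
row = {
    "Type": 2,
    "Gender": 2,
    "Vaccinated": 2,
    "Description": "My pet is a pretty beautiful kitty",
}

# Assumed set of categorical columns, chosen for illustration only.
categorical_columns = ["Type", "Gender", "Vaccinated"]
as_text = categorical_to_text(row, categorical_columns)

# The text model would then see the description plus the converted fields.
text_fields = [row["Description"]] + [as_text[c] for c in categorical_columns]
print(text_fields)
```

This is why the CLIP-only and hf_text-only examples above can still make use of the categorical columns.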
The optimization config has hyper-parameters for model training.
AutoMMPredictor uses layer-wise learning rate decay, which decreases
the learning rate gradually from the output end to the input end of a
model.
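The arithmetic of layer-wise decay can be sketched as follows, using learning_rate (0.0001) and lr_decay (0.8) from the config above. The four-layer model and the rule of one decay factor per layer are assumptions for illustration; how AutoMMPredictor groups parameters into layers is not covered here.

```python
def layerwise_learning_rates(num_layers, base_lr=1e-4, lr_decay=0.8):
    """Learning rate per layer; index 0 is the input end, the last index the output end.

    The output-end layer keeps base_lr, and each step toward the input
    multiplies the rate by lr_decay once more.
    """
    return [base_lr * lr_decay ** (num_layers - 1 - i) for i in range(num_layers)]

rates = layerwise_learning_rates(num_layers=4)
for i, lr in enumerate(rates):
    print(f"layer {i}: lr = {lr:.2e}")
```

Layers near the input, which hold the most general pre-trained features, are updated more gently than layers near the output.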
The env config contains the environment/machine related
hyper-parameters. For example, the optimal values of
per_gpu_batch_size and per_gpu_batch_size_evaluation are closely
related to the GPU memory size.
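One plausible reading of how batch_size and per_gpu_batch_size relate is sketched below: batch_size is treated as the effective batch size, and whatever does not fit on the GPUs in one step is made up through gradient accumulation. This relationship is an assumption, not something this tutorial states, and `accumulation_steps` is a hypothetical helper.

```python
def accumulation_steps(batch_size, per_gpu_batch_size, num_gpus=1, num_nodes=1):
    """Hypothetical: forward/backward passes needed to reach batch_size samples."""
    samples_per_step = per_gpu_batch_size * num_gpus * num_nodes
    return max(batch_size // samples_per_step, 1)

# Config defaults above: batch_size=128, per_gpu_batch_size=8, on one GPU.
steps_default = accumulation_steps(batch_size=128, per_gpu_batch_size=8)

# With more GPU memory, a larger per-GPU batch needs fewer accumulation steps.
steps_larger = accumulation_steps(batch_size=128, per_gpu_batch_size=16)

print(steps_default, steps_larger)
```

Under this reading, raising per_gpu_batch_size changes memory use and speed, while the effective batch size stays at batch_size.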
You can flexibly customize any hyper-parameter in config via the
hyperparameters argument of .fit(). To access one hyper-parameter in
config, you need to traverse from top-level keys to bottom-level keys
and join them together with a period (.). For example, if you want to
change the per GPU batch size to 16, you can set
hyperparameters={"env.per_gpu_batch_size": 16}.
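The dotted-key convention can be illustrated with a plain nested dictionary. The helper `apply_overrides` below is hypothetical and only mimics the idea; it is not AutoMMPredictor's code, and the small config is a hand-picked subset of the one printed above.

```python
import copy

def apply_overrides(config, overrides):
    """Return a copy of a nested dict with dotted-key overrides applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node[key]  # walk from the top-level key downward
        if leaf not in node:
            raise KeyError(f"unknown hyper-parameter: {dotted_key}")
        node[leaf] = value
    return result

config = {
    "model": {
        "names": ["hf_text", "timm_image"],
        "hf_text": {"checkpoint_name": "google/electra-base-discriminator"},
    },
    "env": {"num_gpus": -1, "per_gpu_batch_size": 8},
}

updated = apply_overrides(
    config,
    {
        "env.per_gpu_batch_size": 16,
        "model.hf_text.checkpoint_name": "google/electra-small-discriminator",
    },
)
print(updated["env"], updated["model"]["hf_text"])
```

Each dotted key names exactly one leaf, so an override touches a single value and leaves the rest of the defaults untouched.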
APIs¶
Besides .fit() and .evaluate(), AutoMMPredictor also provides other
useful APIs, similar to those in TextPredictor and TabularPredictor.
You may refer to more details in Text Prediction - Quick Start.
Given data without ground truth labels, AutoMMPredictor can make
predictions.
predictions = predictor.predict(test_data.drop(columns=label_col))
predictions[:5]
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
1873 1
8536 4
7988 2
10127 4
14668 1
Name: AdoptionSpeed, dtype: int64
For classification tasks, we can get the probabilities of all classes.
probas = predictor.predict_proba(test_data.drop(columns=label_col))
probas[:5]
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 1873 | 0.006065 | 0.277840 | 0.246935 | 0.194220 | 0.274939 |
| 8536 | 0.006870 | 0.242153 | 0.238050 | 0.228400 | 0.284526 |
| 7988 | 0.007646 | 0.202543 | 0.302092 | 0.261152 | 0.226567 |
| 10127 | 0.006175 | 0.246673 | 0.276513 | 0.185348 | 0.285291 |
| 14668 | 0.006827 | 0.310215 | 0.236518 | 0.164192 | 0.282247 |
Note that calling .predict_proba on a regression task will throw
an exception.
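The class probabilities and the predicted labels are linked by a simple argmax. As a sanity check, the sketch below copies the five probability rows printed above into plain Python and picks the most probable class for each row; no AutoGluon call is involved.

```python
# Probability rows copied from the predict_proba output above,
# keyed by the DataFrame index; list position = class label 0..4.
probas = {
    1873: [0.006065, 0.277840, 0.246935, 0.194220, 0.274939],
    8536: [0.006870, 0.242153, 0.238050, 0.228400, 0.284526],
    7988: [0.007646, 0.202543, 0.302092, 0.261152, 0.226567],
    10127: [0.006175, 0.246673, 0.276513, 0.185348, 0.285291],
    14668: [0.006827, 0.310215, 0.236518, 0.164192, 0.282247],
}

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

predicted = {idx: argmax(row) for idx, row in probas.items()}
print(predicted)  # {1873: 1, 8536: 4, 7988: 2, 10127: 4, 14668: 1}
```

For these five rows, the result agrees with the labels returned by .predict() earlier in this section.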
Extracting embeddings can be easily done via .extract_embedding().
embeddings = predictor.extract_embedding(test_data.drop(columns=label_col))
embeddings.shape
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
(100, 256)
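One common use of such embeddings is comparing samples by cosine similarity. The sketch below uses short made-up vectors in place of the 256-dimensional rows returned above; the values are hypothetical and the tutorial itself does not prescribe this use.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings (real ones here have 256 dimensions).
emb_a = [0.1, 0.3, -0.2, 0.5]
emb_b = [0.2, 0.6, -0.4, 1.0]   # same direction as emb_a, twice the length
emb_c = [-0.5, 0.2, 0.3, 0.0]

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(round(sim_ab, 3), round(sim_ac, 3))
```

Vectors pointing in the same direction score close to 1 regardless of length, so the measure is insensitive to the overall scale of the embeddings.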
It is also convenient to save and load a predictor.
predictor.save('my_saved_dir')
loaded_predictor = AutoMMPredictor.load('my_saved_dir')
scores2 = loaded_predictor.evaluate(test_data, metrics=["accuracy"])
scores2
Auto select gpus: [0]
HPU available: False, using: 0 HPUs
{'accuracy': 0.15}