.. _sec_automm_multimodal_beginner:

AutoMM for Image + Text + Tabular - Quick Start
===============================================

AutoMM is a deep learning "model zoo" of model zoos. It can
automatically build deep learning models that are suitable for
multimodal datasets. You only need to convert the data into the
multimodal dataframe format, and AutoMM can predict the values of one
column conditioned on the features from the other columns, including
images, text, and tabular data.

.. code:: python

    import os
    import numpy as np
    import warnings
    warnings.filterwarnings('ignore')
    np.random.seed(123)

Dataset
-------

For demonstration, we use a simplified and subsampled version of the
`PetFinder dataset <https://www.kaggle.com/c/petfinder-adoption-prediction>`__.
The task is to predict the animals' adoption rates based on their
adoption profile information. In this simplified version, the adoption
speed is grouped into two categories: 0 (slow) and 1 (fast).

To get started, let's download and prepare the dataset.

.. code:: python

    download_dir = './ag_automm_tutorial'
    zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip'
    from autogluon.core.utils.loaders import load_zip
    load_zip.unzip(zip_file, unzip_dir=download_dir)

.. parsed-literal::
    :class: output

    Downloading ./ag_automm_tutorial/file.zip from https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip...

.. parsed-literal::
    :class: output

    100%|██████████| 18.8M/18.8M [00:00<00:00, 53.5MiB/s]

Next, we will load the CSV files.

.. code:: python

    import pandas as pd
    dataset_path = download_dir + '/petfinder_for_tutorial'
    train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
    test_data = pd.read_csv(f'{dataset_path}/test.csv', index_col=0)
    label_col = 'AdoptionSpeed'

We need to expand the image paths so they can be loaded during
training.

.. code:: python

    image_col = 'Images'
    train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0])  # Use the first image for a quick tutorial
    test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])

    def path_expander(path, base_folder):
        path_l = path.split(';')
        return ';'.join([os.path.abspath(os.path.join(base_folder, path)) for path in path_l])

    train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
    test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))

    train_data[image_col].iloc[0]

.. parsed-literal::
    :class: output

    '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/ag_automm_tutorial/petfinder_for_tutorial/images/7d7a39d71-1.jpg'
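Since AutoMM reads the images from these paths during training, it can
be worth a quick sanity check that the expanded paths point to files
that actually exist. The snippet below is an optional sketch that is
not part of the original tutorial; it only uses the dataframes and the
``os`` module imported above.

.. code:: python

    # Optional sanity check (not in the original tutorial): confirm that every
    # expanded image path exists on disk before training starts.
    assert train_data[image_col].apply(os.path.exists).all(), "missing train images"
    assert test_data[image_col].apply(os.path.exists).all(), "missing test images"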
Each animal's adoption profile includes pictures, a text description,
and various tabular features such as age, breed, name, color, and more.
Let's look at an example row of data and display the text description
and a picture.

.. code:: python

    example_row = train_data.iloc[0]

    example_row

.. parsed-literal::
    :class: output

    Type                                                             2
    Name                                                 Yumi Hamasaki
    Age                                                              4
    Breed1                                                         292
    Breed2                                                         265
    Gender                                                           2
    Color1                                                           1
    Color2                                                           5
    Color3                                                           7
    MaturitySize                                                     2
    FurLength                                                        2
    Vaccinated                                                       1
    Dewormed                                                         3
    Sterilized                                                       2
    Health                                                           1
    Quantity                                                         1
    Fee                                                              0
    State                                                        41326
    RescuerID                         bcc4e1b9557a8b3aaf545ea8e6e86991
    VideoAmt                                                         0
    Description      I rescued Yumi Hamasaki at a food stall far aw...
    PetID                                                    7d7a39d71
    PhotoAmt                                                       3.0
    AdoptionSpeed                                                    0
    Images           /home/ci/autogluon/docs/_build/eval/tutorials/...
    Name: 0, dtype: object

.. code:: python

    example_row['Description']

.. parsed-literal::
    :class: output

    "I rescued Yumi Hamasaki at a food stall far away in Kelantan.
    At that time i was on my way back to KL, she was suffer from stomach
    problem and looking very2 sick.. I send her to vet & get the treatment +
    vaccinated and right now she's very2 healthy.. About yumi : - love to
    sleep with ppl - she will keep on meowing if she's hugry - very2 active,
    always seeking for people to accompany her playing - well trained
    (poo+pee in her own potty) - easy to bathing - I only feed her with these
    brands : IAMS, Kittenbites, Pro-formance Reason why i need someone to
    adopt Yumi: I just married and need to move to a new house where no pets
    are allowed :( As Yumi is very2 special to me, i will only give her to
    ppl that i think could take care of her just like i did (especially on
    her foods things).."

.. code:: python

    example_image = example_row[image_col]

    from IPython.display import Image, display
    pil_img = Image(filename=example_image)
    display(pil_img)

.. figure:: output_beginner_multimodal_808bd8_11_0.jpg

Training
--------

Now let's fit the predictor with the training data. Here we set a tight
time budget for a quick demo.

.. code:: python

    from autogluon.multimodal import MultiModalPredictor

    predictor = MultiModalPredictor(label=label_col)
    predictor.fit(
        train_data=train_data,
        time_limit=120,  # seconds
    )

.. parsed-literal::
    :class: output

    INFO:pytorch_lightning.utilities.seed:Global seed set to 123
    Downloading: "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22kto1k.pth" to /home/ci/.cache/torch/hub/checkpoints/swin_base_patch4_window7_224_22kto1k.pth

.. parsed-literal::
    :class: output

    Downloading /home/ci/autogluon/multimodal/src/autogluon/multimodal/data/templates.zip from https://automl-mm-bench.s3.amazonaws.com/few_shot/templates.zip...

.. parsed-literal::
    :class: output

    INFO:pytorch_lightning.trainer.connectors.accelerator_connector:Auto select gpus: [0]
    INFO:pytorch_lightning.utilities.rank_zero:Using 16bit native Automatic Mixed Precision (AMP)
    INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
    INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
    INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
    INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
    INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    INFO:pytorch_lightning.callbacks.model_summary:
      | Name              | Type                | Params
    ----------------------------------------------------------
    0 | model             | MultimodalFusionMLP | 198 M
    1 | validation_metric | AUROC               | 0
    2 | loss_func         | CrossEntropyLoss    | 0
    ----------------------------------------------------------
    198 M     Trainable params
    0         Non-trainable params
    198 M     Total params
    396.017   Total estimated model params size (MB)
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 0, global step 1: 'val_roc_auc' reached 0.51333 (best 0.51333), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/AutogluonModels/ag-20221213_013859/epoch=0-step=1.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 0, global step 4: 'val_roc_auc' reached 0.72417 (best 0.72417), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/AutogluonModels/ag-20221213_013859/epoch=0-step=4.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 1, global step 5: 'val_roc_auc' reached 0.74083 (best 0.74083), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/AutogluonModels/ag-20221213_013859/epoch=1-step=5.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Time limit reached. Elapsed time is 0:02:00. Signaling Trainer to stop.
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 1, global step 5: 'val_roc_auc' reached 0.74083 (best 0.74083), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/AutogluonModels/ag-20221213_013859/epoch=1-step=5-v1.ckpt' as top 3

Under the hood, AutoMM automatically infers the problem type
(classification or regression), detects the data modalities, selects
the related models from its multimodal model pool, and trains the
selected models. If multiple backbones are available, AutoMM appends a
late-fusion model (MLP or transformer) on top of them.
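If you want to influence these choices yourself, ``fit()`` also accepts
a ``hyperparameters`` dictionary. The snippet below is a minimal sketch
rather than part of the run above, and the configuration keys shown
(``model.names``, ``optimization.max_epochs``) are assumptions that can
vary across AutoGluon versions; see the customization tutorial linked
at the end of this page for the authoritative options.

.. code:: python

    # A minimal sketch (not executed in this tutorial): restrict AutoMM to an
    # image backbone, a text backbone, and the MLP fusion head, and cap
    # training at two epochs. The exact hyperparameter keys are assumptions
    # that may differ between AutoGluon versions.
    predictor_custom = MultiModalPredictor(label=label_col)
    predictor_custom.fit(
        train_data=train_data,
        hyperparameters={
            "model.names": ["timm_image", "hf_text", "fusion_mlp"],
            "optimization.max_epochs": 2,
        },
        time_limit=120,  # seconds
    )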
Evaluation
----------

Then we can evaluate the predictor on the test data.

.. code:: python

    scores = predictor.evaluate(test_data, metrics=["roc_auc"])
    scores

.. parsed-literal::
    :class: output

    {'roc_auc': 0.8924}

Prediction
----------

Given a multimodal dataframe without the label column, we can predict
the labels.

.. code:: python

    predictions = predictor.predict(test_data.drop(columns=label_col))
    predictions[:5]

.. parsed-literal::
    :class: output

    8     0
    70    1
    82    1
    28    0
    63    1
    Name: AdoptionSpeed, dtype: int64

For classification tasks, we can get the probabilities of all classes.

.. code:: python

    probas = predictor.predict_proba(test_data.drop(columns=label_col))
    probas[:5]
.. parsed-literal::
    :class: output

               0         1
    8   0.825185  0.174815
    70  0.069858  0.930142
    82  0.333829  0.666171
    28  0.767908  0.232093
    63  0.086093  0.913907
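If your application calls for a different operating point than the
default argmax decision, you can threshold the positive-class column
yourself. The snippet below is a small sketch, not part of the original
tutorial; the 0.3 cutoff is purely illustrative, and the second column
is assumed to hold class 1 (fast adoption).

.. code:: python

    # A small sketch: derive labels from the predicted probabilities with a
    # custom decision threshold. The second column is assumed to be class 1;
    # the 0.3 threshold is only an illustrative value.
    threshold = 0.3
    custom_predictions = (probas.iloc[:, 1] >= threshold).astype(int)
    custom_predictions[:5]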
Note that calling ``.predict_proba()`` on a regression task will throw
an exception.

Extract Embeddings
------------------

Extracting embeddings can also be useful in many cases, where we want
to convert each sample (each row in the dataframe) into an embedding
vector.

.. code:: python

    embeddings = predictor.extract_embedding(test_data.drop(columns=label_col))
    embeddings.shape

.. parsed-literal::
    :class: output

    (100, 128)

Save and Load
-------------

It is also convenient to save a predictor and re-load it.

.. warning::

    ``MultiModalPredictor.load()`` uses the ``pickle`` module implicitly,
    which is known to be insecure. It is possible to construct malicious
    pickle data that executes arbitrary code during unpickling. Never load
    data that could have come from an untrusted source, or that could have
    been tampered with. **Only load data you trust.**

.. code:: python

    import uuid

    model_path = f"./tmp/{uuid.uuid4().hex}-saved_model"
    predictor.save(model_path)
    loaded_predictor = MultiModalPredictor.load(model_path)
    scores2 = loaded_predictor.evaluate(test_data, metrics=["roc_auc"])
    scores2

.. parsed-literal::
    :class: output

    {'roc_auc': 0.8924}

Other Examples
--------------

You may go to `AutoMM Examples
<https://github.com/autogluon/autogluon/tree/master/examples/automm>`__
to explore other examples of AutoMM.

Customization
-------------

To learn how to customize AutoMM, please refer to
:ref:`sec_automm_customization`.