.. _sec_automm_multimodal_beginner:

AutoMM for Image + Text + Tabular - Quick Start
===============================================

AutoMM is a deep learning "model zoo" of model zoos. It can
automatically build deep learning models that are suitable for
multimodal datasets. You only need to convert the data into the
multimodal dataframe format, and AutoMM can predict the values of one
column conditioned on the features from the other columns, including
images, text, and tabular data.

.. code:: python

    import os
    import numpy as np
    import warnings
    warnings.filterwarnings('ignore')
    np.random.seed(123)

Dataset
-------

For demonstration, we use a simplified and subsampled version of the
`PetFinder dataset <https://www.kaggle.com/c/petfinder-adoption-prediction>`__.
The task is to predict the animals' adoption rates based on their
adoption profile information. In this simplified version, the adoption
speed is grouped into two categories: 0 (slow) and 1 (fast).

To get started, let's download and prepare the dataset.

.. code:: python

    download_dir = './ag_automm_tutorial'
    zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip'
    from autogluon.core.utils.loaders import load_zip
    load_zip.unzip(zip_file, unzip_dir=download_dir)

.. parsed-literal::
    :class: output

    Downloading ./ag_automm_tutorial/file.zip from https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip...

.. parsed-literal::
    :class: output

    100%|██████████| 18.8M/18.8M [00:00<00:00, 53.5MiB/s]

Next, we will load the CSV files.

.. code:: python

    import pandas as pd
    dataset_path = download_dir + '/petfinder_for_tutorial'
    train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
    test_data = pd.read_csv(f'{dataset_path}/test.csv', index_col=0)
    label_col = 'AdoptionSpeed'

We need to expand the image paths so they can be loaded during
training.

.. code:: python

    image_col = 'Images'
    train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0])  # Use the first image for a quick tutorial
    test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])

    def path_expander(path, base_folder):
        path_l = path.split(';')
        return ';'.join([os.path.abspath(os.path.join(base_folder, path)) for path in path_l])

    train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
    test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))

    train_data[image_col].iloc[0]

.. parsed-literal::
    :class: output

    '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/ag_automm_tutorial/petfinder_for_tutorial/images/7d7a39d71-1.jpg'
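Since AutoMM reads the images from these paths during training, it can
be worth a quick sanity check that the expanded paths point to files
that actually exist. The snippet below is an optional sketch that is
not part of the original tutorial; it only uses the dataframes and the
``os`` module imported above.

.. code:: python

    # Optional sanity check (not in the original tutorial): confirm that every
    # expanded image path exists on disk before training starts.
    assert train_data[image_col].apply(os.path.exists).all(), "missing train images"
    assert test_data[image_col].apply(os.path.exists).all(), "missing test images"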
Each animal's adoption profile includes pictures, a text description,
and various tabular features such as age, breed, name, color, and more.
Let's look at an example row of data and display the text description
and a picture.

.. code:: python

    example_row = train_data.iloc[0]

    example_row

.. parsed-literal::
    :class: output

    Type                                                             2
    Name                                                 Yumi Hamasaki
    Age                                                              4
    Breed1                                                         292
    Breed2                                                         265
    Gender                                                           2
    Color1                                                           1
    Color2                                                           5
    Color3                                                           7
    MaturitySize                                                     2
    FurLength                                                        2
    Vaccinated                                                       1
    Dewormed                                                         3
    Sterilized                                                       2
    Health                                                           1
    Quantity                                                         1
    Fee                                                              0
    State                                                        41326
    RescuerID                         bcc4e1b9557a8b3aaf545ea8e6e86991
    VideoAmt                                                         0
    Description      I rescued Yumi Hamasaki at a food stall far aw...
    PetID                                                    7d7a39d71
    PhotoAmt                                                       3.0
    AdoptionSpeed                                                    0
    Images           /home/ci/autogluon/docs/_build/eval/tutorials/...
    Name: 0, dtype: object

.. code:: python

    example_row['Description']

.. parsed-literal::
    :class: output

    "I rescued Yumi Hamasaki at a food stall far away in Kelantan.
    At that time i was on my way back to KL, she was suffer from stomach
    problem and looking very2 sick.. I send her to vet & get the treatment +
    vaccinated and right now she's very2 healthy.. About yumi : - love to
    sleep with ppl - she will keep on meowing if she's hugry - very2 active,
    always seeking for people to accompany her playing - well trained
    (poo+pee in her own potty) - easy to bathing - I only feed her with these
    brands : IAMS, Kittenbites, Pro-formance Reason why i need someone to
    adopt Yumi: I just married and need to move to a new house where no pets
    are allowed :( As Yumi is very2 special to me, i will only give her to
    ppl that i think could take care of her just like i did (especially on
    her foods things).."

.. code:: python

    example_image = example_row[image_col]

    from IPython.display import Image, display
    pil_img = Image(filename=example_image)
    display(pil_img)

.. figure:: output_beginner_multimodal_808bd8_11_0.jpg

Training
--------

Now let's fit the predictor with the training data. Here we set a tight
time budget for a quick demo.

.. code:: python

    from autogluon.multimodal import MultiModalPredictor

    predictor = MultiModalPredictor(label=label_col)
    predictor.fit(
        train_data=train_data,
        time_limit=120,  # seconds
    )

.. parsed-literal::
    :class: output

    INFO:pytorch_lightning.utilities.seed:Global seed set to 123
    Downloading: "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22kto1k.pth" to /home/ci/.cache/torch/hub/checkpoints/swin_base_patch4_window7_224_22kto1k.pth

.. parsed-literal::
    :class: output

    Downloading /home/ci/autogluon/multimodal/src/autogluon/multimodal/data/templates.zip from https://automl-mm-bench.s3.amazonaws.com/few_shot/templates.zip...

.. parsed-literal::
    :class: output

    INFO:pytorch_lightning.trainer.connectors.accelerator_connector:Auto select gpus: [0]
    INFO:pytorch_lightning.utilities.rank_zero:Using 16bit native Automatic Mixed Precision (AMP)
    INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
    INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
    INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
    INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
    INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    INFO:pytorch_lightning.callbacks.model_summary:
      | Name              | Type                | Params
    ----------------------------------------------------------
    0 | model             | MultimodalFusionMLP | 198 M
    1 | validation_metric | AUROC               | 0
    2 | loss_func         | CrossEntropyLoss    | 0
    ----------------------------------------------------------
    198 M     Trainable params
    0         Non-trainable params
    198 M     Total params
    396.017   Total estimated model params size (MB)
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 0, global step 1: 'val_roc_auc' reached 0.51333 (best 0.51333), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/AutogluonModels/ag-20221213_013859/epoch=0-step=1.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 0, global step 4: 'val_roc_auc' reached 0.72417 (best 0.72417), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/AutogluonModels/ag-20221213_013859/epoch=0-step=4.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 1, global step 5: 'val_roc_auc' reached 0.74083 (best 0.74083), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/AutogluonModels/ag-20221213_013859/epoch=1-step=5.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Time limit reached. Elapsed time is 0:02:00. Signaling Trainer to stop.
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 1, global step 5: 'val_roc_auc' reached 0.74083 (best 0.74083), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/AutogluonModels/ag-20221213_013859/epoch=1-step=5-v1.ckpt' as top 3

Under the hood, AutoMM automatically infers the problem type
(classification or regression), detects the data modalities, selects
the related models from its multimodal model pool, and trains the
selected models. If multiple backbones are available, AutoMM appends a
late-fusion model (MLP or transformer) on top of them.
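If you want to influence these choices yourself, ``fit()`` also accepts
a ``hyperparameters`` dictionary. The snippet below is a minimal sketch
rather than part of the run above, and the configuration keys shown
(``model.names``, ``optimization.max_epochs``) are assumptions that can
vary across AutoGluon versions; see the customization tutorial linked
at the end of this page for the authoritative options.

.. code:: python

    # A minimal sketch (not executed in this tutorial): restrict AutoMM to an
    # image backbone, a text backbone, and the MLP fusion head, and cap
    # training at two epochs. The exact hyperparameter keys are assumptions
    # that may differ between AutoGluon versions.
    predictor_custom = MultiModalPredictor(label=label_col)
    predictor_custom.fit(
        train_data=train_data,
        hyperparameters={
            "model.names": ["timm_image", "hf_text", "fusion_mlp"],
            "optimization.max_epochs": 2,
        },
        time_limit=120,  # seconds
    )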
Evaluation
----------

Then we can evaluate the predictor on the test data.

.. code:: python

    scores = predictor.evaluate(test_data, metrics=["roc_auc"])
    scores

.. parsed-literal::
    :class: output

    {'roc_auc': 0.8924}

Prediction
----------

Given a multimodal dataframe without the label column, we can predict
the labels.

.. code:: python

    predictions = predictor.predict(test_data.drop(columns=label_col))
    predictions[:5]

.. parsed-literal::
    :class: output

    8     0
    70    1
    82    1
    28    0
    63    1
    Name: AdoptionSpeed, dtype: int64

For classification tasks, we can get the probabilities of all classes.

.. code:: python

    probas = predictor.predict_proba(test_data.drop(columns=label_col))
    probas[:5]
.. parsed-literal::
    :class: output

               0         1
    8   0.825185  0.174815
    70  0.069858  0.930142
    82  0.333829  0.666171
    28  0.767908  0.232093
    63  0.086093  0.913907
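If your application calls for a different operating point than the
default argmax decision, you can threshold the positive-class column
yourself. The snippet below is a small sketch, not part of the original
tutorial; the 0.3 cutoff is purely illustrative, and the second column
is assumed to hold class 1 (fast adoption).

.. code:: python

    # A small sketch: derive labels from the predicted probabilities with a
    # custom decision threshold. The second column is assumed to be class 1;
    # the 0.3 threshold is only an illustrative value.
    threshold = 0.3
    custom_predictions = (probas.iloc[:, 1] >= threshold).astype(int)
    custom_predictions[:5]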
Note that calling ``.predict_proba()`` on a regression task will throw
an exception.

Extract Embeddings
------------------

Extracting embeddings can also be useful in many cases, where we want
to convert each sample (each row in the dataframe) into an embedding
vector.

.. code:: python

    embeddings = predictor.extract_embedding(test_data.drop(columns=label_col))
    embeddings.shape

.. parsed-literal::
    :class: output

    (100, 128)

Save and Load
-------------

It is also convenient to save a predictor and re-load it.

.. warning::

    ``MultiModalPredictor.load()`` uses the ``pickle`` module implicitly,
    which is known to be insecure. It is possible to construct malicious
    pickle data that executes arbitrary code during unpickling. Never load
    data that could have come from an untrusted source, or that could have
    been tampered with. **Only load data you trust.**

.. code:: python

    import uuid

    model_path = f"./tmp/{uuid.uuid4().hex}-saved_model"
    predictor.save(model_path)
    loaded_predictor = MultiModalPredictor.load(model_path)
    scores2 = loaded_predictor.evaluate(test_data, metrics=["roc_auc"])
    scores2

.. parsed-literal::
    :class: output

    {'roc_auc': 0.8924}

Other Examples
--------------

You may go to `AutoMM Examples
<https://github.com/autogluon/autogluon/tree/master/examples/automm>`__
to explore other examples of AutoMM.

Customization
-------------

To learn how to customize AutoMM, please refer to
:ref:`sec_automm_customization`.