.. _sec_automm_imageclassification_beginner:

AutoMM for Image Classification - Quick Start
=============================================

In this quick start, we'll use the task of image classification to illustrate how to use **MultiModalPredictor**. Once the data is prepared in `Pandas DataFrame <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html>`__ format, a single call to ``MultiModalPredictor.fit()`` will take care of the model training for you.

Create Image Dataset
--------------------

For demonstration purposes, we use a subset of the `Shopee-IET dataset <https://www.kaggle.com/competitions/demo-shopee-iet-competition/data>`__ from Kaggle. Each image in this data depicts a clothing item and the corresponding label specifies its clothing category. Our subset of the data contains the following possible labels: ``BabyPants``, ``BabyShirt``, ``womencasualshoes``, ``womenchiffontop``.

We can load the dataset automatically by downloading it from a URL:

.. code:: python

    import warnings
    warnings.filterwarnings('ignore')

    from autogluon.vision import ImageDataset
    train_dataset, _, test_dataset = ImageDataset.from_folders("https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip")
    print(train_dataset)

.. parsed-literal::
    :class: output

    Downloading /home/ci/.gluoncv/archive/shopee-iet.zip from https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip...

.. parsed-literal::
    :class: output

    100%|██████████| 40895/40895 [00:01<00:00, 24350.69KB/s]

.. parsed-literal::
    :class: output

    data/
    ├── test/
    └── train/
                                                     image  label
    0    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    1    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    2    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    3    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    4    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    ..                                                 ...    ...
    795  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3
    796  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3
    797  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3
    798  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3
    799  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3

    [800 rows x 2 columns]

We can see there are 800 rows and 2 columns in this training dataframe. The 2 columns are **image** and **label**, and each row represents a different training sample.
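Note that ``MultiModalPredictor`` only requires this DataFrame layout, not ``ImageDataset`` itself: any pandas DataFrame with an image-path column and a label column will do. As a minimal sketch (the file paths below are hypothetical placeholders), you could assemble such a table yourself:

.. code:: python

    import pandas as pd

    # Hypothetical hand-built training table: one row per image,
    # holding the path to the image file and its class label.
    my_train_data = pd.DataFrame({
        "image": ["./data/train/BabyPants/001.jpg",
                  "./data/train/BabyShirt/001.jpg"],
        "label": ["BabyPants", "BabyShirt"],
    })
    print(my_train_data)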
Use AutoMM to Fit Models
------------------------

Now, we fit a classifier using AutoMM as follows:

.. code:: python

    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor(label="label", path="./automm_imgcls")
    predictor.fit(
        train_data=train_dataset,
        time_limit=30, # seconds
    ) # you can trust the default config, e.g., we use a `swin_base_patch4_window7_224` model

.. parsed-literal::
    :class: output

    Global seed set to 123
    Downloading: "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22kto1k.pth" to /home/ci/.cache/torch/hub/checkpoints/swin_base_patch4_window7_224_22kto1k.pth
    Auto select gpus: [0]
    Using 16bit native Automatic Mixed Precision (AMP)
    GPU available: True, used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

      | Name              | Type                            | Params
    ----------------------------------------------------------------------
    0 | model             | TimmAutoModelForImagePrediction | 86.7 M
    1 | validation_metric | Accuracy                        | 0
    2 | loss_func         | CrossEntropyLoss                | 0
    ----------------------------------------------------------------------
    86.7 M    Trainable params
    0         Non-trainable params
    86.7 M    Total params
    173.495   Total estimated model params size (MB)
    Epoch 0, global step 2: 'val_accuracy' reached 0.33125 (best 0.33125), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/automm_imgcls/epoch=0-step=2.ckpt' as top 3
    Epoch 0, global step 5: 'val_accuracy' reached 0.88750 (best 0.88750), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/automm_imgcls/epoch=0-step=5.ckpt' as top 3
    Epoch 1, global step 7: 'val_accuracy' reached 0.92500 (best 0.92500), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/automm_imgcls/epoch=1-step=7.ckpt' as top 3
    Time limit reached. Elapsed time is 0:00:36. Signaling Trainer to stop.

**label** is the name of the column that contains the target variable to predict, e.g., it is "label" in our example. **path** indicates the directory where models and intermediate outputs should be saved. We set the training time limit to 30 seconds for demonstration purposes, but you can control the training time by setting configurations. To customize AutoMM, please refer to :ref:`sec_automm_customization`.

Evaluate on Test Dataset
------------------------

You can evaluate the classifier on the test dataset to see how it performs. The test top-1 accuracy is:

.. code:: python

    scores = predictor.evaluate(test_dataset, metrics=["accuracy"])
    print('Top-1 test acc: %.3f' % scores["accuracy"])

.. parsed-literal::
    :class: output

    Top-1 test acc: 0.963

Predict on a New Image
----------------------

Given an example image, let's visualize it first:

.. code:: python

    image_path = test_dataset.iloc[0]['image']

    from IPython.display import Image, display
    pil_img = Image(filename=image_path)
    display(pil_img)

.. figure:: output_beginner_image_cls_96f3fd_7_0.jpg

We can easily use the final model to ``predict`` the label:

.. code:: python

    predictions = predictor.predict({'image': [image_path]})
    print(predictions)

.. parsed-literal::
    :class: output

    [0]

If probabilities of all categories are needed, you can call ``predict_proba``:

.. code:: python

    proba = predictor.predict_proba({'image': [image_path]})
    print(proba)

.. parsed-literal::
    :class: output

    [[0.812597 0.16638389 0.00555061 0.01546856]]

Extract Embeddings
------------------

Extracting the representation of a whole image learned by the model is also very useful. We provide the ``extract_embedding`` function, which lets the predictor return an N-dimensional image feature, where ``N`` depends on the model (typically a vector of length 512 to 2048).

.. code:: python

    feature = predictor.extract_embedding({'image': [image_path]})
    print(feature[0].shape)

.. parsed-literal::
    :class: output

    (1024,)
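These embeddings can feed downstream tasks such as clustering or image retrieval. As an illustrative sketch (plain NumPy, not part of the tutorial itself; the second image is an arbitrary pick from the test set), the cosine similarity between two extracted embeddings gives a simple measure of how visually similar two images are:

.. code:: python

    import numpy as np

    # Compare the example image against a second test image.
    other_path = test_dataset.iloc[1]['image']
    emb = predictor.extract_embedding({'image': [image_path, other_path]})

    # Cosine similarity between the two feature vectors.
    a, b = emb[0], emb[1]
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print('cosine similarity: %.3f' % cos_sim)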
Save and Load
-------------

The trained predictor is automatically saved at the end of ``fit()``, and you can easily reload it.

.. code:: python

    loaded_predictor = MultiModalPredictor.load('automm_imgcls')
    load_proba = loaded_predictor.predict_proba({'image': [image_path]})
    print(load_proba)

.. parsed-literal::
    :class: output

    [[0.812597 0.16638389 0.00555061 0.01546856]]

We can see the predicted class probabilities are still the same as above, which means we have loaded the same model!

Other Examples
--------------

You may go to `AutoMM Examples <https://github.com/autogluon/autogluon/tree/master/examples/automm>`__ to explore other examples about AutoMM.

Customization
-------------

To learn how to customize AutoMM, please refer to :ref:`sec_automm_customization`.
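As a small taste of what customization looks like, the sketch below overrides the image backbone through ``fit()``'s ``hyperparameters`` argument. The ``model.timm_image.checkpoint_name`` key, the smaller ``swin_tiny_patch4_window7_224`` checkpoint, and the predictor/path names are illustrative choices; see the customization tutorial for the full set of supported options.

.. code:: python

    from autogluon.multimodal import MultiModalPredictor

    # Sketch: train with a smaller Swin backbone instead of the default.
    predictor_tiny = MultiModalPredictor(label="label", path="./automm_imgcls_tiny")
    predictor_tiny.fit(
        train_data=train_dataset,
        time_limit=30,  # seconds
        hyperparameters={"model.timm_image.checkpoint_name": "swin_tiny_patch4_window7_224"},
    )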