Hyperparameter Optimization in AutoMM
=====================================

Hyperparameter optimization (HPO) is a method that helps solve the challenge of tuning the hyperparameters of machine learning models. ML algorithms have multiple complex hyperparameters that generate an enormous search space, and the search space in deep learning methods is even larger than in traditional ML algorithms. Tuning on a massive search space is a tough challenge, but AutoMM provides various options for you to guide the fitting process based on your domain knowledge and your constraints on computing resources.

Create Image Dataset
--------------------

In this tutorial, we are once again going to use the subset of the `Shopee-IET dataset `__ from Kaggle for demonstration purposes. Each image contains a clothing item, and the corresponding label specifies its clothing category. Our subset of the data contains the following possible labels: ``BabyPants``, ``BabyShirt``, ``womencasualshoes``, ``womenchiffontop``.

We can load the dataset by downloading it from a URL automatically:

.. code:: python

    import warnings
    warnings.filterwarnings('ignore')

    from datetime import datetime

    from autogluon.multimodal.utils.misc import shopee_dataset

    download_dir = './ag_automm_tutorial_hpo'
    train_data, test_data = shopee_dataset(download_dir)
    train_data = train_data.sample(frac=0.5)
    print(train_data)

.. parsed-literal::
    :class: output

    Downloading ./ag_automm_tutorial_hpo/file.zip from https://automl-mm-bench.s3.amazonaws.com/vision_datasets/shopee.zip...

.. parsed-literal::
    :class: output

    100%|██████████| 41.9M/41.9M [00:00<00:00, 54.7MiB/s]

.. parsed-literal::
    :class: output

                                                     image  label
    777  /home/ci/autogluon/docs/_build/eval/tutorials/...      3
    342  /home/ci/autogluon/docs/_build/eval/tutorials/...      1
    599  /home/ci/autogluon/docs/_build/eval/tutorials/...      2
    762  /home/ci/autogluon/docs/_build/eval/tutorials/...      3
    16   /home/ci/autogluon/docs/_build/eval/tutorials/...      0
    ..                                                 ...    ...
    227  /home/ci/autogluon/docs/_build/eval/tutorials/...      1
    560  /home/ci/autogluon/docs/_build/eval/tutorials/...      2
    26   /home/ci/autogluon/docs/_build/eval/tutorials/...      0
    490  /home/ci/autogluon/docs/_build/eval/tutorials/...      2
    108  /home/ci/autogluon/docs/_build/eval/tutorials/...      0

    [400 rows x 2 columns]

There are in total 400 data points in this dataset. The ``image`` column stores the path to the actual image, and the ``label`` column stands for the label class.

The Regular Model Fitting
-------------------------

Recall that if we use the default settings predefined by AutoGluon, we can fit the model using ``MultiModalPredictor`` with just three lines of code (plus some timing code here):

.. code:: python

    from autogluon.multimodal import MultiModalPredictor

    predictor_regular = MultiModalPredictor(label="label")
    start_time = datetime.now()
    predictor_regular.fit(
        train_data=train_data,
        hyperparameters={"model.timm_image.checkpoint_name": "ghostnet_100"},
    )
    end_time = datetime.now()
    elapsed_seconds = (end_time - start_time).total_seconds()
    elapsed_min = divmod(elapsed_seconds, 60)
    print("Total fitting time: ", f"{int(elapsed_min[0])}m{int(elapsed_min[1])}s")

.. parsed-literal::
    :class: output

    Global seed set to 123
    No path specified. Models will be saved in: "AutogluonModels/ag-20230222_235343/"
    AutoMM starts to create your model. ✨

    - Model will be saved to "/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343".

    - Validation metric is "accuracy".
    - To track the learning progress, you can open a terminal and launch Tensorboard:
        ```shell
        # Assume you have installed tensorboard
        tensorboard --logdir /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343
        ```

    Enjoy your coffee, and let AutoMM do the job ☕☕☕ Learn more at https://auto.gluon.ai

    Downloading: "https://github.com/huawei-noah/CV-backbones/releases/download/ghostnet_pth/ghostnet_1x.pth" to /home/ci/.cache/torch/hub/checkpoints/ghostnet_1x.pth
    Using 16bit None Automatic Mixed Precision (AMP)
    GPU available: True (cuda), used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

      | Name              | Type                            | Params
    ----------------------------------------------------------------------
    0 | model             | TimmAutoModelForImagePrediction | 3.9 M
    1 | validation_metric | Accuracy                        | 0
    2 | loss_func         | CrossEntropyLoss                | 0
    ----------------------------------------------------------------------
    3.9 M     Trainable params
    0         Non-trainable params
    3.9 M     Total params
    7.813     Total estimated model params size (MB)
    Epoch 0, global step 1: 'val_accuracy' reached 0.20000 (best 0.20000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=0-step=1.ckpt' as top 3
    Epoch 0, global step 3: 'val_accuracy' reached 0.22500 (best 0.22500), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=0-step=3.ckpt' as top 3
    Epoch 1, global step 4: 'val_accuracy' reached 0.30000 (best 0.30000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=1-step=4.ckpt' as top 3
    Epoch 1, global step 6: 'val_accuracy' reached 0.37500 (best 0.37500), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=1-step=6.ckpt' as top 3
    Epoch 2, global step 7: 'val_accuracy' reached 0.41250 (best 0.41250), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=2-step=7.ckpt' as top 3
    Epoch 2, global step 9: 'val_accuracy' reached 0.51250 (best 0.51250), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=2-step=9.ckpt' as top 3
    Epoch 3, global step 10: 'val_accuracy' reached 0.52500 (best 0.52500), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=3-step=10.ckpt' as top 3
    Epoch 3, global step 12: 'val_accuracy' reached 0.58750 (best 0.58750), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=3-step=12.ckpt' as top 3
    Epoch 4, global step 13: 'val_accuracy' reached 0.56250 (best 0.58750), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=4-step=13.ckpt' as top 3
    Epoch 4, global step 15: 'val_accuracy' reached 0.61250 (best 0.61250), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=4-step=15.ckpt' as top 3
    Epoch 5, global step 16: 'val_accuracy' reached 0.63750 (best 0.63750), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=5-step=16.ckpt' as top 3
    Epoch 5, global step 18: 'val_accuracy' reached 0.62500 (best 0.63750), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=5-step=18.ckpt' as top 3
    Epoch 6, global step 19: 'val_accuracy' reached 0.63750 (best 0.63750), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=6-step=19.ckpt' as top 3
    Epoch 6, global step 21: 'val_accuracy' reached 0.63750 (best 0.63750), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=6-step=21.ckpt' as top 3
    Epoch 7, global step 22: 'val_accuracy' was not in top 3
    Epoch 7, global step 24: 'val_accuracy' was not in top 3
    Epoch 8, global step 25: 'val_accuracy' was not in top 3
    Epoch 8, global step 27: 'val_accuracy' was not in top 3
    Epoch 9, global step 28: 'val_accuracy' reached 0.65000 (best 0.65000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343/epoch=9-step=28.ckpt' as top 3
    Epoch 9, global step 30: 'val_accuracy' was not in top 3
    `Trainer.fit` stopped: `max_epochs=10` reached.
    Start to fuse 3 checkpoints via the greedy soup algorithm.
    AutoMM has created your model 🎉🎉🎉

    - To load the model, use the code below:
        ```python
        from autogluon.multimodal import MultiModalPredictor
        predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343")
        ```

    - You can open a terminal and launch Tensorboard to visualize the training log:
        ```shell
        # Assume you have installed tensorboard
        tensorboard --logdir /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235343
        ```

    - If you are not satisfied with the model, try to increase the training time, adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html), or post issues on GitHub: https://github.com/autogluon/autogluon

.. parsed-literal::
    :class: output

    Total fitting time:  0m46s

Let’s check out the test accuracy of the fitted model:

.. code:: python

    scores = predictor_regular.evaluate(test_data, metrics=["accuracy"])
    print('Top-1 test acc: %.3f' % scores["accuracy"])

.. parsed-literal::
    :class: output

    Top-1 test acc: 0.575

Use HPO During Model Fitting
----------------------------

If you would like more control over the fitting process, you can specify various options for hyperparameter optimization (HPO) in ``MultiModalPredictor`` by simply adding more options to ``hyperparameters`` and ``hyperparameter_tune_kwargs``.

There are a few options we can set in ``MultiModalPredictor``. We use the `Ray Tune `__ library in the backend, so we need to pass in a `Tune search space `__ or an `AutoGluon search space `__, which will be converted to a Tune search space.

1. Defining the search space of various hyperparameter values for the training of neural networks:
   ::

       hyperparameters = {
           "optimization.learning_rate": tune.uniform(0.00005, 0.005),
           "optimization.optim_type": tune.choice(["adamw", "sgd"]),
           "optimization.max_epochs": tune.choice([10, 20]),
           "model.timm_image.checkpoint_name": tune.choice(["swin_base_patch4_window7_224", "convnext_base_in22ft1k"]),
       }

   This is an example but not an exhaustive list. You can find the full supported list in `Customize AutoMM `__.
2. Defining the search strategy for HPO with ``hyperparameter_tune_kwargs``. You can pass in a string or initialize a ``ray.tune.schedulers.TrialScheduler`` object (see the sketch after this list).
   a. Specifying how to search through your chosen hyperparameter space (supports ``random`` and ``bayes``):

      ::

          "searcher": "bayes"
   b. Specifying how to schedule jobs to train a network under a particular hyperparameter configuration (supports ``FIFO`` and ``ASHA``):

      ::

          "scheduler": "ASHA"
   c. Specifying the number of HPO trials you would like to carry out:

      ::

          "num_trials": 20
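Putting these options together, the sketch below shows a search space that mixes a Ray Tune space with an AutoGluon space, plus a ``hyperparameter_tune_kwargs`` that passes a configured ``ray.tune.schedulers.ASHAScheduler`` instance instead of the string ``"ASHA"``. It assumes the AutoGluon search space API is importable from ``autogluon.common.space``; the concrete values are illustrative only:

.. code:: python

    from ray import tune
    from ray.tune.schedulers import ASHAScheduler

    from autogluon.common import space  # assumption: AutoGluon search space API location

    # Ray Tune spaces and AutoGluon spaces can be mixed in one dictionary;
    # AutoGluon spaces are converted to Tune search spaces internally.
    hyperparameters = {
        # log-uniform sampling often suits learning rates better than uniform
        "optimization.learning_rate": tune.loguniform(5e-5, 5e-3),
        "optimization.optim_type": space.Categorical("adamw", "sgd"),
    }

    # A TrialScheduler object gives finer control than the plain string "ASHA",
    # e.g. the maximum iterations per trial and the minimum iterations a trial
    # must run before it can be stopped early.
    hyperparameter_tune_kwargs = {
        "searcher": "bayes",
        "scheduler": ASHAScheduler(max_t=20, grace_period=1),
        "num_trials": 20,
    }

Both dictionaries are then passed to ``fit`` exactly as in the runnable example below.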
Let’s work on HPO with combinations of different learning rates and backbone models:

.. code:: python

    from ray import tune

    predictor_hpo = MultiModalPredictor(label="label")

    hyperparameters = {
        "optimization.learning_rate": tune.uniform(0.00005, 0.001),
        "model.timm_image.checkpoint_name": tune.choice(["ghostnet_100", "mobilenetv3_large_100"]),
    }
    hyperparameter_tune_kwargs = {
        "searcher": "bayes",  # alternatively: "random"
        "scheduler": "ASHA",
        "num_trials": 2,
    }
    start_time_hpo = datetime.now()
    predictor_hpo.fit(
        train_data=train_data,
        hyperparameters=hyperparameters,
        hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
    )
    end_time_hpo = datetime.now()
    elapsed_seconds_hpo = (end_time_hpo - start_time_hpo).total_seconds()
    elapsed_min_hpo = divmod(elapsed_seconds_hpo, 60)
    print("Total fitting time: ", f"{int(elapsed_min_hpo[0])}m{int(elapsed_min_hpo[1])}s")

.. parsed-literal::
    :class: output

    Global seed set to 123
    No path specified. Models will be saved in: "AutogluonModels/ag-20230222_235431/"
    /home/ci/opt/venv/lib/python3.8/site-packages/ray/tune/trainable/function_trainable.py:610: DeprecationWarning: `checkpoint_dir` in `func(config, checkpoint_dir)` is being deprecated. To save and load checkpoint in trainable functions, please use the `ray.air.session` API:

    from ray.air import session

    def train(config):
        # ...
        session.report({"metric": metric}, checkpoint=checkpoint)

    For more information please see https://docs.ray.io/en/master/tune/api_docs/trainable.html
      warnings.warn(

.. parsed-literal::
    :class: output

    Tune Status
    -----------
    Current time: 2023-02-22 23:55:26
    Running for:  00:00:50.21
    Memory:       7.3/31.0 GiB

    System Info
    -----------
    Using AsyncHyperBand: num_stopped=1
    Bracket: Iter 4096.000: None | Iter 1024.000: None | Iter 256.000: None | Iter 64.000: None | Iter 16.000: 0.887499988079071 | Iter 4.000: 0.6718749925494194 | Iter 1.000: 0.22187499701976776
    Resources requested: 0/8 CPUs, 0/1 GPUs, 0.0/12.45 GiB heap, 0.0/6.22 GiB objects (0.0/1.0 accelerator_type:T4)

    Trial Status
    ------------
    Trial name   status       loc              model.names        model.timm_image.checkpoint_name   optimization.learning_rate   iter   total time (s)   val_accuracy
    26a4dd62     TERMINATED   10.0.0.34:2576   ('categorical_m…   mobilenetv3_lar…                   0.000637124                  20     35.0547          0.9125
    20518fb6     TERMINATED   10.0.0.34:2576   ('categorical_m…   mobilenetv3_lar…                   0.000128319                  4      6.18163          0.4

    Trial Progress
    --------------
    Trial name   should_checkpoint   val_accuracy
    20518fb6     True                0.4
    26a4dd62     True                0.9125
.. parsed-literal::
    :class: output

    Removing non-optimal trials and only keep the best one.
    Start to fuse 3 checkpoints via the greedy soup algorithm.

.. parsed-literal::
    :class: output

    Total fitting time:  0m57s

Let’s check out the test accuracy of the fitted model after HPO:

.. code:: python

    scores_hpo = predictor_hpo.evaluate(test_data, metrics=["accuracy"])
    print('Top-1 test acc: %.3f' % scores_hpo["accuracy"])

.. parsed-literal::
    :class: output

    Top-1 test acc: 0.812

From the training log, you should be able to see the current best trial, as below:

::

    Current best trial: 47aef96a with val_accuracy=0.862500011920929 and parameters={'optimization.learning_rate': 0.0007195214018085505, 'model.timm_image.checkpoint_name': 'ghostnet_100'}

After our simple 2-trial HPO run, we got better test accuracy than the out-of-the-box solution of the previous section by searching over different learning rates and models. HPO helps select the combination of hyperparameters with the highest validation accuracy.

Other Examples
--------------

You may go to `AutoMM Examples `__ to explore other examples about AutoMM.

Customization
-------------

To learn how to customize AutoMM, please refer to :ref:`sec_automm_customization`.