AutoMM for Multimodal Named Entity Extraction
=============================================

We have introduced how to train an entity extraction model with text
data. Here, we move a step further by integrating data of other
modalities. In many real-world applications, textual data usually comes
with data of other modalities. For example, Twitter allows you to
compose tweets with text, photos, videos, and GIFs. Amazon.com uses
text, images, and videos to describe its products. These auxiliary
modalities can be leveraged as additional context for resolving
entities. Now, with AutoMM, you can easily exploit multimodal data to
enhance entity extraction without worrying about the details.

.. code:: python

    import os
    import pandas as pd
    import warnings

    warnings.filterwarnings('ignore')

Get the Twitter Dataset
-----------------------

In the following example, we will demonstrate how to build a multimodal
named entity recognition model with a real-world Twitter dataset. This
dataset consists of scraped tweets from 2016 to 2017, and each tweet is
composed of one sentence and one image. Let’s download the dataset.

.. code:: python

    download_dir = './ag_automm_tutorial_ner'
    zip_file = 'https://automl-mm-bench.s3.amazonaws.com/ner/multimodal_ner.zip'
    from autogluon.core.utils.loaders import load_zip
    load_zip.unzip(zip_file, unzip_dir=download_dir)

.. parsed-literal::
    :class: output

    Downloading ./ag_automm_tutorial_ner/file.zip from https://automl-mm-bench.s3.amazonaws.com/ner/multimodal_ner.zip...

.. parsed-literal::
    :class: output

    100%|██████████| 423M/423M [00:06<00:00, 64.8MiB/s]

Next, we will load the CSV files.

.. code:: python

    dataset_path = download_dir + '/multimodal_ner'
    train_data = pd.read_csv(f'{dataset_path}/twitter17_train.csv')
    test_data = pd.read_csv(f'{dataset_path}/twitter17_test.csv')
    label_col = 'entity_annotations'

We need to expand the image paths so that the images can be loaded
during training.

.. code:: python

    image_col = 'image'
    # Use only the first image of each tweet for a quick tutorial
    train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0])
    test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])

    def path_expander(path, base_folder):
        # Expand each semicolon-separated image path to an absolute path
        path_l = path.split(';')
        p = ';'.join([os.path.abspath(base_folder + path) for path in path_l])
        return p

    train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
    test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))

    train_data[image_col].iloc[0]

.. parsed-literal::
    :class: output

    '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/ag_automm_tutorial_ner/multimodal_ner/twitter2017_images/17_06_1818.jpg'

Each row consists of the text and image of a single tweet, plus the
``entity_annotations`` column, which contains the named entity
annotations for the text column. Let’s look at an example row and
display the text and picture of the tweet.

.. code:: python

    example_row = train_data.iloc[0]

    example_row

.. parsed-literal::
    :class: output

    text_snippet          Uefa Super Cup : Real Madrid v Manchester United
    image                 /home/ci/autogluon/docs/_build/eval/tutorials/...
    entity_annotations    [{"entity_group": "B-MISC", "start": 0, "end":...
    Name: 0, dtype: object
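To get a feel for the label format, here is a minimal sketch that
parses the annotation string of this row (assuming the annotations are
stored as JSON, as the double-quoted display above suggests) and maps
each entity span back onto the tweet text.

.. code:: python

    import json

    # Parse the JSON annotation string of the example row and print each
    # entity group together with the text span it covers.
    annotations = json.loads(example_row[label_col])
    text = example_row['text_snippet']
    for ann in annotations:
        print(f"{ann['entity_group']:>8} -> '{text[ann['start']:ann['end']]}'")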
Below is the image of this tweet.

.. code:: python

    example_image = example_row[image_col]

    from IPython.display import Image, display
    pil_img = Image(filename=example_image, width=300)
    display(pil_img)

.. figure:: output_multimodal_ner_41ca46_11_0.jpg
    :width: 300px

As you can see, this photo contains the logos of the Real Madrid
football club, the Manchester United football club, and the UEFA Super
Cup. Clearly, key information from the tweet sentence is encoded here
in a different modality.

Training
--------

Now let’s fit the predictor with the training data. First, we need to
set the problem_type to **ner**. Since our annotations apply to a text
column, we need to set the corresponding column type to ``text_ner``
using the **column_types** parameter, which ensures that the model
locates the correct text column for entity extraction in cases where
multiple text columns are present. Here we set a tight time budget for
a quick demo.

.. code:: python

    from autogluon.multimodal import MultiModalPredictor
    import uuid

    label_col = "entity_annotations"
    model_path = f"./tmp/{uuid.uuid4().hex}-automm_multimodal_ner"
    predictor = MultiModalPredictor(problem_type="ner", label=label_col, path=model_path)
    predictor.fit(
        train_data=train_data,
        column_types={"text_snippet": "text_ner"},
        time_limit=300,  # seconds
    )

.. parsed-literal::
    :class: output

    AutoMM starts to create your model. ✨

    - Model will be saved to "/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/tmp/7f6a36d92eec495698878ac82c94587d-automm_multimodal_ner".
    - Validation metric is "ner_token_f1".
    - To track the learning progress, you can open a terminal and launch Tensorboard:
        ```shell
        # Assume you have installed tensorboard
        tensorboard --logdir /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/tmp/7f6a36d92eec495698878ac82c94587d-automm_multimodal_ner
        ```

    Enjoy your coffee, and let AutoMM do the job ☕☕☕ Learn more at https://auto.gluon.ai

    Start to fuse 3 checkpoints via the greedy soup algorithm.

.. parsed-literal::
    :class: output

    Downloading builder script: 0%| | 0.00/6.34k [00:00

Under the hood, AutoMM automatically detects the data modalities,
selects the related models from the multimodal model pools, and trains
the selected models. If multiple backbones are available, AutoMM
appends a late-fusion model on top of them.

Evaluation
----------

.. code:: python

    predictor.evaluate(test_data, metrics=['overall_recall', 'overall_precision', 'overall_f1'])

.. parsed-literal::
    :class: output

    {'overall_recall': 0.533678756476684, 'overall_precision': 0.5457986373959122, 'overall_f1': 0.5396706586826348}

Prediction
----------

You can easily obtain the predictions by calling
``predictor.predict()``.

.. code:: python

    prediction_input = test_data.drop(columns=label_col).head(1)
    predictions = predictor.predict(prediction_input)
    print('Tweet:', prediction_input.text_snippet[0])
    print('Image path:', prediction_input.image[0])
    print('Predicted entities:', predictions[0])

    for entity in predictions[0]:
        print(f"Word '{prediction_input.text_snippet[0][entity['start']:entity['end']]}' belongs to group: {entity['entity_group']}")

.. parsed-literal::
    :class: output

    Tweet: Citifield Fan View : RT @ jehnnybgoode What a gorgeous day for baseball ! Stuck in that Saturdaze . # NewYorkMets VS # Sa …
    Image path: /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/ag_automm_tutorial_ner/multimodal_ner/twitter2017_images/16_05_01_360.jpg
    Predicted entities: [{'entity_group': 'PER', 'start': 0, 'end': 9}, {'entity_group': 'PER', 'start': 26, 'end': 38}, {'entity_group': 'PER', 'start': 102, 'end': 113}]
    Word 'Citifield' belongs to group: PER
    Word 'jehnnybgoode' belongs to group: PER
    Word 'NewYorkMets' belongs to group: PER
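For a quick overview of what the model predicts across the whole test
set, the following sketch tallies the predicted entity groups. It
assumes, as the single-row example above suggests, that
``predictor.predict()`` returns one list of entity dictionaries per
input row.

.. code:: python

    from collections import Counter

    # Predict over the full test set and count how often each entity
    # group is predicted.
    all_predictions = predictor.predict(test_data.drop(columns=label_col))
    group_counts = Counter(
        entity['entity_group'] for entities in all_predictions for entity in entities
    )
    print(group_counts)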
Reloading and Continuous Training
---------------------------------

The trained predictor is automatically saved, and you can easily reload
it from its path. If you are not satisfied with the current model
performance, you can continue training the loaded model with new data.

.. code:: python

    new_predictor = MultiModalPredictor.load(model_path)
    new_model_path = f"./tmp/{uuid.uuid4().hex}-automm_multimodal_ner_continue_train"
    new_predictor.fit(train_data, time_limit=60, save_path=new_model_path)
    test_score = new_predictor.evaluate(test_data, metrics=['overall_f1'])
    print(test_score)

.. parsed-literal::
    :class: output

    Load pretrained checkpoint: /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/tmp/7f6a36d92eec495698878ac82c94587d-automm_multimodal_ner/model.ckpt
    AutoMM starts to create your model. ✨

    - Model will be saved to "/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/tmp/e63f55a7d0ca4046bfdb1861bf5cf48e-automm_multimodal_ner_continue_train".
    - Validation metric is "ner_token_f1".
    - To track the learning progress, you can open a terminal and launch Tensorboard:
        ```shell
        # Assume you have installed tensorboard
        tensorboard --logdir /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/tmp/e63f55a7d0ca4046bfdb1861bf5cf48e-automm_multimodal_ner_continue_train
        ```

    Enjoy your coffee, and let AutoMM do the job ☕☕☕ Learn more at https://auto.gluon.ai

    AutoMM has created your model 🎉🎉🎉

    - To load the model, use the code below:
        ```python
        from autogluon.multimodal import MultiModalPredictor
        predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/tmp/e63f55a7d0ca4046bfdb1861bf5cf48e-automm_multimodal_ner_continue_train")
        ```

    - You can open a terminal and launch Tensorboard to visualize the training log:
        ```shell
        # Assume you have installed tensorboard
        tensorboard --logdir /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/multimodal_prediction/tmp/e63f55a7d0ca4046bfdb1861bf5cf48e-automm_multimodal_ner_continue_train
        ```

    - If you are not satisfied with the model, try to increase the training time, adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html), or post issues on GitHub: https://github.com/autogluon/autogluon

.. parsed-literal::
    :class: output

    {'overall_f1': 0.5686537173476223}

Other Examples
--------------

You may go to AutoMM Examples to explore other examples of AutoMM.

Customization
-------------

To learn how to customize AutoMM, please refer to
:ref:`sec_automm_customization`.
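As a small, hypothetical taste of what customization looks like,
``fit()`` accepts a ``hyperparameters`` dictionary with dotted
configuration keys. The key used below, ``optimization.learning_rate``,
is only an illustrative example; supported keys vary across AutoGluon
versions, so consult the customization tutorial for the keys your
version supports.

.. code:: python

    # Hypothetical sketch only: "optimization.learning_rate" illustrates the
    # dotted-key format; check the customization tutorial for valid keys.
    custom_predictor = MultiModalPredictor(problem_type="ner", label=label_col)
    custom_predictor.fit(
        train_data=train_data,
        column_types={"text_snippet": "text_ner"},
        hyperparameters={"optimization.learning_rate": 1e-4},
        time_limit=300,
    )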