.. _sec_automm_imageclassification_beginner:

AutoMM for Image Classification - Quick Start
=============================================

In this quick start, we'll use the task of image classification to illustrate how to use **MultiModalPredictor**. Once the data is prepared in `Pandas DataFrame <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html>`__ format, a single call to ``MultiModalPredictor.fit()`` will take care of the model training for you.

Create Image Dataset
--------------------

For demonstration purposes, we use a subset of the `Shopee-IET dataset <https://www.kaggle.com/competitions/demo-shopee-iet-competition/data>`__ from Kaggle. Each image in this data depicts a clothing item and the corresponding label specifies its clothing category. Our subset of the data contains the following possible labels: ``BabyPants``, ``BabyShirt``, ``womencasualshoes``, ``womenchiffontop``.

We can load the dataset automatically by downloading it from a URL:

.. code:: python

    import warnings
    warnings.filterwarnings('ignore')

    from autogluon.vision import ImageDataset
    train_dataset, _, test_dataset = ImageDataset.from_folders("https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip")
    print(train_dataset)

.. parsed-literal::
    :class: output

    Downloading /home/ci/.gluoncv/archive/shopee-iet.zip from https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip...

.. parsed-literal::
    :class: output

    100%|██████████| 40895/40895 [00:01<00:00, 24350.69KB/s]

.. parsed-literal::
    :class: output

    data/
    ├── test/
    └── train/
                                                     image  label
    0    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    1    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    2    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    3    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    4    /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      0
    ..                                                 ...    ...
    795  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3
    796  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3
    797  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3
    798  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3
    799  /home/ci/.gluoncv/datasets/shopee-iet/data/tra...      3

    [800 rows x 2 columns]

We can see there are 800 rows and 2 columns in this training dataframe. The 2 columns are **image** and **label**, and each row represents a different training sample.
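Note that ``MultiModalPredictor`` only requires this DataFrame layout, not ``ImageDataset`` itself: any pandas DataFrame with an image-path column and a label column will do. As a minimal sketch (the file paths below are hypothetical placeholders), you could assemble such a table yourself:

.. code:: python

    import pandas as pd

    # Hypothetical hand-built training table: one row per image,
    # holding the path to the image file and its class label.
    my_train_data = pd.DataFrame({
        "image": ["./data/train/BabyPants/001.jpg",
                  "./data/train/BabyShirt/001.jpg"],
        "label": ["BabyPants", "BabyShirt"],
    })
    print(my_train_data)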
Use AutoMM to Fit Models
------------------------

Now, we fit a classifier using AutoMM as follows:

.. code:: python

    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor(label="label", path="./automm_imgcls")
    predictor.fit(
        train_data=train_dataset,
        time_limit=30, # seconds
    ) # you can trust the default config, e.g., we use a `swin_base_patch4_window7_224` model

.. parsed-literal::
    :class: output

    Global seed set to 123
    Downloading: "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22kto1k.pth" to /home/ci/.cache/torch/hub/checkpoints/swin_base_patch4_window7_224_22kto1k.pth
    Auto select gpus: [0]
    Using 16bit native Automatic Mixed Precision (AMP)
    GPU available: True, used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

      | Name              | Type                            | Params
    ----------------------------------------------------------------------
    0 | model             | TimmAutoModelForImagePrediction | 86.7 M
    1 | validation_metric | Accuracy                        | 0
    2 | loss_func         | CrossEntropyLoss                | 0
    ----------------------------------------------------------------------
    86.7 M    Trainable params
    0         Non-trainable params
    86.7 M    Total params
    173.495   Total estimated model params size (MB)
    Epoch 0, global step 2: 'val_accuracy' reached 0.33125 (best 0.33125), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/automm_imgcls/epoch=0-step=2.ckpt' as top 3
    Epoch 0, global step 5: 'val_accuracy' reached 0.88750 (best 0.88750), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/automm_imgcls/epoch=0-step=5.ckpt' as top 3
    Epoch 1, global step 7: 'val_accuracy' reached 0.92500 (best 0.92500), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/automm_imgcls/epoch=1-step=7.ckpt' as top 3
    Time limit reached. Elapsed time is 0:00:36. Signaling Trainer to stop.

**label** is the name of the column that contains the target variable to predict, e.g., it is "label" in our example. **path** indicates the directory where models and intermediate outputs should be saved. We set the training time limit to 30 seconds for demonstration purposes, but you can control the training time by setting configurations. To customize AutoMM, please refer to :ref:`sec_automm_customization`.

Evaluate on Test Dataset
------------------------

You can evaluate the classifier on the test dataset to see how it performs. The test top-1 accuracy is:

.. code:: python

    scores = predictor.evaluate(test_dataset, metrics=["accuracy"])
    print('Top-1 test acc: %.3f' % scores["accuracy"])

.. parsed-literal::
    :class: output

    Top-1 test acc: 0.963

Predict on a New Image
----------------------

Given an example image, let's visualize it first:

.. code:: python

    image_path = test_dataset.iloc[0]['image']

    from IPython.display import Image, display
    pil_img = Image(filename=image_path)
    display(pil_img)

.. figure:: output_beginner_image_cls_96f3fd_7_0.jpg

We can easily use the final model to ``predict`` the label:

.. code:: python

    predictions = predictor.predict({'image': [image_path]})
    print(predictions)

.. parsed-literal::
    :class: output

    [0]

If probabilities of all categories are needed, you can call ``predict_proba``:

.. code:: python

    proba = predictor.predict_proba({'image': [image_path]})
    print(proba)

.. parsed-literal::
    :class: output

    [[0.812597 0.16638389 0.00555061 0.01546856]]

Extract Embeddings
------------------

Extracting the representation of a whole image learned by the model is also very useful. We provide the ``extract_embedding`` function, which lets the predictor return an N-dimensional image feature, where ``N`` depends on the model (typically a vector of length 512 to 2048).

.. code:: python

    feature = predictor.extract_embedding({'image': [image_path]})
    print(feature[0].shape)

.. parsed-literal::
    :class: output

    (1024,)
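These embeddings can feed downstream tasks such as clustering or image retrieval. As an illustrative sketch (plain NumPy, not part of the tutorial itself; the second image is an arbitrary pick from the test set), the cosine similarity between two extracted embeddings gives a simple measure of how visually similar two images are:

.. code:: python

    import numpy as np

    # Compare the example image against a second test image.
    other_path = test_dataset.iloc[1]['image']
    emb = predictor.extract_embedding({'image': [image_path, other_path]})

    # Cosine similarity between the two feature vectors.
    a, b = emb[0], emb[1]
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print('cosine similarity: %.3f' % cos_sim)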
Save and Load
-------------

The trained predictor is automatically saved at the end of ``fit()``, and you can easily reload it.

.. code:: python

    loaded_predictor = MultiModalPredictor.load('automm_imgcls')
    load_proba = loaded_predictor.predict_proba({'image': [image_path]})
    print(load_proba)

.. parsed-literal::
    :class: output

    [[0.812597 0.16638389 0.00555061 0.01546856]]

We can see the predicted class probabilities are still the same as above, which means we have loaded the same model!

Other Examples
--------------

You may go to `AutoMM Examples <https://github.com/autogluon/autogluon/tree/master/examples/automm>`__ to explore other examples about AutoMM.

Customization
-------------

To learn how to customize AutoMM, please refer to :ref:`sec_automm_customization`.
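As a small taste of what customization looks like, the sketch below overrides the image backbone through ``fit()``'s ``hyperparameters`` argument. The ``model.timm_image.checkpoint_name`` key, the smaller ``swin_tiny_patch4_window7_224`` checkpoint, and the predictor/path names are illustrative choices; see the customization tutorial for the full set of supported options.

.. code:: python

    from autogluon.multimodal import MultiModalPredictor

    # Sketch: train with a smaller Swin backbone instead of the default.
    predictor_tiny = MultiModalPredictor(label="label", path="./automm_imgcls_tiny")
    predictor_tiny.fit(
        train_data=train_dataset,
        time_limit=30,  # seconds
        hyperparameters={"model.timm_image.checkpoint_name": "swin_tiny_patch4_window7_224"},
    )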