AutoMM Detection - Finetune on COCO Format Dataset with Customized Settings

Open In Colab Open In SageMaker Studio Lab

Pothole Dataset

In this section, our goal is to fast finetune and evaluate a pretrained model on Pothole dataset in COCO format with customized setting. Pothole is a single object, i.e. pothole, detection dataset, containing 665 images with bounding box annotations for the creation of detection models and can work as POC/POV for the maintenance of roads. See AutoMM Detection - Prepare Pothole Dataset for how to prepare Pothole dataset.

To start, make sure mmcv and mmdet are installed. Note: MMDet is no longer actively maintained and is only compatible with MMCV version 2.1.0. Installation can be problematic due to CUDA version compatibility issues. For best results:

  1. Use CUDA 12.4 with PyTorch 2.5

  2. Before installation, run:

    pip install -U pip setuptools wheel
    sudo apt-get install -y ninja-build gcc g++
    

    This will help prevent MMCV installation from hanging during wheel building.

  3. After installation in Jupyter notebook, restart the kernel for changes to take effect.

# Update package tools and install build dependencies
!pip install -U pip setuptools wheel
!sudo apt-get install -y ninja-build gcc g++

# Install MMCV
!python3 -m mim install "mmcv==2.1.0"

# For Google Colab users: If the above fails, use this alternative MMCV installation
# pip install "mmcv==2.1.0" -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/index.html

# Install MMDet
!python3 -m pip install "mmdet==3.2.0"

# Install MMEngine (version >=0.10.6 for PyTorch 2.5 compatibility)
!python3 -m pip install "mmengine>=0.10.6"

Hide code cell output

Requirement already satisfied: pip in /home/ci/opt/venv/lib/python3.12/site-packages (26.0.1)
Requirement already satisfied: setuptools in /home/ci/opt/venv/lib/python3.12/site-packages (81.0.0)
Collecting setuptools
  Using cached setuptools-82.0.0-py3-none-any.whl.metadata (6.6 kB)
Requirement already satisfied: wheel in /home/ci/opt/venv/lib/python3.12/site-packages (0.46.3)
Requirement already satisfied: packaging>=24.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from wheel) (26.0)
Using cached setuptools-82.0.0-py3-none-any.whl (1.0 MB)
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 81.0.0
    Uninstalling setuptools-81.0.0:
      Successfully uninstalled setuptools-81.0.0
Successfully installed setuptools-82.0.0
/usr/bin/sh: 1: sudo: not found
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 148, in _get_module_details
  File "<frozen runpy>", line 112, in _get_module_details
  File "/home/ci/opt/venv/lib/python3.12/site-packages/mim/__init__.py", line 16, in <module>
    from .commands import (
  File "/home/ci/opt/venv/lib/python3.12/site-packages/mim/commands/__init__.py", line 2, in <module>
    from .download import download
  File "/home/ci/opt/venv/lib/python3.12/site-packages/mim/commands/download.py", line 12, in <module>
    from mim.click import (
  File "/home/ci/opt/venv/lib/python3.12/site-packages/mim/click/__init__.py", line 2, in <module>
    from .autocompletion import (
  File "/home/ci/opt/venv/lib/python3.12/site-packages/mim/click/autocompletion.py", line 2, in <module>
    from mim.commands.list import list_package
  File "/home/ci/opt/venv/lib/python3.12/site-packages/mim/commands/list.py", line 4, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
Collecting mmdet==3.2.0
  Downloading mmdet-3.2.0-py3-none-any.whl.metadata (32 kB)
Requirement already satisfied: matplotlib in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (3.10.8)
Requirement already satisfied: numpy in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (2.1.3)
Requirement already satisfied: pycocotools in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (2.0.11)
Requirement already satisfied: scipy in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (1.16.3)
Requirement already satisfied: shapely in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (2.1.2)
Requirement already satisfied: six in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (1.17.0)
Requirement already satisfied: terminaltables in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (3.1.10)
Requirement already satisfied: tqdm in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (4.67.3)
Requirement already satisfied: contourpy>=1.0.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (4.61.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (1.4.9)
Requirement already satisfied: packaging>=20.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (26.0)
Requirement already satisfied: pillow>=8 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (11.3.0)
Requirement already satisfied: pyparsing>=3 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (3.3.2)
Requirement already satisfied: python-dateutil>=2.7 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (2.9.0.post0)
Downloading mmdet-3.2.0-py3-none-any.whl (2.1 MB)
?25l   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/2.1 MB ? eta -:--:--
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 64.6 MB/s  0:00:00
?25h
Installing collected packages: mmdet
  Attempting uninstall: mmdet
    Found existing installation: mmdet 3.3.0
    Uninstalling mmdet-3.3.0:
      Successfully uninstalled mmdet-3.3.0
Successfully installed mmdet-3.2.0
Requirement already satisfied: mmengine>=0.10.6 in /home/ci/opt/venv/lib/python3.12/site-packages (0.10.7)
Requirement already satisfied: addict in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (2.4.0)
Requirement already satisfied: matplotlib in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (3.10.8)
Requirement already satisfied: numpy in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (2.1.3)
Requirement already satisfied: pyyaml in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (6.0.3)
Requirement already satisfied: rich in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (14.3.2)
Requirement already satisfied: termcolor in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (3.3.0)
Requirement already satisfied: yapf in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (0.43.0)
Requirement already satisfied: opencv-python>=3 in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (4.13.0.92)
Requirement already satisfied: contourpy>=1.0.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (4.61.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (1.4.9)
Requirement already satisfied: packaging>=20.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (26.0)
Requirement already satisfied: pillow>=8 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (11.3.0)
Requirement already satisfied: pyparsing>=3 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (3.3.2)
Requirement already satisfied: python-dateutil>=2.7 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (2.9.0.post0)
Requirement already satisfied: six>=1.5 in /home/ci/opt/venv/lib/python3.12/site-packages (from python-dateutil>=2.7->matplotlib->mmengine>=0.10.6) (1.17.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from rich->mmengine>=0.10.6) (4.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from rich->mmengine>=0.10.6) (2.19.2)
Requirement already satisfied: mdurl~=0.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from markdown-it-py>=2.2.0->rich->mmengine>=0.10.6) (0.1.2)
Requirement already satisfied: platformdirs>=3.5.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from yapf->mmengine>=0.10.6) (4.9.2)
from autogluon.multimodal import MultiModalPredictor
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[3], line 1
----> 1 from autogluon.multimodal import MultiModalPredictor

File ~/autogluon/multimodal/src/autogluon/multimodal/__init__.py:8
      5 except ImportError:
      6     pass
----> 8 from .predictor import MultiModalPredictor
     10 _add_stream_handler()

File ~/autogluon/multimodal/src/autogluon/multimodal/predictor.py:19
     16 from autogluon.core.metrics import Scorer
     18 from .constants import AUTOMM_TUTORIAL_MODE, FEW_SHOT_CLASSIFICATION, NER, OBJECT_DETECTION, SEMANTIC_SEGMENTATION
---> 19 from .learners import (
     20     BaseLearner,
     21     EnsembleLearner,
     22     FewShotSVMLearner,
     23     MatchingLearner,
     24     NERLearner,
     25     ObjectDetectionLearner,
     26     SemanticSegmentationLearner,
     27 )
     28 from .utils import get_dir_ckpt_paths
     29 from .utils.problem_types import PROBLEM_TYPES_REG

File ~/autogluon/multimodal/src/autogluon/multimodal/learners/__init__.py:1
----> 1 from .base import BaseLearner
      2 from .ensemble import EnsembleLearner
      3 from .few_shot_svm import FewShotSVMLearner

File ~/autogluon/multimodal/src/autogluon/multimodal/learners/base.py:71
     31 from .. import version as ag_version
     32 from ..constants import (
     33     BEST,
     34     BEST_K_MODELS_FILE,
   (...)
     69     ZERO_SHOT_IMAGE_CLASSIFICATION,
     70 )
---> 71 from ..data import (
     72     BaseDataModule,
     73     MultiModalFeaturePreprocessor,
     74     create_fusion_data_processors,
     75     data_to_df,
     76     get_mixup,
     77     infer_column_types,
     78     infer_dtypes_by_model_names,
     79     infer_output_shape,
     80     infer_problem_type,
     81     infer_scarcity_mode_by_data_size,
     82     init_df_preprocessor,
     83     is_image_column,
     84     split_train_tuning_data,
     85     turn_on_off_feature_column_info,
     86 )
     87 from ..models import (
     88     create_fusion_model,
     89     get_model_postprocess_fn,
   (...)
     92     select_model,
     93 )
     94 from ..optim import (
     95     compute_score,
     96     get_aug_loss_func,
   (...)
    103     infer_metrics,
    104 )

File ~/autogluon/multimodal/src/autogluon/multimodal/data/__init__.py:1
----> 1 from .datamodule import BaseDataModule
      2 from .dataset import BaseDataset
      3 from .dataset_mmlab import MultiImageMixDataset

File ~/autogluon/multimodal/src/autogluon/multimodal/data/datamodule.py:8
      5 from torch.utils.data import DataLoader, Dataset
      7 from ..constants import PREDICT, TEST, TRAIN, VALIDATE
----> 8 from .dataset import BaseDataset
      9 from .preprocess_dataframe import MultiModalFeaturePreprocessor
     10 from .utils import get_collate_fn

File ~/autogluon/multimodal/src/autogluon/multimodal/data/dataset.py:9
      7 from ..constants import GET_ITEM_ERROR_RETRY
      8 from .preprocess_dataframe import MultiModalFeaturePreprocessor
----> 9 from .utils import apply_data_processor, apply_df_preprocessor, get_per_sample_features
     11 logger = logging.getLogger(__name__)
     14 class BaseDataset(torch.utils.data.Dataset):

File ~/autogluon/multimodal/src/autogluon/multimodal/data/utils.py:48
     46 from .process_numerical import NumericalProcessor
     47 from .process_semantic_seg_img import SemanticSegImageProcessor
---> 48 from .process_text import TextProcessor
     50 logger = logging.getLogger(__name__)
     53 def get_collate_fn(
     54     df_preprocessor: Union[MultiModalFeaturePreprocessor, List[MultiModalFeaturePreprocessor]],
     55     data_processors: Union[Dict, List[Dict]],
     56     per_gpu_batch_size: Optional[int] = None,
     57 ):

File ~/autogluon/multimodal/src/autogluon/multimodal/data/process_text.py:19
     17 from ..models.utils import get_pretrained_tokenizer
     18 from .collator import PadCollator, StackCollator
---> 19 from .template_engine import TemplateEngine
     20 from .trivial_augmenter import TrivialAugment
     22 logger = logging.getLogger(__name__)

File ~/autogluon/multimodal/src/autogluon/multimodal/data/template_engine.py:6
      3 import numpy as np
      4 from omegaconf import DictConfig, OmegaConf
----> 6 from .templates import DatasetTemplates, Template, TemplateCollection
      8 logger = logging.getLogger(__name__)
     11 class TemplateEngine:

File ~/autogluon/multimodal/src/autogluon/multimodal/data/templates.py:16
     13 from typing import Dict, List, Optional, Tuple
     15 import pandas as pd
---> 16 import pkg_resources
     17 import yaml
     18 from jinja2 import BaseLoader, Environment, meta

ModuleNotFoundError: No module named 'pkg_resources'

And also import some other packages that will be used in this tutorial:

import os

from autogluon.core.utils.loaders import load_zip

We have the sample dataset ready in the cloud. Let’s download it and store the paths for each data split:

zip_file = "https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip"
download_dir = "./pothole"

load_zip.unzip(zip_file, unzip_dir=download_dir)
data_dir = os.path.join(download_dir, "pothole")
train_path = os.path.join(data_dir, "Annotations", "usersplit_train_cocoformat.json")
val_path = os.path.join(data_dir, "Annotations", "usersplit_val_cocoformat.json")
test_path = os.path.join(data_dir, "Annotations", "usersplit_test_cocoformat.json")

While using COCO format dataset, the input is the json annotation file of the dataset split. In this example, usersplit_train_cocoformat.json is the annotation file of the train split. usersplit_val_cocoformat.json is the annotation file of the validation split. And usersplit_test_cocoformat.json is the annotation file of the test split.

We select the YOLOX-small model pretrained on COCO dataset. With this setting, it is fast to finetune or inference, and easy to deploy. Note that you can use a larger model by setting the checkpoint_name to corresponding checkpoint name for better performance (but usually with slower speed). And you may need to change the lr and per_gpu_batch_size for a different model. An easier way is to use our predefined presets "medium_quality", "high_quality", or "best_quality". For more about using presets, see Quick Start Coco.

checkpoint_name = "yolox_s"
num_gpus = 1  # only use one GPU

We create the MultiModalPredictor with selected checkpoint name and number of GPUs. We need to specify the problem_type to "object_detection", and also provide a sample_data_path for the predictor to infer the categories of the dataset. Here we provide the train_path, and it also works using any other split of this dataset.

predictor = MultiModalPredictor(
    hyperparameters={
        "model.mmdet_image.checkpoint_name": checkpoint_name,
        "env.num_gpus": num_gpus,
    },
    problem_type="object_detection",
    sample_data_path=train_path,
)

We set the learning rate to be 1e-4. Note that we use a two-stage learning rate option during finetuning by default, and the model head will have 100x learning rate. Using a two-stage learning rate with high learning rate only on head layers makes the model converge faster during finetuning. It usually gives better performance as well, especially on small datasets with hundreds or thousands of images. We set batch size to be 16, and you can increase or decrease the batch size based on your available GPU memory. We set max number of epochs to 30, number of validation check per interval to 1.0, and validation check per n epochs to 3 for fast finetuning. We also compute the time of the fit process here for better understanding the speed.

predictor.fit(
    train_path,
    tuning_data=val_path,
    hyperparameters={
        "optim.lr": 1e-4,  # we use two stage and detection head has 100x lr
        "env.per_gpu_batch_size": 16,  # decrease it when model is large or GPU memory is small
        "optim.max_epochs": 30,  # max number of training epochs, note that we may early stop before this based on validation setting
        "optim.val_check_interval": 1.0,  # Do 1 validation each epoch
        "optim.check_val_every_n_epoch": 3,  # Do 1 validation each 3 epochs
        "optim.patience": 3,  # Early stop after 3 consective validations are not the best
    },
)

To evaluate the model we just trained, run:

predictor.evaluate(test_path)

Note that it’s always recommended to use our predefined presets to save customization time with following code script:

predictor = MultiModalPredictor(
    problem_type="object_detection",
    sample_data_path=train_path,
    presets="medium_quality",
)
predictor.fit(train_path, tuning_data=val_path)
predictor.evaluate(test_path)

For more about using presets, see Quick Start Coco.

And the evaluation results are shown in command line output. The first value is mAP in COCO standard, and the second one is mAP in VOC standard (or mAP50). For more details about these metrics, see COCO’s evaluation guideline.

We can get the prediction on test set:

pred = predictor.predict(test_path)

Let’s also visualize the prediction result:

!pip install opencv-python
from autogluon.multimodal.utils import visualize_detection
conf_threshold = 0.25  # Specify a confidence threshold to filter out unwanted boxes
visualization_result_dir = "./"  # Use the pwd as result dir to save the visualized image
visualized = visualize_detection(
    pred=pred[12:13],
    detection_classes=predictor.classes,
    conf_threshold=conf_threshold,
    visualization_result_dir=visualization_result_dir,
)
from PIL import Image
from IPython.display import display
img = Image.fromarray(visualized[0][:, :, ::-1], 'RGB')
display(img)

Under this fast finetune setting, we reached a good mAP number on a new dataset with a few hundred seconds! For how to finetune with higher performance, see AutoMM Detection - High Performance Finetune on COCO Format Dataset, where we finetuned a VFNet model with 5 hours and reached mAP = 0.450, mAP50 = 0.718 on this dataset.

Other Examples

You may go to AutoMM Examples to explore other examples about AutoMM.

Customization

To learn how to customize AutoMM, please refer to Customize AutoMM.

Citation

@article{DBLP:journals/corr/abs-2107-08430,
  author    = {Zheng Ge and
               Songtao Liu and
               Feng Wang and
               Zeming Li and
               Jian Sun},
  title     = {{YOLOX:} Exceeding {YOLO} Series in 2021},
  journal   = {CoRR},
  volume    = {abs/2107.08430},
  year      = {2021},
  url       = {https://arxiv.org/abs/2107.08430},
  eprinttype = {arXiv},
  eprint    = {2107.08430},
  timestamp = {Tue, 05 Apr 2022 14:09:44 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2107-08430.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org},
}