AutoMM Detection - Finetune on COCO Format Dataset with Customized Settings


Pothole Dataset

In this section, our goal is to quickly finetune and evaluate a pretrained model on the Pothole dataset in COCO format with customized settings. Pothole is a single-class object detection dataset (the only class is pothole) containing 665 images with bounding box annotations; it is useful for building detection models and can serve as a proof of concept for road maintenance applications. See AutoMM Detection - Prepare Pothole Dataset for how to prepare the Pothole dataset.

To start, make sure mmcv and mmdet are installed. Note: MMDet is no longer actively maintained and is only compatible with MMCV version 2.1.0. Installation can be problematic due to CUDA version compatibility issues. For best results:

  1. Use CUDA 12.4 with PyTorch 2.5

  2. Before installation, run:

    pip install -U pip setuptools wheel
    sudo apt-get install -y ninja-build gcc g++
    

    This will help prevent MMCV installation from hanging during wheel building.

  3. If you are installing in a Jupyter notebook, restart the kernel after installation for the changes to take effect.

# Update package tools and install build dependencies
!pip install -U pip setuptools wheel
!sudo apt-get install -y ninja-build gcc g++

# Install MMCV
!python3 -m mim install "mmcv==2.1.0"

# For Google Colab users: If the above fails, use this alternative MMCV installation
# pip install "mmcv==2.1.0" -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/index.html

# Install MMDet
!python3 -m pip install "mmdet==3.2.0"

# Install MMEngine (version >=0.10.6 for PyTorch 2.5 compatibility)
!python3 -m pip install "mmengine>=0.10.6"


Requirement already satisfied: pip in /home/ci/opt/venv/lib/python3.12/site-packages (25.3)
Requirement already satisfied: setuptools in /home/ci/opt/venv/lib/python3.12/site-packages (80.9.0)
Requirement already satisfied: wheel in /home/ci/opt/venv/lib/python3.12/site-packages (0.45.1)
/usr/bin/sh: 1: sudo: not found
/home/ci/opt/venv/lib/python3.12/site-packages/mim/commands/list.py:4: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Looking in links: https://download.openmmlab.com/mmcv/dist/cu126/torch2.7.0/index.html
Requirement already satisfied: mmcv==2.1.0 in /home/ci/opt/venv/lib/python3.12/site-packages (2.1.0)
Requirement already satisfied: addict in /home/ci/opt/venv/lib/python3.12/site-packages (from mmcv==2.1.0) (2.4.0)
Requirement already satisfied: mmengine>=0.3.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from mmcv==2.1.0) (0.10.5)
Requirement already satisfied: numpy in /home/ci/opt/venv/lib/python3.12/site-packages (from mmcv==2.1.0) (2.1.3)
Requirement already satisfied: packaging in /home/ci/opt/venv/lib/python3.12/site-packages (from mmcv==2.1.0) (25.0)
Requirement already satisfied: Pillow in /home/ci/opt/venv/lib/python3.12/site-packages (from mmcv==2.1.0) (11.3.0)
Requirement already satisfied: pyyaml in /home/ci/opt/venv/lib/python3.12/site-packages (from mmcv==2.1.0) (6.0.3)
Requirement already satisfied: yapf in /home/ci/opt/venv/lib/python3.12/site-packages (from mmcv==2.1.0) (0.43.0)
Requirement already satisfied: opencv-python>=3 in /home/ci/opt/venv/lib/python3.12/site-packages (from mmcv==2.1.0) (4.12.0.88)
Requirement already satisfied: matplotlib in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.3.0->mmcv==2.1.0) (3.10.7)
Requirement already satisfied: rich in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.3.0->mmcv==2.1.0) (14.2.0)
Requirement already satisfied: termcolor in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.3.0->mmcv==2.1.0) (3.2.0)
Requirement already satisfied: contourpy>=1.0.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (4.60.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (1.4.9)
Requirement already satisfied: pyparsing>=3 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (3.2.5)
Requirement already satisfied: python-dateutil>=2.7 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (2.9.0.post0)
Requirement already satisfied: six>=1.5 in /home/ci/opt/venv/lib/python3.12/site-packages (from python-dateutil>=2.7->matplotlib->mmengine>=0.3.0->mmcv==2.1.0) (1.17.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from rich->mmengine>=0.3.0->mmcv==2.1.0) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from rich->mmengine>=0.3.0->mmcv==2.1.0) (2.19.2)
Requirement already satisfied: mdurl~=0.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from markdown-it-py>=2.2.0->rich->mmengine>=0.3.0->mmcv==2.1.0) (0.1.2)
Requirement already satisfied: platformdirs>=3.5.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from yapf->mmcv==2.1.0) (4.5.0)
Requirement already satisfied: mmdet==3.2.0 in /home/ci/opt/venv/lib/python3.12/site-packages (3.2.0)
Requirement already satisfied: matplotlib in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (3.10.7)
Requirement already satisfied: numpy in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (2.1.3)
Requirement already satisfied: pycocotools in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (2.0.10)
Requirement already satisfied: scipy in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (1.16.3)
Requirement already satisfied: shapely in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (2.1.2)
Requirement already satisfied: six in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (1.17.0)
Requirement already satisfied: terminaltables in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (3.1.10)
Requirement already satisfied: tqdm in /home/ci/opt/venv/lib/python3.12/site-packages (from mmdet==3.2.0) (4.67.1)
Requirement already satisfied: contourpy>=1.0.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (4.60.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (1.4.9)
Requirement already satisfied: packaging>=20.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (25.0)
Requirement already satisfied: pillow>=8 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (11.3.0)
Requirement already satisfied: pyparsing>=3 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (3.2.5)
Requirement already satisfied: python-dateutil>=2.7 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmdet==3.2.0) (2.9.0.post0)
Collecting mmengine>=0.10.6
  Downloading mmengine-0.10.7-py3-none-any.whl.metadata (20 kB)
Requirement already satisfied: addict in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (2.4.0)
Requirement already satisfied: matplotlib in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (3.10.7)
Requirement already satisfied: numpy in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (2.1.3)
Requirement already satisfied: pyyaml in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (6.0.3)
Requirement already satisfied: rich in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (14.2.0)
Requirement already satisfied: termcolor in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (3.2.0)
Requirement already satisfied: yapf in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (0.43.0)
Requirement already satisfied: opencv-python>=3 in /home/ci/opt/venv/lib/python3.12/site-packages (from mmengine>=0.10.6) (4.12.0.88)
Requirement already satisfied: contourpy>=1.0.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (4.60.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (1.4.9)
Requirement already satisfied: packaging>=20.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (25.0)
Requirement already satisfied: pillow>=8 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (11.3.0)
Requirement already satisfied: pyparsing>=3 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (3.2.5)
Requirement already satisfied: python-dateutil>=2.7 in /home/ci/opt/venv/lib/python3.12/site-packages (from matplotlib->mmengine>=0.10.6) (2.9.0.post0)
Requirement already satisfied: six>=1.5 in /home/ci/opt/venv/lib/python3.12/site-packages (from python-dateutil>=2.7->matplotlib->mmengine>=0.10.6) (1.17.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from rich->mmengine>=0.10.6) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/ci/opt/venv/lib/python3.12/site-packages (from rich->mmengine>=0.10.6) (2.19.2)
Requirement already satisfied: mdurl~=0.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from markdown-it-py>=2.2.0->rich->mmengine>=0.10.6) (0.1.2)
Requirement already satisfied: platformdirs>=3.5.1 in /home/ci/opt/venv/lib/python3.12/site-packages (from yapf->mmengine>=0.10.6) (4.5.0)
Downloading mmengine-0.10.7-py3-none-any.whl (452 kB)
Installing collected packages: mmengine
  Attempting uninstall: mmengine
    Found existing installation: mmengine 0.10.5
    Uninstalling mmengine-0.10.5:
      Successfully uninstalled mmengine-0.10.5
Successfully installed mmengine-0.10.7
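
Before moving on, it is worth verifying that MMCV's compiled extensions were actually built; a common failure mode is a missing mmcv._ext module when the wheel was built against an incompatible CUDA/PyTorch combination. A minimal optional check:

import torch
import mmcv
import mmdet

print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("mmcv:", mmcv.__version__, "mmdet:", mmdet.__version__)

# Importing a compiled op raises "ModuleNotFoundError: No module named
# 'mmcv._ext'" if the C++/CUDA extensions were not built correctly.
from mmcv.ops import roi_align  # noqa: F401
print("mmcv compiled ops are available")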
from autogluon.multimodal import MultiModalPredictor
/home/ci/autogluon/multimodal/src/autogluon/multimodal/data/templates.py:16: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources

We also import some other packages that will be used in this tutorial:

import os

from autogluon.core.utils.loaders import load_zip

We have the sample dataset ready in the cloud. Let’s download it and store the paths for each data split:

zip_file = "https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip"
download_dir = "./pothole"

load_zip.unzip(zip_file, unzip_dir=download_dir)
data_dir = os.path.join(download_dir, "pothole")
train_path = os.path.join(data_dir, "Annotations", "usersplit_train_cocoformat.json")
val_path = os.path.join(data_dir, "Annotations", "usersplit_val_cocoformat.json")
test_path = os.path.join(data_dir, "Annotations", "usersplit_test_cocoformat.json")
Downloading ./pothole/file.zip from https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip...
100%|██████████| 351M/351M [00:04<00:00, 76.8MiB/s]

When using a COCO format dataset, the input is the JSON annotation file of each dataset split. In this example, usersplit_train_cocoformat.json is the annotation file of the train split, usersplit_val_cocoformat.json is the annotation file of the validation split, and usersplit_test_cocoformat.json is the annotation file of the test split.
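
If you want to sanity-check an annotation file before training, you can load it directly. The sketch below is optional and only relies on the standard COCO keys ("images", "annotations", "categories"):

import json

# Inspect the train split annotations (standard COCO format keys)
with open(train_path) as f:
    coco = json.load(f)

print("images:", len(coco["images"]))
print("annotations:", len(coco["annotations"]))
print("categories:", [c["name"] for c in coco["categories"]])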

We select the YOLOX-small model pretrained on the COCO dataset. With this setting, finetuning and inference are fast, and the model is easy to deploy. Note that you can use a larger model by setting checkpoint_name to the corresponding checkpoint name for better performance (usually at the cost of speed), and you may need to adjust lr and per_gpu_batch_size for a different model. An easier way is to use our predefined presets "medium_quality", "high_quality", or "best_quality". For more about using presets, see Quick Start Coco.

checkpoint_name = "yolox_s"
num_gpus = 1  # only use one GPU

We create the MultiModalPredictor with the selected checkpoint name and number of GPUs. We need to specify the problem_type as "object_detection", and also provide a sample_data_path for the predictor to infer the categories of the dataset. Here we provide train_path; any other split of this dataset would work as well.

predictor = MultiModalPredictor(
    hyperparameters={
        "model.mmdet_image.checkpoint_name": checkpoint_name,
        "env.num_gpus": num_gpus,
    },
    problem_type="object_detection",
    sample_data_path=train_path,
)
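
Since the category list is inferred from sample_data_path, you can confirm what the predictor picked up; for this dataset we expect a single pothole class:

# Categories inferred from the sample data (a single "pothole" class here)
print(predictor.classes)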

We set the learning rate to 1e-4. Note that we use a two-stage learning rate option during finetuning by default, in which the model head gets a 100x learning rate. Applying a high learning rate only to the head layers makes the model converge faster during finetuning, and it usually gives better performance as well, especially on small datasets with hundreds or thousands of images. We set the batch size to 16; you can increase or decrease it based on your available GPU memory. For fast finetuning, we set the maximum number of epochs to 30, the validation check interval to 1.0 (one validation pass per checked epoch), and check_val_every_n_epoch to 3 (so validation runs every 3 epochs). We also time the fit process to better understand the speed.
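
As a quick arithmetic illustration of the two-stage scheme (an illustration only, not AutoGluon internals), the detection head trains at 100x the base learning rate:

base_lr = 1e-4
head_lr_mult = 100  # the default head multiplier described above
print(f"backbone lr: {base_lr:.0e}, head lr: {base_lr * head_lr_mult:.0e}")
# backbone lr: 1e-04, head lr: 1e-02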

predictor.fit(
    train_path,
    tuning_data=val_path,
    hyperparameters={
        "optim.lr": 1e-4,  # we use two stage and detection head has 100x lr
        "env.per_gpu_batch_size": 16,  # decrease it when model is large or GPU memory is small
        "optim.max_epochs": 30,  # max number of training epochs, note that we may early stop before this based on validation setting
        "optim.val_check_interval": 1.0,  # Do 1 validation each epoch
        "optim.check_val_every_n_epoch": 3,  # Do 1 validation each 3 epochs
        "optim.patience": 3,  # Early stop after 3 consective validations are not the best
    },
)
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Downloading yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth from https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth...
=================== System Info ===================
AutoGluon Version:  1.4.1b20251117
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.7.1+cu126
CUDA Version:       12.6
GPU Memory:         GPU 0: 14.57/14.57 GB
Total GPU Memory:   Free: 14.57 GB, Allocated: 0.00 GB, Total: 14.57 GB
GPU Count:          1
Memory Avail:       28.40 GB / 30.95 GB (91.8%)
===================================================
No path specified. Models will be saved in: "AutogluonModels/ag-20251117_080610"
Using default root folder: ./pothole/pothole/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
Using default root folder: ./pothole/pothole/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/object_detection/advanced/AutogluonModels/ag-20251117_080610
    ```
Seed set to 0

To evaluate the model we just trained, run:

predictor.evaluate(test_path)

Note that it’s always recommended to use our predefined presets to save customization time, with the following code:

predictor = MultiModalPredictor(
    problem_type="object_detection",
    sample_data_path=train_path,
    presets="medium_quality",
)
predictor.fit(train_path, tuning_data=val_path)
predictor.evaluate(test_path)

For more about using presets, see Quick Start Coco.

The evaluation results are shown in the command line output. The first value is the mAP in the COCO standard, and the second one is the mAP in the VOC standard (also called mAP50). For more details about these metrics, see COCO’s evaluation guideline.
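
evaluate also returns its metrics as a dictionary, so you can capture them programmatically. A minimal sketch; the exact key names (e.g. "map", "map_50") are assumptions and may differ across AutoGluon versions:

eval_result = predictor.evaluate(test_path)
# Key names such as "map" and "map_50" are assumptions; print every entry
for metric, value in eval_result.items():
    print(metric, value)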

We can get the prediction on test set:

pred = predictor.predict(test_path)
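
In recent AutoGluon versions the prediction comes back as a pandas DataFrame with one row per image; the column names used below ("image", "bboxes") are assumptions and may vary by version:

# Inspect the structure of the predictions (column names are assumptions)
print(type(pred))
print(pred.columns.tolist())
print(pred.iloc[0]["bboxes"][:3])  # first few detected boxes for one image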

Let’s also visualize the prediction result:

!pip install opencv-python

from PIL import Image
from IPython.display import display

from autogluon.multimodal.utils import visualize_detection

conf_threshold = 0.25  # Specify a confidence threshold to filter out unwanted boxes
visualization_result_dir = "./"  # Use the current working directory to save the visualized image

visualized = visualize_detection(
    pred=pred[12:13],  # visualize a single test image
    detection_classes=predictor.classes,
    conf_threshold=conf_threshold,
    visualization_result_dir=visualization_result_dir,
)

# The visualized array is in BGR channel order; reverse it to RGB for PIL
img = Image.fromarray(visualized[0][:, :, ::-1], 'RGB')
display(img)

Under this fast finetune setting, we reached a good mAP number on a new dataset in just a few hundred seconds! For how to finetune with higher performance, see AutoMM Detection - High Performance Finetune on COCO Format Dataset, where we finetuned a VFNet model for 5 hours and reached mAP = 0.450, mAP50 = 0.718 on this dataset.

Other Examples

You may go to AutoMM Examples to explore other examples about AutoMM.

Customization

To learn how to customize AutoMM, please refer to Customize AutoMM.

Citation

@article{DBLP:journals/corr/abs-2107-08430,
  author    = {Zheng Ge and
               Songtao Liu and
               Feng Wang and
               Zeming Li and
               Jian Sun},
  title     = {{YOLOX:} Exceeding {YOLO} Series in 2021},
  journal   = {CoRR},
  volume    = {abs/2107.08430},
  year      = {2021},
  url       = {https://arxiv.org/abs/2107.08430},
  eprinttype = {arXiv},
  eprint    = {2107.08430},
  timestamp = {Tue, 05 Apr 2022 14:09:44 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2107-08430.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org},
}