AutoMM Detection - Quick Start on a Tiny COCO Format Dataset¶
In this section, our goal is to quickly finetune a pretrained model on a small dataset in COCO format and evaluate it on the test set. Both the training and test sets are in COCO format. See Convert Data to COCO Format for how to convert other datasets to COCO format.
Setting up the imports¶
To start, make sure mmcv and mmdet are installed.
Note: MMDet is no longer actively maintained and is only compatible with MMCV version 2.1.0. Installation can be problematic due to CUDA version compatibility issues. For best results:
Use CUDA 12.4 with PyTorch 2.5
Before installation, run:
pip install -U pip setuptools wheel
sudo apt-get install -y ninja-build gcc g++
This will help prevent MMCV installation from hanging during wheel building.
After installation in Jupyter notebook, restart the kernel for changes to take effect.
# Update package tools and install build dependencies
!pip install -U pip setuptools wheel
!sudo apt-get install -y ninja-build gcc g++
# Install MMCV
!python3 -m mim install "mmcv==2.1.0"
# For Google Colab users: If the above fails, use this alternative MMCV installation
# pip install "mmcv==2.1.0" -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/index.html
# Install MMDet
!python3 -m pip install "mmdet==3.2.0"
# Install MMEngine (version >=0.10.6 for PyTorch 2.5 compatibility)
!python3 -m pip install "mmengine>=0.10.6"
First, let’s import MultiModalPredictor:
from autogluon.multimodal import MultiModalPredictor
/home/ci/autogluon/multimodal/src/autogluon/multimodal/data/templates.py:16: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
And also import some other packages that will be used in this tutorial:
import os
import time
from autogluon.core.utils.loaders import load_zip
Downloading Data¶
We have the sample dataset ready in the cloud. Let’s download it:
zip_file = "https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip"
download_dir = "./tiny_motorbike_coco"
load_zip.unzip(zip_file, unzip_dir=download_dir)
data_dir = os.path.join(download_dir, "tiny_motorbike")
train_path = os.path.join(data_dir, "Annotations", "trainval_cocoformat.json")
test_path = os.path.join(data_dir, "Annotations", "test_cocoformat.json")
Downloading ./tiny_motorbike_coco/file.zip from https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip...
100%|██████████| 21.8M/21.8M [00:00<00:00, 104MiB/s]
Dataset Format¶
For COCO format datasets, provide JSON annotation files for each split:
trainval_cocoformat.json: train and validation data
test_cocoformat.json: test data
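For reference, a COCO annotation file is a single JSON object with images, annotations, and categories lists. The sketch below is illustrative only (the file name, sizes, and box values are made up, not taken from the downloaded dataset):
# Minimal, illustrative COCO annotation structure (all values are made up)
coco_example = {
    "images": [
        {"id": 0, "file_name": "images/000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 1,
            "bbox": [100, 50, 80, 120],  # COCO boxes are [x, y, width, height]
            "area": 9600,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "motorbike"},
    ],
}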
Model Selection¶
We use the medium_quality preset, which features:
Base model: YOLOX-large (pretrained on COCO)
Benefits: Fast finetuning, quick inference, easy deployment
Alternative presets available:
high_quality: DINO-ResNet50 model
best_quality: DINO-SwinL model
Both alternatives offer improved performance at the cost of slower processing and higher GPU memory requirements.
presets = "medium_quality"
When creating the MultiModalPredictor, specify these essential parameters:
problem_type="object_detection" to define the task
presets="medium_quality" for preset selection
sample_data_path pointing to any dataset split (typically train_path) to infer object categories
path (optional) to set a custom save location
If no path is specified, the model will be automatically saved to a timestamped directory under AutogluonModels/.
# Init predictor
import uuid
model_path = f"./tmp/{uuid.uuid4().hex}-quick_start_tutorial_temp_save"
predictor = MultiModalPredictor(
    problem_type="object_detection",
    sample_data_path=train_path,
    presets=presets,
    path=model_path,
)
Finetuning the Model¶
The model uses optimized preset configurations for learning rate, epochs, and batch size. By default, it employs a two-stage learning rate strategy:
Model head layers use a 100x higher learning rate than the rest of the network. This approach accelerates convergence and typically improves performance, especially for small datasets (hundreds to thousands of images).
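If you want to adjust these defaults yourself, you can pass a hyperparameters dict to fit. The following is a hedged sketch, not a definitive recipe: the optimization.* key names are assumptions based on the Customization tutorial and may differ across AutoGluon versions, so verify them against the docs for your release.
# Hedged sketch: override the preset defaults via `hyperparameters`.
# The optimization.* keys are assumptions; check the Customization docs
# for the exact names in your AutoGluon version.
predictor.fit(
    train_path,
    hyperparameters={
        "optimization.learning_rate": 1e-4,  # base learning rate for the backbone
        "optimization.lr_mult": 100,  # head layers train at learning_rate * lr_mult
        "optimization.max_epochs": 30,  # total training epoch budget
    },
)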
Timing results below are from a test run on an AWS g4.2xlarge EC2 instance:
start = time.time()
predictor.fit(train_path) # Fit
train_end = time.time()
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Downloading yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth from https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth...
Loads checkpoint by local backend from path: yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth
The model and loaded state dict do not match exactly
size mismatch for bbox_head.multi_level_conv_cls.0.weight: copying a param with shape torch.Size([80, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([10, 256, 1, 1]).
size mismatch for bbox_head.multi_level_conv_cls.0.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([10]).
size mismatch for bbox_head.multi_level_conv_cls.1.weight: copying a param with shape torch.Size([80, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([10, 256, 1, 1]).
size mismatch for bbox_head.multi_level_conv_cls.1.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([10]).
size mismatch for bbox_head.multi_level_conv_cls.2.weight: copying a param with shape torch.Size([80, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([10, 256, 1, 1]).
size mismatch for bbox_head.multi_level_conv_cls.2.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([10]).
=================== System Info ===================
AutoGluon Version: 1.4.1b20250908
Python Version: 3.12.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Pytorch Version: 2.7.1+cu126
CUDA Version: 12.6
GPU Count: 1
Memory Avail: 28.39 GB / 30.95 GB (91.7%)
Disk Space Avail: WARNING, an exception (FileNotFoundError) occurred while attempting to get available disk space. Consider opening a GitHub Issue.
===================================================
Using default root folder: ./tiny_motorbike_coco/tiny_motorbike/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
AutoMM starts to create your model. ✨✨✨
To track the learning progress, you can open a terminal and launch Tensorboard:
```shell
# Assume you have installed tensorboard
tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/tmp/68c46cee5ba348a784613600f515f340-quick_start_tutorial_temp_save
```
Seed set to 0
100%|█████████▉| 216M/217M [00:11<00:00, 18.0MiB/s]
GPU Count: 1
GPU Count to be Used: 1
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/ci/opt/venv/lib/python3.12/site-packages/lightning/pytorch/utilities/model_summary/model_summary.py:231: Precision 16-mixed is not supported by the model summary. Estimated model size in MB will not be accurate. Using 32 bits instead.
| Name | Type | Params | Mode
-------------------------------------------------------------------------------
0 | model | MMDetAutoModelForObjectDetection | 54.2 M | train
1 | validation_metric | MeanAveragePrecision | 0 | train
-------------------------------------------------------------------------------
54.2 M Trainable params
0 Non-trainable params
54.2 M Total params
216.620 Total estimated model params size (MB)
592 Modules in train mode
0 Modules in eval mode
/home/ci/opt/venv/lib/python3.12/site-packages/mmdet/models/backbones/csp_darknet.py:118: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
/home/ci/opt/venv/lib/python3.12/site-packages/torch/functional.py:554: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /pytorch/aten/src/ATen/native/TensorShape.cpp:4314.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/ci/opt/venv/lib/python3.12/site-packages/mmdet/models/task_modules/assigners/sim_ota_assigner.py:118: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
Epoch 2, global step 15: 'val_map' reached 0.33005 (best 0.33005), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/tmp/68c46cee5ba348a784613600f515f340-quick_start_tutorial_temp_save/epoch=2-step=15.ckpt' as top 1
Epoch 5, global step 30: 'val_map' reached 0.39863 (best 0.39863), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/tmp/68c46cee5ba348a784613600f515f340-quick_start_tutorial_temp_save/epoch=5-step=30.ckpt' as top 1
Epoch 8, global step 45: 'val_map' reached 0.40470 (best 0.40470), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/tmp/68c46cee5ba348a784613600f515f340-quick_start_tutorial_temp_save/epoch=8-step=45.ckpt' as top 1
Epoch 11, global step 60: 'val_map' reached 0.45690 (best 0.45690), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/tmp/68c46cee5ba348a784613600f515f340-quick_start_tutorial_temp_save/epoch=11-step=60.ckpt' as top 1
Epoch 14, global step 75: 'val_map' was not in top 1
Epoch 17, global step 90: 'val_map' was not in top 1
Epoch 20, global step 105: 'val_map' reached 0.46151 (best 0.46151), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/tmp/68c46cee5ba348a784613600f515f340-quick_start_tutorial_temp_save/epoch=20-step=105.ckpt' as top 1
Epoch 23, global step 120: 'val_map' was not in top 1
Epoch 26, global step 135: 'val_map' was not in top 1
Epoch 29, global step 150: 'val_map' was not in top 1
Epoch 32, global step 165: 'val_map' was not in top 1
Epoch 35, global step 180: 'val_map' was not in top 1
AutoMM has created your model. 🎉🎉🎉
To load the model, use the code below:
```python
from autogluon.multimodal import MultiModalPredictor
predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/tmp/68c46cee5ba348a784613600f515f340-quick_start_tutorial_temp_save")
```
If you are not satisfied with the model, try to increase the training time,
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
Notice that at the end of each progress bar, if the checkpoint at the current stage is saved, the model’s save path is printed. In this example, it’s ./quick_start_tutorial_temp_save.
Print the elapsed time to see how fast the finetuning was:
print("This finetuning takes %.2f seconds." % (train_end - start))
This finetuning takes 400.16 seconds.
Evaluation¶
To evaluate the model we just trained, run the following code.
The evaluation results are shown in the command line output. The first line is mAP in the COCO standard, and the second line is mAP in the VOC standard (or mAP50). For more details about these metrics, see COCO’s evaluation guideline. Note that to present fast finetuning we used the “medium_quality” presets; you could get better results on this dataset by simply using the “high_quality” or “best_quality” presets, or by customizing your own model and hyperparameter settings: Customization, and some other examples at Fast Fine-tune Coco or High Performance Fine-tune Coco.
predictor.evaluate(test_path)
eval_end = time.time()
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
saving file at /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/AutogluonModels/ag-20250908_203342/object_detection_result_cache.json
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.10s).
Accumulating evaluation results...
DONE (t=0.04s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.396
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.716
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.375
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.172
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.476
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.737
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.276
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.449
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.469
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.536
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.800
Using default root folder: ./tiny_motorbike_coco/tiny_motorbike/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
/home/ci/opt/venv/lib/python3.12/site-packages/mmdet/models/backbones/csp_darknet.py:118: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
A new predictor save path is created. This is to prevent you to overwrite previous predictor saved here. You could check current save path at predictor._save_path. If you still want to use this path, set resume=True
No path specified. Models will be saved in: "AutogluonModels/ag-20250908_203342"
Print out the evaluation time:
print("The evaluation takes %.2f seconds." % (eval_end - train_end))
The evaluation takes 1.79 seconds.
We can load a new predictor from the previous save path, and we can also reset the number of GPUs to use if not all devices are available:
# Load and reset num_gpus
new_predictor = MultiModalPredictor.load(model_path)
new_predictor.set_num_gpus(1)
Load pretrained checkpoint: /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/tmp/68c46cee5ba348a784613600f515f340-quick_start_tutorial_temp_save/model.ckpt
Evaluating the new predictor gives us exactly the same result:
# Evaluate new predictor
new_predictor.evaluate(test_path)
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
saving file at /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/AutogluonModels/ag-20250908_203347/object_detection_result_cache.json
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.09s).
Accumulating evaluation results...
DONE (t=0.04s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.396
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.716
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.375
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.172
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.476
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.737
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.276
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.449
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.469
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.536
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.800
Using default root folder: ./tiny_motorbike_coco/tiny_motorbike/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
/home/ci/opt/venv/lib/python3.12/site-packages/mmdet/models/backbones/csp_darknet.py:118: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
A new predictor save path is created. This is to prevent you to overwrite previous predictor saved here. You could check current save path at predictor._save_path. If you still want to use this path, set resume=True
No path specified. Models will be saved in: "AutogluonModels/ag-20250908_203347"
{'map': np.float64(0.3959001573796106),
'mean_average_precision': np.float64(0.3959001573796106),
'map_50': np.float64(0.7162269509048831),
'map_75': np.float64(0.3751963330599429),
'map_small': np.float64(0.17176141296139583),
'map_medium': np.float64(0.47572017347684875),
'map_large': np.float64(0.7371152730970115),
'mar_1': np.float64(0.27606342494714586),
'mar_10': np.float64(0.4493624618275781),
'mar_100': np.float64(0.46856612638007983),
'mar_small': np.float64(0.4),
'mar_medium': np.float64(0.535952380952381),
'mar_large': np.float64(0.800098965362123)}
For how to set the hyperparameters and finetune the model with higher performance, see AutoMM Detection - High Performance Finetune on COCO Format Dataset.
Inference¶
Let’s perform predictions using our finetuned model. The predictor can process the entire test set with a single command:
pred = predictor.predict(test_path)
print(len(pred)) # Number of predictions
print(pred[:3]) # Sample of first 3 predictions
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
50
image \
0 ./tiny_motorbike_coco/tiny_motorbike/Annotatio...
1 ./tiny_motorbike_coco/tiny_motorbike/Annotatio...
2 ./tiny_motorbike_coco/tiny_motorbike/Annotatio...
bboxes
0 [{'class': 'bicycle', 'class_id': 0, 'bbox': [...
1 [{'class': 'person', 'class_id': 8, 'bbox': [2...
2 [{'class': 'person', 'class_id': 8, 'bbox': [1...
Using default root folder: ./tiny_motorbike_coco/tiny_motorbike/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
/home/ci/opt/venv/lib/python3.12/site-packages/mmdet/models/backbones/csp_darknet.py:118: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
A new predictor save path is created. This is to prevent you to overwrite previous predictor saved here. You could check current save path at predictor._save_path. If you still want to use this path, set resume=True
No path specified. Models will be saved in: "AutogluonModels/ag-20250908_203348"
Saved detection results to /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/AutogluonModels/ag-20250908_203348/result.txt
The predictor returns predictions as a pandas DataFrame with two columns:
image: contains the path to each input image
bboxes: contains a list of detected objects, where each object is a dictionary:
{
    "class": "predicted_class_name",
    "bbox": [x1, y1, x2, y2],  # coordinates of the upper-left and bottom-right corners
    "score": confidence_score
}
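For example, you can post-process this DataFrame directly. The snippet below is a minimal sketch that keeps only confident detections; the 0.5 threshold is an arbitrary choice for illustration:
# Count detections above an (arbitrary) confidence cutoff, per image
conf_threshold = 0.5
for _, row in pred.iterrows():
    confident = [det for det in row["bboxes"] if det["score"] >= conf_threshold]
    print(f"{row['image']}: {len(confident)} detections above {conf_threshold}")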
By default, predictions are returned but not saved. To save detection results, use the save_results parameter in your predict call.
# To save as csv format
pred = predictor.predict(test_path, save_results=True, as_coco=False)
# Or to save as COCO format. Note that the `pred` returned is always a pandas dataframe.
pred = predictor.predict(test_path, save_results=True, as_coco=True, result_save_path="./results.json")
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
saving file at ./results.json
Using default root folder: ./tiny_motorbike_coco/tiny_motorbike/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
/home/ci/opt/venv/lib/python3.12/site-packages/mmdet/models/backbones/csp_darknet.py:118: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
A new predictor save path is created. This is to prevent you to overwrite previous predictor saved here. You could check current save path at predictor._save_path. If you still want to use this path, set resume=True
No path specified. Models will be saved in: "AutogluonModels/ag-20250908_203350"
Saved detection results to /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/AutogluonModels/ag-20250908_203350/result.txt
A new predictor save path is created. This is to prevent you to overwrite previous predictor saved here. You could check current save path at predictor._save_path. If you still want to use this path, set resume=True
No path specified. Models will be saved in: "AutogluonModels/ag-20250908_203350-001"
Saved detection results as dataframe to /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/AutogluonModels/ag-20250908_203350-001/result.txt
Using default root folder: ./tiny_motorbike_coco/tiny_motorbike/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
/home/ci/opt/venv/lib/python3.12/site-packages/mmdet/models/backbones/csp_darknet.py:118: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
A new predictor save path is created. This is to prevent you to overwrite previous predictor saved here. You could check current save path at predictor._save_path. If you still want to use this path, set resume=True
No path specified. Models will be saved in: "AutogluonModels/ag-20250908_203351"
Saved detection results to /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/AutogluonModels/ag-20250908_203351/result.txt
A new predictor save path is created. This is to prevent you to overwrite previous predictor saved here. You could check current save path at predictor._save_path. If you still want to use this path, set resume=True
No path specified. Models will be saved in: "AutogluonModels/ag-20250908_203352"
Using default root folder: ./tiny_motorbike_coco/tiny_motorbike/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
Saved detection result to ./results.json
Saved detection results as coco to ./results.json
The predictions can be saved in two formats:
CSV file: Matches the DataFrame structure with image and bboxes columns
COCO JSON: Standard COCO format annotation file
This works with any predictor configuration (pretrained or finetuned models).
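Once saved, the COCO JSON can be loaded back for downstream tooling. A minimal sketch, assuming ./results.json follows the standard COCO detection-results layout (a list of records with image_id, category_id, bbox in [x, y, width, height], and score):
import json

with open("./results.json") as f:
    results = json.load(f)  # assumed: standard COCO results, a list of detection records

print(len(results), "detections saved")
print(results[0])  # e.g. {"image_id": ..., "category_id": ..., "bbox": [x, y, w, h], "score": ...}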
Visualizing Results¶
To run visualizations, ensure that you have opencv installed. If you haven’t already, install opencv by running:
!pip install opencv-python
Requirement already satisfied: opencv-python in /home/ci/opt/venv/lib/python3.12/site-packages (4.12.0.88)
Requirement already satisfied: numpy<2.3.0,>=2 in /home/ci/opt/venv/lib/python3.12/site-packages (from opencv-python) (2.1.3)
To visualize the detection bounding boxes, run the following:
from autogluon.multimodal.utils import ObjectDetectionVisualizer
conf_threshold = 0.4 # Specify a confidence threshold to filter out unwanted boxes
image_result = pred.iloc[30]
img_path = image_result.image # Select an image to visualize
visualizer = ObjectDetectionVisualizer(img_path) # Initialize the Visualizer
out = visualizer.draw_instance_predictions(image_result, conf_threshold=conf_threshold) # Draw detections
visualized = out.get_image() # Get the visualized image
from PIL import Image
from IPython.display import display
img = Image.fromarray(visualized)  # the array is already RGB; passing a mode is deprecated in Pillow
display(img)

Testing on Your Own Data¶
You can also run predictions on your own images in various input formats. The following is an example:
Download the example image:
from autogluon.multimodal.utils import download
image_url = "https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/detection/street_small.jpg"
test_image = download(image_url)
Downloading street_small.jpg from https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/detection/street_small.jpg...
Run inference on data in a JSON file of COCO format (see Convert Data to COCO Format for more details about COCO format). Note that since the image root defaults to the parent folder of the annotation file, we put the annotation file in its own folder:
import json
# Create an input file for the demo
data = {"images": [{"id": 0, "width": -1, "height": -1, "file_name": test_image}], "categories": []}
os.makedirs("input_data_for_demo", exist_ok=True)
input_file = "input_data_for_demo/demo_annotation.json"
with open(input_file, "w") as f:
    json.dump(data, f)
pred_test_image = predictor.predict(input_file)
print(pred_test_image)
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
image \
0 input_data_for_demo/../street_small.jpg
bboxes
0 [{'class': 'person', 'class_id': 8, 'bbox': [2...
Using default root folder: input_data_for_demo/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
/home/ci/opt/venv/lib/python3.12/site-packages/mmdet/models/backbones/csp_darknet.py:118: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
Saved detection results to /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/AutogluonModels/ag-20250908_203352/result.txt
Run inference on data in a list of image file names:
pred_test_image = predictor.predict([test_image])
print(pred_test_image)
image bboxes
0 street_small.jpg [{'class': 'person', 'class_id': 8, 'bbox': [2...
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
/home/ci/opt/venv/lib/python3.12/site-packages/mmdet/models/backbones/csp_darknet.py:118: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
A new predictor save path is created. This is to prevent you to overwrite previous predictor saved here. You could check current save path at predictor._save_path. If you still want to use this path, set resume=True
No path specified. Models will be saved in: "AutogluonModels/ag-20250908_203354"
Saved detection results to /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/AutogluonModels/ag-20250908_203354/result.txt
Other Examples¶
You may go to AutoMM Examples to explore other examples about AutoMM.
Customization¶
To learn how to customize AutoMM, please refer to Customize AutoMM.
Citation¶
@article{DBLP:journals/corr/abs-2107-08430,
author = {Zheng Ge and
Songtao Liu and
Feng Wang and
Zeming Li and
Jian Sun},
title = {{YOLOX:} Exceeding {YOLO} Series in 2021},
journal = {CoRR},
volume = {abs/2107.08430},
year = {2021},
url = {https://arxiv.org/abs/2107.08430},
eprinttype = {arXiv},
eprint = {2107.08430},
timestamp = {Tue, 05 Apr 2022 14:09:44 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2107-08430.bib},
bibsource = {dblp computer science bibliography, https://dblp.org},
}