AutoMM Detection - Quick Start on a Tiny COCO Format Dataset¶
In this section, our goal is to quickly finetune a pretrained model on a small dataset in COCO format and evaluate it on the test set. Both the training and test sets are in COCO format. See Convert Data to COCO Format for how to convert other datasets to COCO format.
Setting up the imports¶
To start, make sure mmcv and mmdet are installed.
Note: MMDet is no longer actively maintained and is only compatible with MMCV version 2.1.0. Installation can be problematic due to CUDA version compatibility issues. For best results:
Use CUDA 12.4 with PyTorch 2.5
Before installation, run:
pip install -U pip setuptools wheel
sudo apt-get install -y ninja-build gcc g++
This will help prevent MMCV installation from hanging during wheel building.
After installation in Jupyter notebook, restart the kernel for changes to take effect.
# Update package tools and install build dependencies
!pip install -U pip setuptools wheel
!sudo apt-get install -y ninja-build gcc g++
# Install MMCV
!python3 -m mim install "mmcv==2.1.0"
# For Google Colab users: If the above fails, use this alternative MMCV installation
# pip install "mmcv==2.1.0" -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/index.html
# Install MMDet
!python3 -m pip install "mmdet==3.2.0"
# Install MMEngine (version >=0.10.6 for PyTorch 2.5 compatibility)
!python3 -m pip install "mmengine>=0.10.6"
First, let’s import MultiModalPredictor:
from autogluon.multimodal import MultiModalPredictor
And also import some other packages that will be used in this tutorial:
import os
import time
from autogluon.core.utils.loaders import load_zip
Downloading Data¶
We have the sample dataset ready in the cloud. Let’s download it:
zip_file = "https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip"
download_dir = "./tiny_motorbike_coco"
load_zip.unzip(zip_file, unzip_dir=download_dir)
data_dir = os.path.join(download_dir, "tiny_motorbike")
train_path = os.path.join(data_dir, "Annotations", "trainval_cocoformat.json")
test_path = os.path.join(data_dir, "Annotations", "test_cocoformat.json")
Downloading ./tiny_motorbike_coco/file.zip from https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip...
100%|██████████| 21.8M/21.8M [00:00<00:00, 111MiB/s]
Dataset Format¶
For COCO format datasets, provide JSON annotation files for each split:
trainval_cocoformat.json: train and validation data
test_cocoformat.json: test data
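For reference, a COCO annotation file is a single JSON object with images, annotations, and categories lists. The snippet below is a minimal, hypothetical example for illustration only (the file name, IDs, and box values are made up); the downloaded dataset already ships with valid annotation files.
# A minimal, hypothetical COCO-format annotation for illustration only.
# Real files list every image, every bounding box, and every category.
coco_example = {
    "images": [
        {"id": 0, "file_name": "images/000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        # COCO bboxes are [x, y, width, height] measured from the top-left corner
        {"id": 0, "image_id": 0, "category_id": 1, "bbox": [10, 20, 100, 80], "area": 8000.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "motorbike"},
    ],
}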
Model Selection¶
We use the medium_quality preset which features:
Base model: YOLOX-large (pretrained on COCO)
Benefits: Fast finetuning, quick inference, easy deployment
Alternative presets available:
high_quality: DINO-Resnet50 model
best_quality: DINO-SwinL model
Both alternatives offer improved performance at the cost of slower processing and higher GPU memory requirements; a sketch of swapping presets follows the predictor creation below.
presets = "medium_quality"
When creating the MultiModalPredictor, specify these essential parameters:
problem_type="object_detection"to define the taskpresets="medium_quality"for presets selectionsample_data_pathpointing to any dataset split (typically train_path) to infer object categoriespath(optional) to set a custom save location
If no path is specified, the model will be automatically saved to a timestamped directory under AutogluonModels/.
# Init predictor
import uuid
model_path = f"./tmp/{uuid.uuid4().hex}-quick_start_tutorial_temp_save"
predictor = MultiModalPredictor(
problem_type="object_detection",
sample_data_path=train_path,
presets=presets,
path=model_path,
)
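If more GPU memory is available, the higher-quality presets mentioned above drop in with an otherwise identical call. The snippet below is only a sketch and is not run in this tutorial (predictor_hq is a hypothetical name); fitting it downloads a much larger checkpoint.
# A sketch only: same constructor, higher-quality preset (not run in this tutorial)
predictor_hq = MultiModalPredictor(
    problem_type="object_detection",
    sample_data_path=train_path,
    presets="high_quality",  # or "best_quality" for the DINO-SwinL model
)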
Finetuning the Model¶
The model uses optimized preset configurations for learning rate, epochs, and batch size. By default, it employs a two-stage learning rate strategy:
Model head layers use a 100x higher learning rate than the rest of the model
This approach accelerates convergence and typically improves performance, especially for small datasets (hundreds to thousands of images)
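These defaults can be overridden through the hyperparameters argument of fit. The sketch below is illustrative only and is not part of the timed run that follows; the key names are an assumption that can differ across AutoGluon versions (older releases used optimization.learning_rate / optimization.lr_mult), so check the Customization tutorial for the names matching your install.
# A hedged sketch of overriding the preset learning-rate schedule (not run here).
# The hyperparameter keys below are assumptions; verify them against the
# Customization tutorial for your AutoGluon version.
predictor.fit(
    train_path,
    hyperparameters={
        "optim.lr": 1e-4,      # assumed key: base learning rate
        "optim.lr_mult": 100,  # assumed key: multiplier applied to the head layers
    },
)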
The timing results below are from a test run on an AWS g4.2xlarge EC2 instance:
start = time.time()
predictor.fit(train_path) # Fit
train_end = time.time()
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Downloading yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth from https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth...
=================== System Info ===================
AutoGluon Version: 1.4.1b20251117
Python Version: 3.12.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Pytorch Version: 2.7.1+cu126
CUDA Version: 12.6
GPU Memory: GPU 0: 14.57/14.57 GB
Total GPU Memory: Free: 14.57 GB, Allocated: 0.00 GB, Total: 14.57 GB
GPU Count: 1
Memory Avail: 28.40 GB / 30.95 GB (91.8%)
Disk Space Avail: WARNING, an exception (FileNotFoundError) occurred while attempting to get available disk space. Consider opening a GitHub Issue.
===================================================
Using default root folder: ./tiny_motorbike_coco/tiny_motorbike/Annotations/... Specify `model.mmdet_image.coco_root=...` in hyperparameters if you think it is wrong.
AutoMM starts to create your model. ✨✨✨
To track the learning progress, you can open a terminal and launch Tensorboard:
```shell
# Assume you have installed tensorboard
tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/object_detection/quick_start/tmp/a99faf3f39c14e9ba336b18a89b702f1-quick_start_tutorial_temp_save
```
Seed set to 0
Notice that at the end of each progress bar, if the checkpoint at the current stage is saved, the model’s save path is printed.
In this example, it is the model_path we passed when creating the predictor.
Print out the training time, and we can see that it’s fast!
print("This finetuning takes %.2f seconds." % (train_end - start))
Evaluation¶
To evaluate the model we just trained, run the following code.
The evaluation results are shown in the command line output. The first line is the mAP in the COCO standard, and the second line is the mAP in the VOC standard (also called mAP50). For more details about these metrics, see COCO’s evaluation guideline. Note that to demonstrate fast finetuning we use the “medium_quality” preset; you can get better results on this dataset by simply using the “high_quality” or “best_quality” presets, or by customizing your own model and hyperparameter settings (see Customization), with further examples at Fast Fine-tune Coco and High Performance Fine-tune Coco.
predictor.evaluate(test_path)
eval_end = time.time()
Print out the evaluation time:
print("The evaluation takes %.2f seconds." % (eval_end - train_end))
We can load a new predictor from the previous save path, and we can also reset the number of GPUs to use if not all devices are available:
# Load and reset num_gpus
new_predictor = MultiModalPredictor.load(model_path)
new_predictor.set_num_gpus(1)
Evaluating the new predictor gives us exactly the same result:
# Evaluate new predictor
new_predictor.evaluate(test_path)
For how to set the hyperparameters and finetune the model with higher performance, see AutoMM Detection - High Performance Finetune on COCO Format Dataset.
Inference¶
Let’s perform predictions using our finetuned model. The predictor can process the entire test set with a single command:
pred = predictor.predict(test_path)
print(len(pred)) # Number of predictions
print(pred[:3]) # Sample of first 3 predictions
The predictor returns predictions as a pandas DataFrame with two columns:
image: Contains path to each input image
bboxes: Contains list of detected objects, where each object is a dictionary:
{
    "class": "predicted_class_name",
    "bbox": [x1, y1, x2, y2],  # Coordinates of Upper Left and Bottom Right corners
    "score": confidence_score
}
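As a quick illustration of this structure, the sketch below keeps only the confident detections for a single image; the 0.5 threshold is an arbitrary choice for this example.
# A small sketch: filter one image's detections by confidence score.
row = pred.iloc[0]
confident = [det for det in row["bboxes"] if det["score"] >= 0.5]
print(row["image"])
print(f"{len(confident)} detections with score >= 0.5")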
By default, predictions are returned but not saved. To save detection results, use the save_results parameter in your predict call.
# To save as csv format
pred = predictor.predict(test_path, save_results=True, as_coco=False)
# Or to save as COCO format. Note that the `pred` returned is always a pandas dataframe.
pred = predictor.predict(test_path, save_results=True, as_coco=True, result_save_path="./results.json")
The predictions can be saved in two formats:
CSV file: Matches the DataFrame structure with image and bboxes columns
COCO JSON: Standard COCO format annotation file
This works with any predictor configuration (pretrained or finetuned models).
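Once saved, the results can be consumed by any downstream tool. The sketch below reads back the COCO-format file written above; whether it contains a flat COCO "results" list or a full annotation-style dict is an assumption handled explicitly in the code.
# A sketch: load the saved COCO-format results and count the detection records.
# Assumes ./results.json is either a COCO "results" list or a COCO-style dict.
import json
with open("./results.json") as f:
    results = json.load(f)
detections = results if isinstance(results, list) else results.get("annotations", [])
print(f"{len(detections)} detection records loaded")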
Visualizing Results¶
To run visualizations, ensure that you have opencv installed. If you haven’t already, install opencv by running
!pip install opencv-python
To visualize the detection bounding boxes, run the following:
from autogluon.multimodal.utils import ObjectDetectionVisualizer
conf_threshold = 0.4 # Specify a confidence threshold to filter out unwanted boxes
image_result = pred.iloc[30]
img_path = image_result.image # Select an image to visualize
visualizer = ObjectDetectionVisualizer(img_path) # Initialize the Visualizer
out = visualizer.draw_instance_predictions(image_result, conf_threshold=conf_threshold) # Draw detections
visualized = out.get_image() # Get the visualized image
from PIL import Image
from IPython.display import display
img = Image.fromarray(visualized, 'RGB')
display(img)
Testing on Your Own Data¶
You can also predict on your own images with various input formats. The following is an example:
Download the example image:
from autogluon.multimodal.utils import download
image_url = "https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/detection/street_small.jpg"
test_image = download(image_url)
Run inference on data in a JSON file of COCO format (see Convert Data to COCO Format for more details about the COCO format). Note that since the image root defaults to the parent folder of the annotation file, here we put the annotation file in a folder:
import json
# Create an input file for the demo
data = {"images": [{"id": 0, "width": -1, "height": -1, "file_name": test_image}], "categories": []}
os.makedirs("input_data_for_demo", exist_ok=True)
input_file = "input_data_for_demo/demo_annotation.json"
with open(input_file, "w+") as f:
json.dump(data, f)
pred_test_image = predictor.predict(input_file)
print(pred_test_image)
Run inference on data in a list of image file names:
pred_test_image = predictor.predict([test_image])
print(pred_test_image)
Other Examples¶
You may go to AutoMM Examples to explore other examples about AutoMM.
Customization¶
To learn how to customize AutoMM, please refer to Customize AutoMM.
Citation¶
@article{DBLP:journals/corr/abs-2107-08430,
author = {Zheng Ge and
Songtao Liu and
Feng Wang and
Zeming Li and
Jian Sun},
title = {{YOLOX:} Exceeding {YOLO} Series in 2021},
journal = {CoRR},
volume = {abs/2107.08430},
year = {2021},
url = {https://arxiv.org/abs/2107.08430},
eprinttype = {arXiv},
eprint = {2107.08430},
timestamp = {Tue, 05 Apr 2022 14:09:44 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2107-08430.bib},
bibsource = {dblp computer science bibliography, https://dblp.org},
}