AutoMM Detection - Fast Finetune on COCO Format Dataset¶

Fig. 1 Pothole Dataset¶
In this section, our goal is to fast finetune and evaluate a pretrained model on the Pothole dataset in COCO format. Pothole is a single-object (i.e., pothole) detection dataset containing 665 images with bounding box annotations for building detection models; it can serve as a POC/POV for road maintenance. See AutoMM Detection - Prepare Pothole Dataset for how to prepare the Pothole dataset.
To start, let’s import MultiModalPredictor:
from autogluon.multimodal import MultiModalPredictor
Make sure mmcv-full and mmdet are installed:
!mim install mmcv-full
!pip install mmdet
Looking in links: https://download.openmmlab.com/mmcv/dist/cu102/torch1.12.0/index.html
Requirement already satisfied: mmcv-full in /home/ci/opt/venv/lib/python3.8/site-packages (1.7.1)
Requirement already satisfied: opencv-python>=3 in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (4.7.0.68)
Requirement already satisfied: yapf in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (0.32.0)
Requirement already satisfied: pyyaml in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (5.4.1)
Requirement already satisfied: packaging in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (23.0)
Requirement already satisfied: addict in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (2.4.0)
Requirement already satisfied: numpy in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (1.22.4)
Requirement already satisfied: Pillow in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (9.4.0)
Requirement already satisfied: mmdet in /home/ci/opt/venv/lib/python3.8/site-packages (2.27.0)
Requirement already satisfied: terminaltables in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (3.1.10)
Requirement already satisfied: numpy in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (1.22.4)
Requirement already satisfied: scipy in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (1.8.1)
Requirement already satisfied: matplotlib in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (3.6.2)
Requirement already satisfied: pycocotools in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (2.0.6)
Requirement already satisfied: six in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (1.16.0)
Requirement already satisfied: contourpy>=1.0.1 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (1.0.6)
Requirement already satisfied: pyparsing>=2.2.1 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (2.8.2)
Requirement already satisfied: packaging>=20.0 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (23.0)
Requirement already satisfied: fonttools>=4.22.0 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (4.38.0)
Requirement already satisfied: pillow>=6.2.0 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (9.4.0)
Requirement already satisfied: cycler>=0.10 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (0.11.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (1.4.4)
We also import some other packages that will be used in this tutorial:
import os
import time
from autogluon.core.utils.loaders import load_zip
We have the sample dataset ready in the cloud. Let’s download it:
zip_file = "https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip"
download_dir = "./pothole"
load_zip.unzip(zip_file, unzip_dir=download_dir)
data_dir = os.path.join(download_dir, "pothole")
train_path = os.path.join(data_dir, "Annotations", "usersplit_train_cocoformat.json")
val_path = os.path.join(data_dir, "Annotations", "usersplit_val_cocoformat.json")
test_path = os.path.join(data_dir, "Annotations", "usersplit_test_cocoformat.json")
Downloading ./pothole/file.zip from https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip...
100%|██████████| 351M/351M [00:07<00:00, 49.5MiB/s]
When using a COCO format dataset, the input is the JSON annotation file of each dataset split. In this example, usersplit_train_cocoformat.json is the annotation file of the train split, usersplit_val_cocoformat.json is the annotation file of the validation split, and usersplit_test_cocoformat.json is the annotation file of the test split.
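If you are new to the COCO format, it is easy to peek inside one of these annotation files. Below is a minimal sketch; the three top-level keys are part of the standard COCO schema, though extra keys may vary by dataset:
import json

# Inspect the COCO-format annotation file (standard COCO keys:
# "images", "annotations", "categories"; extra keys may vary).
with open(train_path) as f:
    coco = json.load(f)

print(sorted(coco.keys()))
print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")
print([c["name"] for c in coco["categories"]])  # expect ['pothole']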
We select YOLOv3 with MobileNetV2 as the backbone and a 320x320 input resolution, pretrained on the COCO dataset. With this setting, finetuning and inference are fast, and the model is easy to deploy. We also use all the GPUs (if any):
checkpoint_name = "yolov3_mobilenetv2_320_300e_coco"
num_gpus = -1 # use all GPUs
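The checkpoint is just a string from the MMDetection model zoo, so swapping models is a one-line change. A hedged sketch; the commented names below are MMDetection 2.x config names, and their availability depends on your mmdet version:
# Heavier but typically more accurate alternatives (MMDetection 2.x zoo names;
# verify availability in your mmdet installation):
# checkpoint_name = "yolov3_d53_mstrain-608_273e_coco"   # full DarkNet-53 YOLOv3
# checkpoint_name = "faster_rcnn_r50_fpn_2x_coco"        # two-stage Faster R-CNN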
We create the MultiModalPredictor with the selected checkpoint name and number of GPUs. We need to specify the problem_type as "object_detection", and also provide a sample_data_path for the predictor to infer the categories of the dataset. Here we provide train_path, though any other split of this dataset would work as well.
predictor = MultiModalPredictor(
    hyperparameters={
        "model.mmdet_image.checkpoint_name": checkpoint_name,
        "env.num_gpus": num_gpus,
    },
    problem_type="object_detection",
    sample_data_path=train_path,
)
processing yolov3_mobilenetv2_320_300e_coco...
Successfully downloaded yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth to /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
Successfully dumped yolov3_mobilenetv2_320_300e_coco.py to /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
load checkpoint from local path: yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth
The model and loaded state dict do not match exactly
size mismatch for bbox_head.convs_pred.0.weight: copying a param with shape torch.Size([255, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 96, 1, 1]).
size mismatch for bbox_head.convs_pred.0.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
size mismatch for bbox_head.convs_pred.1.weight: copying a param with shape torch.Size([255, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 96, 1, 1]).
size mismatch for bbox_head.convs_pred.1.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
size mismatch for bbox_head.convs_pred.2.weight: copying a param with shape torch.Size([255, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 96, 1, 1]).
size mismatch for bbox_head.convs_pred.2.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
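These size mismatches are expected rather than a problem: the COCO checkpoint's prediction layers are shaped for 80 classes, so they are re-initialized for our single pothole class. The channel counts follow YOLOv3's head layout, where each output scale predicts 3 anchors x (num_classes + 5) channels (4 box coordinates, 1 objectness score, and the class scores):
# YOLOv3 head channels per output scale: 3 anchors x (num_classes + 5)
num_anchors = 3
print(num_anchors * (80 + 5))  # 255 -> COCO-pretrained head in the checkpoint
print(num_anchors * (1 + 5))   # 18  -> our single-class (pothole) head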
We set the learning rate to 2e-4. Note that we use a two-stage learning rate option during finetuning by default, in which the model head gets a 100x learning rate. Using a two-stage learning rate, with a high learning rate only on the head layers, makes the model converge faster during finetuning. It usually gives better performance as well, especially on small datasets with hundreds or thousands of images. We also set the maximum number of epochs to 30 for fast finetuning and the per-GPU batch size to 32, and we time the fit process to get a sense of the speed.
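The two-stage behavior is itself exposed through hyperparameters if you want to tune it. A minimal sketch; the key names below are taken from AutoMM's optimization config (see Customize AutoMM to confirm them for your version), and they simply restate the detection defaults described above:
# Hedged sketch: assumed hyperparameter keys from AutoMM's optimization config,
# restating the detection defaults described above.
two_stage_hparams = {
    "optimization.lr_choice": "two_stages",  # separate head / backbone learning rates
    "optimization.lr_mult": 100,             # head layers train at 100x the base lr
}
Below we keep those defaults and only set the base learning rate, the number of epochs, and the batch size: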
import time

start = time.time()
predictor.fit(
    train_path,
    hyperparameters={
        "optimization.learning_rate": 2e-4,  # we use two-stage lr: the detection head has 100x lr
        "optimization.max_epochs": 30,
        "env.per_gpu_batch_size": 32,  # decrease it when the model is large
    },
)
end = time.time()
Global seed set to 123
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                             | Params
-----------------------------------------------------------------------
0 | model             | MMDetAutoModelForObjectDetection | 3.7 M
1 | validation_metric | MeanMetric                       | 0
-----------------------------------------------------------------------
3.7 M     Trainable params
0         Non-trainable params
3.7 M     Total params
14.675    Total estimated model params size (MB)

Epoch 0, global step 1: 'val_direct_loss' reached 50010.81641 (best 50010.81641), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=0-step=1.ckpt' as top 1
Epoch 0, global step 3: 'val_direct_loss' reached 6710.23389 (best 6710.23389), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=0-step=3.ckpt' as top 1
Epoch 1, global step 4: 'val_direct_loss' reached 2486.97095 (best 2486.97095), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=1-step=4.ckpt' as top 1
Epoch 1, global step 6: 'val_direct_loss' reached 934.13849 (best 934.13849), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=1-step=6.ckpt' as top 1
Epoch 2, global step 7: 'val_direct_loss' reached 727.33563 (best 727.33563), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=2-step=7.ckpt' as top 1
Epoch 2, global step 9: 'val_direct_loss' was not in top 1
Epoch 3, global step 10: 'val_direct_loss' was not in top 1
Epoch 3, global step 12: 'val_direct_loss' reached 684.20221 (best 684.20221), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=3-step=12.ckpt' as top 1
Epoch 4, global step 13: 'val_direct_loss' was not in top 1
Epoch 4, global step 15: 'val_direct_loss' reached 684.00562 (best 684.00562), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=4-step=15.ckpt' as top 1
Epoch 5, global step 16: 'val_direct_loss' reached 603.59332 (best 603.59332), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=5-step=16.ckpt' as top 1
Epoch 5, global step 18: 'val_direct_loss' was not in top 1
Epoch 6, global step 19: 'val_direct_loss' reached 596.55841 (best 596.55841), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=6-step=19.ckpt' as top 1
Epoch 6, global step 21: 'val_direct_loss' reached 592.49280 (best 592.49280), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=6-step=21.ckpt' as top 1
Epoch 7, global step 22: 'val_direct_loss' reached 570.63574 (best 570.63574), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=7-step=22.ckpt' as top 1
Epoch 7, global step 24: 'val_direct_loss' reached 517.43585 (best 517.43585), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=7-step=24.ckpt' as top 1
Epoch 8, global step 25: 'val_direct_loss' was not in top 1
Epoch 8, global step 27: 'val_direct_loss' was not in top 1
Epoch 9, global step 28: 'val_direct_loss' was not in top 1
Epoch 9, global step 30: 'val_direct_loss' reached 499.55167 (best 499.55167), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=9-step=30.ckpt' as top 1
Epoch 10, global step 31: 'val_direct_loss' was not in top 1
Epoch 10, global step 33: 'val_direct_loss' reached 478.12878 (best 478.12878), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=10-step=33.ckpt' as top 1
Epoch 11, global step 34: 'val_direct_loss' reached 467.34561 (best 467.34561), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=11-step=34.ckpt' as top 1
Epoch 11, global step 36: 'val_direct_loss' was not in top 1
Epoch 12, global step 37: 'val_direct_loss' was not in top 1
Epoch 12, global step 39: 'val_direct_loss' was not in top 1
Epoch 13, global step 40: 'val_direct_loss' was not in top 1
Epoch 13, global step 42: 'val_direct_loss' reached 455.64987 (best 455.64987), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=13-step=42.ckpt' as top 1
Epoch 14, global step 43: 'val_direct_loss' reached 424.58078 (best 424.58078), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=14-step=43.ckpt' as top 1
Epoch 14, global step 45: 'val_direct_loss' was not in top 1
Epoch 15, global step 46: 'val_direct_loss' reached 376.81406 (best 376.81406), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021426/epoch=15-step=46.ckpt' as top 1
Epoch 15, global step 48: 'val_direct_loss' was not in top 1
Epoch 16, global step 49: 'val_direct_loss' was not in top 1
Epoch 16, global step 51: 'val_direct_loss' was not in top 1
Epoch 17, global step 52: 'val_direct_loss' was not in top 1
Epoch 17, global step 54: 'val_direct_loss' was not in top 1
Epoch 18, global step 55: 'val_direct_loss' was not in top 1
Epoch 18, global step 57: 'val_direct_loss' was not in top 1
Epoch 19, global step 58: 'val_direct_loss' was not in top 1
Epoch 19, global step 60: 'val_direct_loss' was not in top 1
Epoch 20, global step 61: 'val_direct_loss' was not in top 1
Epoch 20, global step 63: 'val_direct_loss' was not in top 1
Epoch 21, global step 64: 'val_direct_loss' was not in top 1
Epoch 21, global step 66: 'val_direct_loss' was not in top 1
Epoch 22, global step 67: 'val_direct_loss' was not in top 1
Epoch 22, global step 69: 'val_direct_loss' was not in top 1
Epoch 23, global step 70: 'val_direct_loss' was not in top 1
Epoch 23, global step 72: 'val_direct_loss' was not in top 1
Epoch 24, global step 73: 'val_direct_loss' was not in top 1
Epoch 24, global step 75: 'val_direct_loss' was not in top 1
Epoch 25, global step 76: 'val_direct_loss' was not in top 1
Epoch 25, global step 78: 'val_direct_loss' was not in top 1
Epoch 26, global step 79: 'val_direct_loss' was not in top 1
Epoch 26, global step 81: 'val_direct_loss' was not in top 1
Epoch 27, global step 82: 'val_direct_loss' was not in top 1
Epoch 27, global step 84: 'val_direct_loss' was not in top 1
Epoch 28, global step 85: 'val_direct_loss' was not in top 1
Epoch 28, global step 87: 'val_direct_loss' was not in top 1
Epoch 29, global step 88: 'val_direct_loss' was not in top 1
Epoch 29, global step 90: 'val_direct_loss' was not in top 1
Trainer.fit stopped: max_epochs=30 reached.
Print out the elapsed time, and we can see that it's fast!
print("This finetuning takes %.2f seconds." % (end - start))
This finetuning takes 263.63 seconds.
To evaluate the model we just trained, run:
predictor.evaluate(test_path)
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
WARNING:automm:A new predictor save path is created.This is to prevent you to overwrite previous predictor saved here.You could check current save path at predictor._save_path.If you still want to use this path, set resume=True
saving file at /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230111_021852/object_detection_result_cache.json
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=0.32s).
Accumulating evaluation results...
DONE (t=0.04s).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.183
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.496
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.105
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.049
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.187
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.319
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.148
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.294
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.341
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.208
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.335
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.462
{'map': 0.18301730969881794}
The evaluation results are shown in the command line output above. The first value, 0.183, is the mAP under the COCO standard, and the second, 0.496, is the mAP under the VOC standard (also called mAP50). For more details about these metrics, see COCO's evaluation guideline.
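evaluate() also returns the aggregate result as a dict, which is convenient for logging or comparing runs. For example (the 'map' key matches the dict printed above):
results = predictor.evaluate(test_path)
print("COCO mAP (IoU=0.50:0.95): %.3f" % results["map"])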
We can get the prediction on test set:
pred = predictor.predict(test_path)
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
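Before visualizing, it can help to peek at one entry of pred. A hedged sketch; the exact return type and schema of predict() can differ across AutoGluon versions, so verify on yours:
# pred supports slicing (it is used as pred[12:13] below); print one entry to
# see the image reference and its predicted boxes. Schema varies by version.
print(type(pred))
print(pred[:1])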
Let’s also visualize the prediction result:
!pip install opencv-python
Requirement already satisfied: opencv-python in /home/ci/opt/venv/lib/python3.8/site-packages (4.7.0.68)
Requirement already satisfied: numpy>=1.17.3 in /home/ci/opt/venv/lib/python3.8/site-packages (from opencv-python) (1.22.4)
from autogluon.multimodal.utils import visualize_detection

conf_threshold = 0.25  # Specify a confidence threshold to filter out unwanted boxes
visualization_result_dir = "./"  # Use the pwd as result dir to save the visualized image

visualized = visualize_detection(
    pred=pred[12:13],
    detection_classes=predictor.get_predictor_classes(),
    conf_threshold=conf_threshold,
    visualization_result_dir=visualization_result_dir,
)
from PIL import Image
from IPython.display import display

# The returned arrays are in OpenCV's BGR channel order; reverse the channel
# axis to get RGB before building the PIL image.
img = Image.fromarray(visualized[0][:, :, ::-1], 'RGB')
display(img)

Under this fast finetune setting, we reached a good mAP on a new dataset in a few hundred seconds! To finetune with higher performance, see AutoMM Detection - High Performance Finetune on COCO Format Dataset, where we finetuned a VFNet model for 5 hours and reached mAP = 0.450, mAP50 = 0.718 on this dataset.
Other Examples¶
You may go to AutoMM Examples to explore other examples about AutoMM.
Customization¶
To learn how to customize AutoMM, please refer to Customize AutoMM.
Citation¶
@misc{redmon2018yolov3,
    title={YOLOv3: An Incremental Improvement},
    author={Joseph Redmon and Ali Farhadi},
    year={2018},
    eprint={1804.02767},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}