.. _sec_automm_detection_fast_ft_coco:

AutoMM Detection - Fast Finetune on COCO Format Dataset
========================================================

In this section, our goal is to fast finetune a pretrained model on the VOC2007 training set and evaluate it on the VOC2007 test set. Both the training and test sets are in COCO format. See :ref:`sec_automm_detection_prepare_voc` for how to prepare the VOC dataset, and :ref:`sec_automm_detection_convert_to_coco` for how to convert other datasets to COCO format.

To start, let's import MultiModalPredictor:

.. code:: python

    from autogluon.multimodal import MultiModalPredictor

We select YOLOv3 with a MobileNetV2 backbone and a 320x320 input resolution, pretrained on the COCO dataset. With this setting, finetuning and inference are fast, and the model is easy to deploy.

When using a COCO format dataset, the input is the JSON annotation file of the dataset split. In this example, ``train_cocoformat.json`` and ``test_cocoformat.json`` are the annotation files of the train and test splits of the VOC2007 dataset. We also use all GPUs (if any):

.. code:: python

    checkpoint_name = "yolov3_mobilenetv2_320_300e_coco"
    num_gpus = -1  # use all GPUs

    train_path = "./VOCdevkit/VOC2007/Annotations/train_cocoformat.json"
    test_path = "./VOCdevkit/VOC2007/Annotations/test_cocoformat.json"

We create the MultiModalPredictor with the selected checkpoint name and number of GPUs. We need to set the ``problem_type`` to ``"object_detection"``, and also provide a ``sample_data_path`` so that the predictor can infer the categories of the dataset. Here we provide ``train_path``; any other split of this dataset would also work.

.. code:: python

    predictor = MultiModalPredictor(
        hyperparameters={
            "model.mmdet_image.checkpoint_name": checkpoint_name,
            "env.num_gpus": num_gpus,
        },
        problem_type="object_detection",
        sample_data_path=train_path,
    )

If no data sample is available at this point, you can also create the MultiModalPredictor by providing the classes manually:

.. code:: python

    voc_classes = [
        "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
        "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
        "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
    ]

    predictor = MultiModalPredictor(
        hyperparameters={
            "model.mmdet_image.checkpoint_name": checkpoint_name,
            "env.num_gpus": num_gpus,
        },
        problem_type="object_detection",
        classes=voc_classes,
    )

We set the learning rate to ``1e-4``. Note that we use a two-stage learning rate option during finetuning by default, where the model head gets a 100x learning rate. Using a high learning rate only on the head layers makes the model converge faster during finetuning. It usually gives better performance as well, especially on small datasets with hundreds or thousands of images. We also set the number of epochs to 5 for fast finetuning, and the batch size to 32. In addition, we time the fit process to get a better sense of the training speed.
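The two-stage learning rate mentioned above is enabled by default, so the fit call below does not turn it on explicitly. If you want to adjust or disable this behavior, it can be overridden through the same ``hyperparameters`` dictionary. The sketch below is only illustrative: the key names ``optimization.lr_choice`` and ``optimization.lr_mult`` are assumptions based on the customization guide, so verify them against :ref:`sec_automm_customization` for your AutoGluon version.

.. code:: python

    # Illustrative only: spelling out the (assumed) learning-rate-related hyperparameter
    # keys. Passing this dict to predictor.fit(...) should reproduce the default two-stage
    # behavior; verify the key names before relying on them.
    lr_overrides = {
        "optimization.learning_rate": 1e-4,      # base learning rate
        "optimization.lr_choice": "two_stages",  # assumed key: use a two-stage learning rate
        "optimization.lr_mult": 100,             # assumed key: head layers use 100x the base lr
    }

The fit call below keeps these defaults and only overrides the learning rate, the number of epochs, and the per-GPU batch size.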
.. code:: python

    import time

    start = time.time()
    predictor.fit(
        train_path,
        hyperparameters={
            "optimization.learning_rate": 1e-4,  # we use a two-stage lr; the detection head has 100x lr
            "optimization.max_epochs": 5,
            "env.per_gpu_batch_size": 32,  # decrease it when the model is large
        },
    )
    end = time.time()

We run it on a g5dn.12xlarge EC2 machine on AWS, and part of the command output is shown below:

::

    Epoch 0: 98%|██████████████████████████████████████████████████████████████████████████████████████████▏ | 50/51 [00:15<00:00, 3.19it/s, loss=766, v_num=]
    Epoch 0, global step 40: 'val_direct_loss' reached 555.37537 (best 555.37537), saving model to '/media/code/autogluon/examples/automm/object_detection/AutogluonModels/ag-20221104_185342/epoch=0-step=40.ckpt' as top 1
    Epoch 1: 49%|█████████████████████████████████████████████ | 25/51 [00:08<00:08, 3.01it/s, loss=588, v_num=]
    Epoch 1, global step 61: 'val_direct_loss' reached 499.56232 (best 499.56232), saving model to '/media/code/autogluon/examples/automm/object_detection/AutogluonModels/ag-20221104_185342/epoch=1-step=61.ckpt' as top 1
    Epoch 1: 98%|██████████████████████████████████████████████████████████████████████████████████████████▏ | 50/51 [00:15<00:00, 3.17it/s, loss=554, v_num=]
    Epoch 1, global step 81: 'val_direct_loss' reached 481.33121 (best 481.33121), saving model to '/media/code/autogluon/examples/automm/object_detection/AutogluonModels/ag-20221104_185342/epoch=1-step=81.ckpt' as top 1
    Epoch 2: 49%|█████████████████████████████████████████████ | 25/51 [00:08<00:08, 2.99it/s, loss=539, v_num=]
    Epoch 2, global step 102: 'val_direct_loss' reached 460.25449 (best 460.25449), saving model to '/media/code/autogluon/examples/automm/object_detection/AutogluonModels/ag-20221104_185342/epoch=2-step=102.ckpt' as top 1
    Epoch 2: 98%|██████████████████████████████████████████████████████████████████████████████████████████▏ | 50/51 [00:15<00:00, 3.15it/s, loss=539, v_num=]
    Epoch 2, global step 122: 'val_direct_loss' was not in top 1
    Epoch 3: 49%|█████████████████████████████████████████████ | 25/51 [00:08<00:08, 2.96it/s, loss=533, v_num=]
    Epoch 3, global step 143: 'val_direct_loss' was not in top 1
    Epoch 3: 88%|█████████████████████████████████████████████████████████████████████████████████▏ | 45/51 [00:14<00:01, 3.17it/s, loss=508, v_num=]

Notice that at the end of each progress bar, if the checkpoint of the current stage is saved, the model's save path is printed. In this example, it's ``/media/code/autogluon/examples/automm/object_detection/AutogluonModels/ag-20221104_185342``. You can also specify the ``save_path`` when creating the MultiModalPredictor, like below:

::

    predictor = MultiModalPredictor(
        save_path="./this_is_a_save_path",
        ...
    )

Print out the time, and we can see that it only takes 100.42 seconds!

.. code:: python

    print("This finetuning takes %.2f seconds." % (end - start))

::

    This finetuning takes 100.42 seconds.

To evaluate the model we just trained, run:

.. code:: python

    predictor.evaluate(test_path)

And the evaluation results are shown in the command line output. The first value ``0.375`` is mAP in COCO standard, and the second one ``0.755`` is mAP in VOC standard (or mAP50). For more details about these metrics, see `COCO's evaluation guideline <https://cocodataset.org/#detection-eval>`__.
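If you would rather consume these numbers programmatically than read them from the log, you can capture the return value of ``evaluate``. The sketch below assumes the returned object is a dictionary mapping metric names to scores; inspect it (or the API reference) to see the exact keys your AutoGluon version reports.

.. code:: python

    # Capture the evaluation results instead of only printing them to the console.
    # The exact keys of the returned dict (e.g. mAP vs. mAP50 naming) are an assumption;
    # print it once to see what is available.
    results = predictor.evaluate(test_path)
    print(results)

The full console output of the evaluation looks like the following: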
::

    Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.375
    Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.755
    Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.311
    Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.111
    Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.230
    Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.431
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.355
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.505
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.515
    Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.258
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.415
    Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.556

Under this fast finetune setting, we reached ``mAP50 = 0.755`` on VOC in about 100 seconds! To see how to finetune for higher performance, see :ref:`sec_automm_detection_high_ft_coco`, where we finetuned a VFNet model for 5 hours and reached ``mAP50 = 0.932`` on VOC.

Other Examples
~~~~~~~~~~~~~~

You may go to `AutoMM Examples <https://github.com/autogluon/autogluon/tree/master/examples/automm>`__ to explore other examples about AutoMM.

Customization
~~~~~~~~~~~~~

To learn how to customize AutoMM, please refer to :ref:`sec_automm_customization`.

Citation
~~~~~~~~

::

    @misc{redmon2018yolov3,
        title={YOLOv3: An Incremental Improvement},
        author={Joseph Redmon and Ali Farhadi},
        year={2018},
        eprint={1804.02767},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
    }