.. _sec_object_detection_quick:

Object Detection - Quick Start
==============================


Object detection is the process of identifying and localizing objects in
an image and is an important task in computer vision. Follow this
tutorial to learn how to use AutoGluon for object detection.

**Tip**: If you are new to AutoGluon, review :ref:`sec_imgquick` first
to learn the basics of the AutoGluon API.

Our goal is to detect motorbikes in images by fine-tuning a detection
model, such as `YOLOv3
<https://pjreddie.com/media/files/papers/YOLOv3.pdf>`__, that was
pretrained on the COCO dataset. The training data is a tiny subset of the
VOC dataset whose images were selected because they contain motorbikes;
other annotated VOC objects, such as people and cars, also appear in
them. With AutoGluon, we can automatically try many models with
different hyperparameters and return the best one as our final model.

To start, import ``ObjectDetector``:

.. code:: python

    from autogluon.vision import ObjectDetector


.. parsed-literal::
    :class: output

    /home/ci/opt/venv/lib/python3.8/site-packages/gluoncv/__init__.py:40: UserWarning: Both `mxnet==1.9.1` and `torch==1.12.1+cu102` are installed. You might encounter increased GPU memory footprint if both framework are used at the same time.
      warnings.warn(f'Both `mxnet=={mx.__version__}` and `torch=={torch.__version__}` are installed. '
    INFO:matplotlib.font_manager:generated new fontManager
    INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7gmo9777
    INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7gmo9777/_remote_module_non_scriptable.py
    INFO:root:Generating grammar tables from /usr/lib/python3.8/lib2to3/Grammar.txt
    INFO:root:Generating grammar tables from /usr/lib/python3.8/lib2to3/PatternGrammar.txt


Tiny_motorbike Dataset
----------------------

We collected a toy dataset for detecting motorbikes in images. Images
were randomly selected from the VOC dataset: 120 for training, 50 for
validation, and 50 for testing. This tiny dataset follows the same
format as VOC.
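
In the VOC format, each image has an XML file under ``Annotations/``
that records the image size and one ``<object>`` element, with a class
``<name>`` and a ``<bndbox>``, per labeled box. The sketch below parses
such a file with Python's standard ``xml.etree.ElementTree``; the inline
XML and the helper name ``parse_voc_annotation`` are hypothetical
illustrations, not a file from ``tiny_motorbike`` or an AutoGluon API.

.. code:: python

    import xml.etree.ElementTree as ET

    # A hypothetical VOC-style annotation (real ones live in Annotations/<image_id>.xml)
    SAMPLE_XML = """<annotation>
      <filename>000001.jpg</filename>
      <size><width>500</width><height>375</height><depth>3</depth></size>
      <object>
        <name>motorbike</name>
        <difficult>0</difficult>
        <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
      </object>
      <object>
        <name>person</name>
        <difficult>0</difficult>
        <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>370</ymax></bndbox>
      </object>
    </annotation>"""


    def parse_voc_annotation(xml_text):
        """Return (width, height, objects); each object is (name, xmin, ymin, xmax, ymax)."""
        root = ET.fromstring(xml_text)
        size = root.find("size")
        width = int(size.find("width").text)
        height = int(size.find("height").text)
        objects = []
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            # VOC stores pixel coordinates; some files use decimals, so parse via float
            coords = [int(float(box.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
            objects.append((obj.find("name").text, *coords))
        return width, height, objects


    width, height, objects = parse_voc_annotation(SAMPLE_XML)
    print(width, height)
    for obj in objects:
        print(obj)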

The dataset is only 23 MB, and it unzips into a folder named
``tiny_motorbike``. You do not need to download it by hand:
``ObjectDetector.Dataset.from_voc`` downloads and extracts the archive
automatically and loads the requested split in the detection format, as
the commands below show.

.. code:: python

    url = 'https://autogluon.s3.amazonaws.com/datasets/tiny_motorbike.zip'
    dataset_train = ObjectDetector.Dataset.from_voc(url, splits='trainval')


.. parsed-literal::
    :class: output

    Downloading /home/ci/.gluoncv/archive/tiny_motorbike.zip from https://autogluon.s3.amazonaws.com/datasets/tiny_motorbike.zip...


.. parsed-literal::
    :class: output

    21273KB [00:01, 19119.50KB/s]                           


.. parsed-literal::
    :class: output

    tiny_motorbike/
    ├── Annotations/
    ├── ImageSets/
    └── JPEGImages/

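In the standard VOC layout shown above, a split name such as
``trainval`` or ``test`` corresponds to ``ImageSets/Main/<split>.txt``,
which lists one image ID per line; each ID maps to
``JPEGImages/<id>.jpg`` and ``Annotations/<id>.xml``. The sketch below
resolves a split into file paths using only the standard library. It
builds a throwaway directory with hypothetical image IDs so it runs
without the real dataset, and the helper ``resolve_voc_split`` is an
illustration, not how ``from_voc`` is implemented.

.. code:: python

    import os
    import tempfile


    def resolve_voc_split(root, split):
        """Map each ID in ImageSets/Main/<split>.txt to its (image, annotation) paths."""
        split_file = os.path.join(root, "ImageSets", "Main", split + ".txt")
        with open(split_file) as f:
            image_ids = [line.strip() for line in f if line.strip()]
        return [
            (os.path.join(root, "JPEGImages", image_id + ".jpg"),
             os.path.join(root, "Annotations", image_id + ".xml"))
            for image_id in image_ids
        ]


    # Build a tiny throwaway VOC-style tree with hypothetical image IDs
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "ImageSets", "Main"))
        with open(os.path.join(root, "ImageSets", "Main", "test.txt"), "w") as f:
            f.write("000004\n000015\n")
        pairs = resolve_voc_split(root, "test")
        for image_path, xml_path in pairs:
            print(os.path.relpath(image_path, root), os.path.relpath(xml_path, root))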

Fit Models by AutoGluon
-----------------------

In this section, we demonstrate how to apply AutoGluon to fit our
detection models. The code below fixes only the number of epochs and the
batch size; every other hyperparameter, including the model architecture
and the learning rate, is left to AutoGluon's defaults. The best model is
the one that obtains the best performance on the validation dataset. You
can also try more networks and hyperparameters to create a larger search
space.
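
The log of the run below reports ``Randomly split train_data into
train[150]/validation[20] splits``: part of the training data is held out
to score each trial. The following is a minimal sketch of a seeded random
holdout split in plain Python; the function name ``holdout_split`` and
the hypothetical image IDs are illustrative assumptions, not AutoGluon's
internal code.

.. code:: python

    import random


    def holdout_split(items, num_valid, seed=0):
        """Shuffle a copy of items and hold out num_valid of them for validation."""
        rng = random.Random(seed)  # seeded so the split is reproducible
        shuffled = list(items)
        rng.shuffle(shuffled)
        return shuffled[num_valid:], shuffled[:num_valid]


    image_ids = [f"img_{i:03d}" for i in range(170)]  # hypothetical IDs, 170 = 150 + 20
    train_ids, valid_ids = holdout_split(image_ids, num_valid=20)
    print(len(train_ids), len(valid_ids))  # prints 150 20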

We ``fit`` a detector using AutoGluon as follows. In each experiment
(one trial in our search space), we train the model for only 5 epochs to
keep the tutorial runtime short.

.. code:: python

    time_limit = 60 * 30  # at most 0.5 hour
    detector = ObjectDetector()
    # train each trial for only 5 epochs with batch size 8 to keep the tutorial short
    hyperparameters = {'epochs': 5, 'batch_size': 8}
    # run at most 2 hyperparameter-tuning trials
    hyperparameter_tune_kwargs = {'num_trials': 2}
    detector.fit(dataset_train, time_limit=time_limit,
                 hyperparameters=hyperparameters,
                 hyperparameter_tune_kwargs=hyperparameter_tune_kwargs)


.. parsed-literal::
    :class: output

    =============================================================================
    WARNING: ObjectDetector is deprecated as of v0.4.0 and may contain various bugs and issues!
    In a future release ObjectDetector may be entirely reworked to use Torch as a backend.
    This future change will likely be API breaking.Users should ensure they update their code that depends on ObjectDetector when upgrading to future AutoGluon releases.
    For more information, refer to ObjectDetector refactor GitHub issue: https://github.com/awslabs/autogluon/issues/1559
    =============================================================================
    
    The number of requested GPUs is greater than the number of available GPUs.Reduce the number to 1
    Randomly split train_data into train[150]/validation[20] splits.
    Starting HPO experiments


.. parsed-literal::
    :class: output

      0%|          | 0/2 [00:00<?, ?it/s]


.. parsed-literal::
    :class: output

    INFO:SSDEstimator:modified configs(<old> != <new>): {
    INFO:SSDEstimator:root.train.epochs    20 != 5
    INFO:SSDEstimator:root.train.early_stop_baseline 0.0 != -inf
    INFO:SSDEstimator:root.train.seed      233 != 304
    INFO:SSDEstimator:root.train.early_stop_max_value 1.0 != inf
    INFO:SSDEstimator:root.train.batch_size 16 != 8
    INFO:SSDEstimator:root.train.early_stop_patience -1 != 10
    INFO:SSDEstimator:root.num_workers     4 != 8
    INFO:SSDEstimator:root.gpus            (0, 1, 2, 3) != (0,)
    INFO:SSDEstimator:root.valid.batch_size 16 != 8
    INFO:SSDEstimator:root.ssd.base_network vgg16_atrous != resnet50_v1
    INFO:SSDEstimator:root.ssd.data_shape  300 != 512
    INFO:SSDEstimator:root.dataset         voc_tiny != auto
    INFO:SSDEstimator:root.dataset_root    ~/.mxnet/datasets/ != auto
    INFO:SSDEstimator:}
    INFO:SSDEstimator:Saved config to /home/ci/autogluon/docs/_build/eval/tutorials/object_detection/4bc97620/.trial_0/config.yaml
    INFO:SSDEstimator:Using transfer learning from ssd_512_resnet50_v1_coco, the other network parameters are ignored.
    INFO:root:Model file not found. Downloading.


.. parsed-literal::
    :class: output

    Downloading /home/ci/.mxnet/models/ssd_512_resnet50_v1_coco-c4835162.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_coco-c4835162.zip...


.. parsed-literal::
    :class: output

    181189KB [00:02, 66955.99KB/s]                            
    INFO:SSDEstimator:Start training from [Epoch 0]
    INFO:SSDEstimator:[Epoch 0] Training cost: 11.064638, CrossEntropy=3.616203, SmoothL1=1.069762
    INFO:SSDEstimator:[Epoch 0] Validation: 
    chair=nan
    boat=nan
    motorbike=0.7653565977429614
    pottedplant=nan
    car=0.6464646464646466
    bus=nan
    bicycle=0.03636363636363636
    dog=nan
    cow=nan
    person=0.7877959688618874
    mAP=0.5589952123582829
    INFO:SSDEstimator:[Epoch 0] Current best map: 0.558995 vs previous 0.000000, saved to /home/ci/autogluon/docs/_build/eval/tutorials/object_detection/4bc97620/.trial_0/best_checkpoint.pkl
    INFO:SSDEstimator:[Epoch 1] Training cost: 7.569960, CrossEntropy=2.491966, SmoothL1=1.054675
    INFO:SSDEstimator:[Epoch 1] Validation: 
    chair=nan
    boat=nan
    motorbike=0.8190910444107375
    pottedplant=nan
    car=0.6590909090909091
    bus=nan
    bicycle=0.0
    dog=nan
    cow=nan
    person=0.6106134623755007
    mAP=0.5221988539692869
    INFO:SSDEstimator:[Epoch 2] Training cost: 7.487073, CrossEntropy=2.431485, SmoothL1=1.071399
    INFO:SSDEstimator:[Epoch 2] Validation: 
    chair=nan
    boat=nan
    motorbike=0.8317643925374864
    pottedplant=nan
    car=1.0000000000000002
    bus=nan
    bicycle=0.0
    dog=nan
    cow=nan
    person=0.784931734931735
    mAP=0.6541740318673054
    INFO:SSDEstimator:[Epoch 2] Current best map: 0.654174 vs previous 0.558995, saved to /home/ci/autogluon/docs/_build/eval/tutorials/object_detection/4bc97620/.trial_0/best_checkpoint.pkl
    INFO:SSDEstimator:[Epoch 3] Training cost: 7.550369, CrossEntropy=2.279991, SmoothL1=0.978284
    INFO:SSDEstimator:[Epoch 3] Validation: 
    chair=nan
    boat=nan
    motorbike=0.872616020343293
    pottedplant=nan
    car=1.0000000000000002
    bus=nan
    bicycle=0.0
    dog=nan
    cow=nan
    person=0.861377792623018
    mAP=0.6834984532415778
    INFO:SSDEstimator:[Epoch 3] Current best map: 0.683498 vs previous 0.654174, saved to /home/ci/autogluon/docs/_build/eval/tutorials/object_detection/4bc97620/.trial_0/best_checkpoint.pkl
    INFO:SSDEstimator:[Epoch 4] Training cost: 7.039415, CrossEntropy=2.212785, SmoothL1=0.984403
    INFO:SSDEstimator:[Epoch 4] Validation: 
    chair=nan
    boat=nan
    motorbike=0.9053679653679654
    pottedplant=nan
    car=0.8484848484848483
    bus=nan
    bicycle=0.0
    dog=nan
    cow=nan
    person=0.7991626429126427
    mAP=0.6382538641913641
    INFO:SSDEstimator:Applying the state from the best checkpoint...
    INFO:root:Model file not found. Downloading.


.. parsed-literal::
    :class: output

    Downloading /home/ci/.mxnet/models/resnet50_v1-cc729d95.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet50_v1-cc729d95.zip...


.. parsed-literal::
    :class: output

    100%|██████████| 57421/57421 [00:01<00:00, 50370.79KB/s]
    Finished, total runtime is 76.80 s
    { 'best_config': { 'dataset': 'auto',
                       'dataset_root': 'auto',
                       'estimator': <class 'gluoncv.auto.estimators.ssd.ssd.SSDEstimator'>,
                       'gpus': [0],
                       'horovod': False,
                       'num_workers': 8,
                       'resume': '',
                       'save_interval': 1,
                       'ssd': { 'amp': False,
                                'base_network': 'resnet50_v1',
                                'data_shape': 512,
                                'filters': None,
                                'nms_thresh': 0.45,
                                'nms_topk': 400,
                                'ratios': ( [1, 2, 0.5],
                                            [1, 2, 0.5, 3, 0.3333333333333333],
                                            [1, 2, 0.5, 3, 0.3333333333333333],
                                            [1, 2, 0.5, 3, 0.3333333333333333],
                                            [1, 2, 0.5],
                                            [1, 2, 0.5]),
                                'sizes': (30, 60, 111, 162, 213, 264, 315),
                                'steps': (8, 16, 32, 64, 100, 300),
                                'syncbn': False,
                                'transfer': 'ssd_512_resnet50_v1_coco'},
                       'train': { 'batch_size': 8,
                                  'dali': False,
                                  'early_stop_baseline': -inf,
                                  'early_stop_max_value': inf,
                                  'early_stop_min_delta': 0.001,
                                  'early_stop_patience': 10,
                                  'epochs': 5,
                                  'log_interval': 100,
                                  'lr': 0.001,
                                  'lr_decay': 0.1,
                                  'lr_decay_epoch': (160, 200),
                                  'momentum': 0.9,
                                  'seed': 304,
                                  'start_epoch': 0,
                                  'wd': 0.0005},
                       'valid': { 'batch_size': 8,
                                  'iou_thresh': 0.5,
                                  'metric': 'voc07',
                                  'val_interval': 1}},
      'total_time': 76.79892778396606,
      'train_map': 0.7013412996382214,
      'valid_map': 0.6834984532415778}


.. parsed-literal::
    :class: output

    <autogluon.vision.detector.detector.ObjectDetector at 0x7f2cdb6da940>


Note that ``num_trials=2`` above is used only to speed up the tutorial.
In practice, it is common to set only ``time_limit`` and drop
``num_trials``. Also note that hyperparameter tuning defaults to random
search.
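
Random search samples each trial's hyperparameters independently from
the search space and keeps the configuration with the best validation
score, stopping when the trial count or the time budget runs out. The
sketch below illustrates that idea with a hypothetical search space and a
stand-in scoring function in place of real training; the names
``SEARCH_SPACE``, ``fake_validation_map``, and ``random_search`` are
assumptions for illustration, not AutoGluon's scheduler.

.. code:: python

    import math
    import random
    import time

    # Hypothetical search space: one categorical choice and a log-uniform range
    SEARCH_SPACE = {
        "transfer": ["model_a", "model_b"],
        "lr": (1e-5, 1e-3),
    }


    def sample_config(rng):
        """Pick transfer uniformly and lr log-uniformly from SEARCH_SPACE."""
        low, high = SEARCH_SPACE["lr"]
        return {
            "transfer": rng.choice(SEARCH_SPACE["transfer"]),
            "lr": math.exp(rng.uniform(math.log(low), math.log(high))),
        }


    def fake_validation_map(config):
        """Stand-in for training plus validation; peaks near lr = 1e-4 (purely illustrative)."""
        bonus = 0.05 if config["transfer"] == "model_a" else 0.0
        return 0.7 - 0.1 * abs(math.log10(config["lr"]) + 4) + bonus


    def random_search(num_trials, time_limit, seed=0):
        rng = random.Random(seed)
        start = time.monotonic()
        best_config, best_score = None, -math.inf
        for _ in range(num_trials):
            if time.monotonic() - start > time_limit:
                break  # stop early once the time budget is used up
            config = sample_config(rng)
            score = fake_validation_map(config)
            if score > best_score:  # keep the best configuration seen so far
                best_config, best_score = config, score
        return best_config, best_score


    best_config, best_score = random_search(num_trials=10, time_limit=60)
    print(best_config, round(best_score, 4))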

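After fitting, AutoGluon automatically returns the best model among all
models in the search space. The ``best_config`` entry in the output above
records the winning configuration (here an SSD model with a
``resnet50_v1`` backbone, transferred from ``ssd_512_resnet50_v1_coco``
and trained with ``lr`` 0.001) together with its ``train_map`` and
``valid_map``. To see how well the returned model performs on the test
dataset, call ``detector.evaluate()``.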

.. code:: python

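    # load the held-out test split from the same archive
    dataset_test = ObjectDetector.Dataset.from_voc(url, splits='test')

    # evaluate() returns (metric names, metric values); the last value is the overall mAP
    test_map = detector.evaluate(dataset_test)
    print("mAP on test dataset: {}".format(test_map[1][-1]))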


.. parsed-literal::
    :class: output

    tiny_motorbike/
    ├── Annotations/
    ├── ImageSets/
    └── JPEGImages/
    mAP on test dataset: 0.12157768403857265

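The ``best_config`` printed during ``fit`` lists ``'metric': 'voc07'``
and ``'iou_thresh': 0.5``. Under that metric, a detection counts as a
true positive when its intersection-over-union (IoU) with a not yet
matched ground-truth box of the same class is at least 0.5, and per-class
AP is the VOC2007 11-point interpolated average precision. The mAP is the
mean over classes present in the data; in the validation log above,
classes without ground truth appear as ``nan`` and the reported mAP
equals the mean of the four remaining class APs. The sketch below
computes IoU and the 11-point AP for one class in one image. It ignores
VOC's ``difficult`` flag and is an illustration, not GluonCV's metric
code.

.. code:: python

    def iou(box_a, box_b):
        """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
        inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
        inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
        inter = inter_w * inter_h
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0


    def voc07_ap(detections, ground_truths, iou_thresh=0.5):
        """11-point interpolated AP for one class in one image.

        detections: list of (score, box); ground_truths: non-empty list of boxes.
        """
        detections = sorted(detections, key=lambda d: d[0], reverse=True)
        matched = [False] * len(ground_truths)
        tp = fp = 0
        precisions, recalls = [], []
        for _, box in detections:
            # find the ground-truth box this detection overlaps most
            best_iou, best_idx = 0.0, -1
            for idx, gt in enumerate(ground_truths):
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_idx = overlap, idx
            # true positive only if the overlap is high enough and that box is unclaimed
            if best_iou >= iou_thresh and not matched[best_idx]:
                matched[best_idx] = True
                tp += 1
            else:
                fp += 1
            precisions.append(tp / (tp + fp))
            recalls.append(tp / len(ground_truths))
        ap = 0.0
        for t in [i / 10 for i in range(11)]:
            # highest precision at any recall >= t, or 0 if recall t is never reached
            ap += max((p for p, r in zip(precisions, recalls) if r >= t), default=0.0) / 11
        return ap


    ground_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detections = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 30, 30))]
    print(round(voc07_ap(detections, ground_truths), 4))  # prints 0.8485

Here the first and third detections match the two ground-truth boxes and
the second is a false positive, so precision is 1.0 up to recall 0.5 and
2/3 at recall 1.0, giving (6 × 1 + 5 × 2/3) / 11 = 28/33.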

Below, we take the first image of the test dataset and show the predicted
class, box, and probability over the original image, stored in the
``predict_class``, ``predict_rois``, and ``predict_score`` columns,
respectively. You can interpret ``predict_rois`` as a dict of (``xmin``,
``ymin``, ``xmax``, ``ymax``) coordinates expressed as fractions of the
original image width and height.

.. code:: python

    image_path = dataset_test.iloc[0]['image']
    result = detector.predict(image_path)
    print(result)


.. parsed-literal::
    :class: output

       predict_class  predict_score  \
    0      motorbike       0.964290   
    1         person       0.901450   
    2      motorbike       0.379375   
    3            car       0.224108   
    4         person       0.151031   
    ..           ...            ...   
    76        person       0.026454   
    77        person       0.026297   
    78        person       0.026240   
    79         chair       0.025993   
    80        person       0.025979   
    
                                             predict_rois  
    0   {'xmin': 0.32511788606643677, 'ymin': 0.426943...  
    1   {'xmin': 0.38163241744041443, 'ymin': 0.279039...  
    2   {'xmin': 0.0, 'ymin': 0.6350289583206177, 'xma...  
    3   {'xmin': 0.0, 'ymin': 0.6296865940093994, 'xma...  
    4   {'xmin': 0.03611136972904205, 'ymin': 0.0, 'xm...  
    ..                                                ...  
    76  {'xmin': 0.8196716904640198, 'ymin': 0.4491611...  
    77  {'xmin': 0.40028253197669983, 'ymin': 0.757062...  
    78  {'xmin': 0.9661840200424194, 'ymin': 0.2806696...  
    79  {'xmin': 0.11712463200092316, 'ymin': 0.011974...  
    80  {'xmin': 0.993757426738739, 'ymin': 0.08150030...  
    
    [81 rows x 3 columns]

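Because ``predict_rois`` stores box corners as fractions of the original
image size, recovering pixel coordinates only requires the image width
and height (for VOC data, these are also recorded in each annotation's
``<size>`` element). A minimal sketch, assuming a ROI dict with the four
keys shown above; the helper ``roi_to_pixels`` and the box and image size
used here are hypothetical.

.. code:: python

    def roi_to_pixels(roi, width, height):
        """Scale a normalized {'xmin', 'ymin', 'xmax', 'ymax'} dict to integer pixel coordinates."""
        return (
            round(roi["xmin"] * width),
            round(roi["ymin"] * height),
            round(roi["xmax"] * width),
            round(roi["ymax"] * height),
        )


    roi = {"xmin": 0.25, "ymin": 0.5, "xmax": 0.75, "ymax": 1.0}  # hypothetical box
    print(roi_to_pixels(roi, width=400, height=300))  # prints (100, 150, 300, 300)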

You can also predict on multiple images at once, for example the whole
test dataset:

.. code:: python

    bulk_result = detector.predict(dataset_test)
    print(bulk_result)


.. parsed-literal::
    :class: output

         predict_class  predict_score  \
    0        motorbike       0.964290   
    1           person       0.901450   
    2        motorbike       0.379375   
    3              car       0.224108   
    4           person       0.151031   
    ...            ...            ...   
    3760     motorbike       0.017063   
    3761           car       0.017042   
    3762        person       0.016949   
    3763        person       0.016934   
    3764     motorbike       0.016925   
    
                                               predict_rois  \
    0     {'xmin': 0.32511788606643677, 'ymin': 0.426943...   
    1     {'xmin': 0.38163241744041443, 'ymin': 0.279039...   
    2     {'xmin': 0.0, 'ymin': 0.6350289583206177, 'xma...   
    3     {'xmin': 0.0, 'ymin': 0.6296865940093994, 'xma...   
    4     {'xmin': 0.03611136972904205, 'ymin': 0.0, 'xm...   
    ...                                                 ...   
    3760  {'xmin': 0.11219224333763123, 'ymin': 0.560805...   
    3761  {'xmin': 0.8976275324821472, 'ymin': 0.7462039...   
    3762  {'xmin': 0.3027859032154083, 'ymin': 0.4321423...   
    3763  {'xmin': 0.7102004289627075, 'ymin': 0.2931949...   
    3764  {'xmin': 0.7111496925354004, 'ymin': 0.8699753...   
    
                                                      image  
    0     /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    1     /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    2     /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    3     /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    4     /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    ...                                                 ...  
    3760  /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    3761  /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    3762  /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    3763  /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    3764  /home/ci/.gluoncv/datasets/tiny_motorbike/tiny...  
    
    [3765 rows x 4 columns]

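The bulk result keeps every candidate box, including many with scores
around 0.02, and repeats the ``image`` path on each row. A common
post-processing step is to drop low-confidence rows and group the rest by
image. The sketch below does this in plain Python on a few hypothetical
records shaped like the DataFrame rows (``image``, ``predict_class``,
``predict_score``); the 0.5 threshold is an arbitrary choice, not an
AutoGluon default. With pandas, the same filter on the real result is
``bulk_result[bulk_result['predict_score'] >= 0.5]``.

.. code:: python

    from collections import defaultdict

    # Hypothetical rows mirroring columns of the bulk prediction DataFrame
    rows = [
        {"image": "a.jpg", "predict_class": "motorbike", "predict_score": 0.96},
        {"image": "a.jpg", "predict_class": "person", "predict_score": 0.90},
        {"image": "a.jpg", "predict_class": "car", "predict_score": 0.22},
        {"image": "b.jpg", "predict_class": "motorbike", "predict_score": 0.71},
        {"image": "b.jpg", "predict_class": "person", "predict_score": 0.03},
    ]


    def confident_by_image(rows, threshold=0.5):
        """Keep rows scoring at least threshold and group (class, score) pairs by image path."""
        grouped = defaultdict(list)
        for row in rows:
            if row["predict_score"] >= threshold:
                grouped[row["image"]].append((row["predict_class"], row["predict_score"]))
        return dict(grouped)


    for image, detections in confident_by_image(rows).items():
        print(image, detections)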

We can also save the trained model and load it later.

.. warning::

   ``ObjectDetector.load()`` uses the ``pickle`` module implicitly, which
   is known to be insecure. It is possible to construct malicious pickle
   data that will execute arbitrary code during unpickling. Never load
   data that could have come from an untrusted source or that could have
   been tampered with. **Only load data you trust.**

.. code:: python

    savefile = 'detector.ag'
    detector.save(savefile)
    new_detector = ObjectDetector.load(savefile)


.. parsed-literal::
    :class: output

    /home/ci/opt/venv/lib/python3.8/site-packages/mxnet/gluon/block.py:1784: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
    	data: None
      input_sym_arg_type = in_param.infer_type()[0]
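
One way to act on the pickle warning when a saved detector is copied
between machines is to record a SHA-256 digest of the file right after
``detector.save()`` and refuse to load any file whose digest does not
match. The sketch below shows the check with the standard ``hashlib``
module on a throwaway file; the helpers ``sha256_of`` and
``verify_before_load`` are hypothetical, and a matching digest only
proves the file is unchanged since you hashed it, not that its original
source was trustworthy.

.. code:: python

    import hashlib
    import os
    import tempfile


    def sha256_of(path, chunk_size=1 << 20):
        """Hash a file in chunks so large model files need not fit in memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def verify_before_load(path, expected_hex):
        """Raise ValueError when the file's SHA-256 differs from the recorded digest."""
        actual = sha256_of(path)
        if actual != expected_hex:
            raise ValueError(f"SHA-256 mismatch for {path}: got {actual}")
        return path


    with tempfile.TemporaryDirectory() as tmp:
        savefile = os.path.join(tmp, "detector.ag")  # stand-in bytes, not a real saved detector
        with open(savefile, "wb") as f:
            f.write(b"example bytes")
        recorded = sha256_of(savefile)  # record this right after saving
        verify_before_load(savefile, recorded)  # unchanged file: passes
        with open(savefile, "ab") as f:
            f.write(b"tampered")  # simulate a modification
        try:
            verify_before_load(savefile, recorded)
            tampered_detected = False
        except ValueError:
            tampered_detected = True

    print(tampered_detected)  # prints True

In the tutorial's workflow, ``sha256_of(savefile)`` would run after
``detector.save(savefile)``, and ``verify_before_load`` would run before
``ObjectDetector.load(savefile)``.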