.. _sec_object_detection_dataset:

Object Detection - Prepare Dataset for Object Detector
======================================================


Preparing a dataset for object detection is slightly different from, and
more difficult than, preparing one for image prediction.

Our goal in this tutorial is to introduce the simplest ways to create or
load an object detection dataset for
``autogluon.vision.ObjectDetector``.

There are generally two ways to load a dataset for ObjectDetector:

-  Load an existing object detection dataset, in VOC or COCO formats,
   downloaded or exported by other labeling tools.

-  Manually convert raw annotations from any format. Once you know how
   to do this, you will be able to handle arbitrary dataset formats.

.. code:: python

    %matplotlib inline
    import autogluon.core as ag
    from autogluon.vision import ObjectDetector




Load an existing object detection dataset
-----------------------------------------

Pascal VOC and MS COCO are the two most popular data formats for object
detection. Most publicly available object detection datasets follow one
of these two formats. This tutorial does not cover their details; see
the original introductions for
`VOC <http://host.robots.ox.ac.uk/pascal/VOC/>`__ and
`COCO <https://cocodataset.org/#home>`__.

To tell the two formats apart, you can either check which format the
labeling tool exports or inspect the folder structure. Annotations in
VOC format are usually individual ``xml`` files, while the COCO format
uses a single ``json`` file to store all annotations.
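
The folder check described above can be sketched in a few lines. This is
only a heuristic under stated assumptions: the function name
``guess_detection_format`` is hypothetical (it is not part of AutoGluon),
and it assumes that VOC-style datasets keep their ``xml`` files in an
``Annotations/`` folder while COCO-style datasets ship at least one
``json`` file somewhere under the root.

.. code:: python

    from pathlib import Path


    def guess_detection_format(root):
        """Guess 'voc', 'coco', or 'unknown' from the files under ``root``."""
        root = Path(root)
        ann_dir = root / 'Annotations'
        # VOC-style layout (assumed): Annotations/ holds one XML file per image.
        if ann_dir.is_dir() and any(ann_dir.glob('*.xml')):
            return 'voc'
        # COCO-style layout (assumed): a JSON annotation file below the root.
        if root.is_dir() and any(root.rglob('*.json')):
            return 'coco'
        return 'unknown'

A result of ``'unknown'`` simply means the heuristic found neither
pattern; the labeling tool's export settings remain the more reliable
source.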

.. code:: python

    url = 'https://autogluon.s3.amazonaws.com/datasets/tiny_motorbike.zip'
    dataset_train = ObjectDetector.Dataset.from_voc(url, splits='trainval')
    # Or load from COCO format (skipped here because the dataset is too large to download)
    # dataset_train = ObjectDetector.Dataset.from_coco(annotation_json_file, root='/path/to/root')


.. parsed-literal::
    :class: output

    tiny_motorbike/
    ├── Annotations/
    ├── ImageSets/
    └── JPEGImages/


Manually convert any format to an AutoGluon object detector dataset
-------------------------------------------------------------------

We will walk through creating a dataset manually to help you understand
the underlying data. This does not mean you have to build datasets this
way: if you want to create your own, we highly recommend using a
dedicated labeling tool for object detection. Labeling bounding boxes is
time-consuming, so a well-designed UI will significantly reduce the
effort.

In the following section, we will use a single image and add annotations
manually for all three major objects.

.. code:: python

    ag.utils.download('https://raw.githubusercontent.com/zhreshold/mxnet-ssd/master/data/demo/dog.jpg', path='dog.jpg')
    import matplotlib.image as mpimg
    import matplotlib.pyplot as plt
    img = mpimg.imread('dog.jpg')
    imgplot = plt.imshow(img)
    plt.grid()
    plt.show()




.. figure:: output_dataset_f0fc3d_5_1.png


With the grid on, we can roughly annotate this image. The helper class
below stores each box with its pixel coordinates divided by the image
width and height:

.. code:: python

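    import matplotlib.image as mpimg
    import pandas as pd
    
    class NaiveDetectionGT:
        """Collect bounding boxes for one image as normalized coordinates."""
        def __init__(self, image):
            self._objects = []
            self.image = image
            # Read the image that was passed in (not a hard-coded file) to get its size
            img = mpimg.imread(image)
            self.w = img.shape[1]  # width in pixels
            self.h = img.shape[0]  # height in pixels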
    
        def add_object(self, name, xmin, ymin, xmax, ymax, difficult=0):
            self._objects.append({'image': self.image, 'class': name,
                                  'xmin': xmin / self.w, 'ymin': ymin / self.h,
                                  'xmax': xmax / self.w, 'ymax': ymax / self.h, 'difficult': difficult})
    
        @property
        def df(self):
            return pd.DataFrame(self._objects)
    
    gt = NaiveDetectionGT('dog.jpg')
    gt.add_object('dog', 140, 220, 300, 540)
    gt.add_object('bicycle', 120, 140, 580, 420)
    gt.add_object('car', 460, 70, 680, 170)
    df = gt.df
    df


.. parsed-literal::
    :class: output

         image    class      xmin      ymin      xmax      ymax  difficult
    0  dog.jpg      dog  0.182292  0.381944  0.390625  0.937500          0
    1  dog.jpg  bicycle  0.156250  0.243056  0.755208  0.729167          0
    2  dog.jpg      car  0.598958  0.121528  0.885417  0.295139          0
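
To make the numbers in the table above easier to check, here is the
normalization for the ``dog`` box worked out on its own. The helper name
``normalize_box`` is hypothetical, and the image size of 768 x 576 pixels
is an assumption inferred from the table values (for example,
``140 / 0.182292`` is approximately 768).

.. code:: python

    def normalize_box(xmin, ymin, xmax, ymax, width, height):
        """Scale pixel corner coordinates to fractions of the image size."""
        return (xmin / width, ymin / height, xmax / width, ymax / height)


    # Assumed image size: width=768, height=576 (inferred from the table).
    box = normalize_box(140, 220, 300, 540, width=768, height=576)
    print([round(v, 6) for v in box])
    # prints [0.182292, 0.381944, 0.390625, 0.9375]

Every value falls between 0 and 1, which matches the first row of the
table.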


``df`` is a valid dataset and can be used by the ``ObjectDetector.fit``
function. Internally it is converted to an object detection dataset; you
can also convert it manually:

.. code:: python

    dataset = ObjectDetector.Dataset(df, classes=df['class'].unique().tolist())
    dataset.show_images(nsample=1, ncol=1)


.. figure:: output_dataset_f0fc3d_9_0.png
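
The same idea carries over to annotations stored in other layouts. As a
rough sketch, the snippet below turns a COCO-style dictionary into rows
with the same columns as ``df``. The function name ``coco_to_rows`` and
the example record are hypothetical, the 768 x 576 image size is assumed,
and because COCO annotations carry no ``difficult`` flag, the value ``0``
is filled in as an assumption. COCO stores each box as
``[x, y, width, height]`` in pixels, so the corner coordinates have to be
derived.

.. code:: python

    def coco_to_rows(coco):
        """Turn a COCO-style dict into rows with normalized corner coordinates."""
        images = {img['id']: img for img in coco['images']}
        names = {cat['id']: cat['name'] for cat in coco['categories']}
        rows = []
        for ann in coco['annotations']:
            img = images[ann['image_id']]
            x, y, w, h = ann['bbox']  # COCO boxes: [x, y, width, height] in pixels
            rows.append({
                'image': img['file_name'],
                'class': names[ann['category_id']],
                'xmin': x / img['width'],
                'ymin': y / img['height'],
                'xmax': (x + w) / img['width'],
                'ymax': (y + h) / img['height'],
                'difficult': 0,  # assumption: COCO has no 'difficult' flag
            })
        return rows


    # Hypothetical record reusing the dog box from above (768 x 576 assumed).
    example = {
        'images': [{'id': 1, 'file_name': 'dog.jpg', 'width': 768, 'height': 576}],
        'categories': [{'id': 7, 'name': 'dog'}],
        'annotations': [{'image_id': 1, 'category_id': 7,
                         'bbox': [140, 220, 160, 320]}],
    }
    rows = coco_to_rows(example)

The resulting list of dictionaries can be wrapped with
``pd.DataFrame(rows)`` to obtain a table shaped like ``df``.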


Congratulations, you can now proceed to
:ref:`sec_object_detection_quick` to start training the
``ObjectDetector``.