{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5c0fd970",
   "metadata": {},
   "source": [
    "# AutoMM Detection - Convert VOC Format Dataset to COCO Format\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/object_detection/data_preparation/voc_to_coco.ipynb)\n",
    "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/object_detection/data_preparation/voc_to_coco.ipynb)\n",
    "\n",
    "\n",
    "\n",
    "[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) is a collection of datasets for object detection. \n",
    "And VOC format refers to the specific format (in `.xml` file) the Pascal VOC dataset is using.\n",
    "\n",
    "In this tutorial, we will convert VOC2007 dataset from VOC format to COCO format. See [AutoMM Detection - Prepare Pascal VOC Dataset](prepare_voc.ipynb) for how to download it.\n",
    "We will use our tool `voc2coco`. This python script is in our code: \n",
    "[voc2coco.py](https://raw.githubusercontent.com/autogluon/autogluon/master/multimodal/src/autogluon/multimodal/cli/voc2coco.py),\n",
    "and you can also run it as a cli: `python3 -m autogluon.multimodal.cli.voc2coco`.\n",
    "\n",
    "**Note: In Autogluon MultiModalPredictor, we strongly recommend using COCO as your data format.** However, for fast proof testing we also have limit support for VOC format.\n",
    "\n",
    "## Convert Existing Splits\n",
    "\n",
    "Under VOC format root path, we have the following folders:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1bfc08d",
   "metadata": {},
   "source": [
    "```\n",
    "Annotations  ImageSets  JPEGImages\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a5f2954",
   "metadata": {},
   "source": [
    "And normally there are some pre-defined split files under `ImageSets/Main/`:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0695ea82",
   "metadata": {},
   "source": [
    "```\n",
    "train.txt\n",
    "val.txt\n",
    "test.txt\n",
    "...\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40693bf2",
   "metadata": {},
   "source": [
    "We can convert those splits into COCO format by simply running given the root directory, e.g. `./VOCdevkit/VOC2007`:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13cb6dee",
   "metadata": {},
   "source": [
    "```\n",
    "python3 -m autogluon.multimodal.cli.voc2coco --root_dir ./VOCdevkit/VOC2007\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a2974e5",
   "metadata": {},
   "source": [
    "The command line output will show the progress:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db2c3894",
   "metadata": {},
   "source": [
    "```\n",
    "Start converting !\n",
    " 17%|█████████████████▍                                                                                  | 841/4952 [00:00<00:00, 15571.88it/s\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2ea7c5c",
   "metadata": {},
   "source": [
    "Now those splits are converted to COCO format in `Annotations` folder under the root directory:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3100e56",
   "metadata": {},
   "source": [
    "```\n",
    "train_cocoformat.json\n",
    "val_cocoformat.json\n",
    "test_cocoformat.json\n",
    "...\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f0bf98e",
   "metadata": {},
   "source": [
    "## Convert Existing Splits\n",
    "\n",
    "Instead of using predefined splits, you can also split the data with the train/validation/test ratio you want.\n",
    "Note that this does not require any pre-existing split files. To split train/validation/test by 0.6/0.2/0.2, run:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb44551a",
   "metadata": {},
   "source": [
    "```\n",
    "python3 -m autogluon.multimodal.cli.voc2coco --root_dir ./VOCdevkit/VOC2007 --train_ratio 0.6 --val_ratio 0.2\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "532b2e4a",
   "metadata": {},
   "source": [
    "The command line output will show the progress:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9bc17618",
   "metadata": {},
   "source": [
    "```\n",
    "Start converting !\n",
    " 17%|█████████████████▍                                                                                  | 841/4952 [00:00<00:00, 15571.88it/s\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e94db502",
   "metadata": {},
   "source": [
    "And this will generate user splited COCO format in `Annotations` folder under the root directory:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec50cf96",
   "metadata": {},
   "source": [
    "```\n",
    "usersplit_train_cocoformat.json\n",
    "usersplit_val_cocoformat.json\n",
    "usersplit_test_cocoformat.json\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbd8d1ac",
   "metadata": {},
   "source": [
    "## Other Examples\n",
    "\n",
    "You may go to [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) to explore other examples about AutoMM.\n",
    "\n",
    "## Customization\n",
    "To learn how to customize AutoMM, please refer to [Customize AutoMM](../../advanced_topics/customization.ipynb)."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}