{ "cells": [ { "cell_type": "markdown", "id": "9687a1f2-0610-4131-804a-119f45150846", "metadata": {}, "source": [ "# AutoMM Detection - Object detection data formats\n", "\n", "In this section, we introduce the two major data formats that AutoMM Detection supports, which are COCO format and DataFrame format.\n", "\n", "## COCO Format\n", "See section [Convert Data to COCO Format](convert_data_to_coco_format.ipynb) for a detailed introduction on the COCO dataset format. \n", "Essentially you will need a `.json` file that holds data information for your dataset. \n", "For example, you could prepare your data in the following format:\n", "\n", "```python\n", "data = {\n", " # list of dictionaries containing all the category information\n", " \"categories\": [\n", " {\"supercategory\": \"none\", \"id\": 1, \"name\": \"person\"},\n", " {\"supercategory\": \"none\", \"id\": 2, \"name\": \"bicycle\"},\n", " {\"supercategory\": \"none\", \"id\": 3, \"name\": \"car\"},\n", " {\"supercategory\": \"none\", \"id\": 4, \"name\": \"motorcycle\"},\n", " # ...\n", " ],\n", "\n", " # list of dictionaries containing image info\n", " \"images\": [\n", " {\n", " \"file_name\": \".\",\n", " \"height\": 427,\n", " \"width\": 640,\n", " \"id\": 1\n", " },\n", " {\n", " \"file_name\": \".\",\n", " \"height\": 427,\n", " \"width\": 640,\n", " \"id\": 2\n", " },\n", " # ...\n", " ],\n", " # list of dictionaries containing bounding box annotation info\n", " \"annotations\": [\n", " {\n", " 'area': 33453, # area of the bounding box\n", " 'iscrowd': 0, # if the bounding box contains multiple objects, usually this is 0 since we are dealing with single box -> single object \n", " 'bbox': [181, 133, 177, 189], # the [x, y, width, height] format annotation of bounding box\n", " 'category_id': 8, # the \"id\" field of the corresponding category, not the \"name\" field\n", " 'ignore': 0, # set to 1 to ignore this annotation\n", " 'segmentation': [], # always empty since this tutorial is not for segmentation\n", " 'image_id': 1617, # the \"id\" field of the corresponding image\n", " 'id': 1 # the \"id\" of this particular annotation\n", " },\n", " {\n", " 'area': 25740, \n", " 'iscrowd': 0,\n", " 'bbox': [192, 100, 156, 165],\n", " 'category_id': 9,\n", " 'ignore': 0,\n", " 'segmentation': [],\n", " 'image_id': 1617,\n", " 'id': 2\n", " },\n", " # ...\n", " ],\n", " \n", " \"type\": \"instances\"\n", "}\n", "```\n", "\n", "\n", "## `pd.DataFrame` Format\n", "The AutoMM detection also supports the `pd.DataFrame` format. Your `pd.DataFrame` should contain 3 columns. \n", "\n", "- `image`: the path to the image file\n", "- `rois`: a list of arrays containing bounding box annotation `[x1, y1, x2, y2, class_label]`\n", "- `label`: a copy column of `rois`\n", "\n", "An example can be seen below:\n", "```\n", " image \\\n", "0 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "1 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "2 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "3 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "4 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "\n", " rois \\\n", "0 [[352.0, 138.0, 374.0, 373.0, 7], [105.0, 1.0,... \n", "1 [[40.0, 71.0, 331.0, 332.0, 7], [33.0, 42.0, 3... \n", "2 [[52.0, 22.0, 306.0, 326.0, 8], [26.0, 108.0, ... \n", "3 [[114.0, 154.0, 367.0, 346.0, 7], [292.0, 49.0... \n", "4 [[279.0, 225.0, 374.0, 338.0, 3], [245.0, 230.... \n", "\n", " label \n", "0 [[352.0, 138.0, 374.0, 373.0, 7], [105.0, 1.0,... 
\n", "1 [[40.0, 71.0, 331.0, 332.0, 7], [33.0, 42.0, 3... \n", "2 [[52.0, 22.0, 306.0, 326.0, 8], [26.0, 108.0, ... \n", "3 [[114.0, 154.0, 367.0, 346.0, 7], [292.0, 49.0... \n", "4 [[279.0, 225.0, 374.0, 338.0, 3], [245.0, 230.... \n", "```\n", "\n", "## Using the data formats to train and evaluate models\n", "\n", "### Download data\n", "We have the sample dataset ready in the cloud. Let's download it:" ] }, { "cell_type": "code", "execution_count": null, "id": "aa00faab-252f-44c9-b8f7-57131aa8251c", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "!pip install autogluon.multimodal\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b7eccabb-ccc5-423c-8650-5de3fdeba460", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from autogluon.core.utils.loaders import load_zip\n", "\n", "zip_file = \"https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip\"\n", "download_dir = \"./tiny_motorbike_coco\"\n", "\n", "load_zip.unzip(zip_file, unzip_dir=download_dir)\n", "data_dir = os.path.join(download_dir, \"tiny_motorbike\")\n", "train_path = os.path.join(data_dir, \"Annotations\", \"trainval_cocoformat.json\")\n", "test_path = os.path.join(data_dir, \"Annotations\", \"test_cocoformat.json\")" ] }, { "cell_type": "markdown", "id": "400d3fe1-d67c-495e-a410-b6837a32a54b", "metadata": {}, "source": [ "We provide useful util functions to convert from COCO format to `pd.DataFrame` format and vice versa.\n", "\n", "### From COCO format to `pd.DataFrame`\n", "Now we first introduce converting from COCO to `pd.DataFrame`" ] }, { "cell_type": "code", "execution_count": null, "id": "79d47c7e-682c-40c1-83aa-ee6412ed00ba", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal.utils.object_detection import from_coco\n", "train_df = from_coco(train_path)\n", "print(train_df)" ] }, { "cell_type": "markdown", "id": "a7841226-1b87-400d-b6f0-02530c64929b", "metadata": {}, "source": [ "### From `pd.DataFrame` to COCO format" ] }, { "cell_type": "code", "execution_count": null, "id": "c9ff8957-0f72-4b5f-83ae-a6e9be2746f5", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal.utils.object_detection import object_detection_df_to_coco\n", "\n", "train_coco = object_detection_df_to_coco(train_df)\n", "print(train_coco)" ] }, { "cell_type": "markdown", "id": "a263af27-a204-4415-95a4-a6d953e77e58", "metadata": {}, "source": [ "You can save the `train_coco`, which is a dictionary, to a `.json` file by specifying the `save_path` when calling `object_detection_df_to_coco`." ] }, { "cell_type": "code", "execution_count": null, "id": "9e0d73d1-9e97-4af8-b415-f87cd8c2b6e7", "metadata": {}, "outputs": [], "source": [ "train_coco = object_detection_df_to_coco(train_df, save_path=\"./df_converted_to_coco.json\")" ] }, { "cell_type": "markdown", "id": "c52a6054-cfc1-4e1f-accd-66fed756855f", "metadata": {}, "source": [ "The next time when loading from the `.json` file by calling `from_coco`, make sure to supply the right `root` such that `/` is a valid image path.\n", "(Note: `file_name` is under the `\"images\"` subfield in `data` defined at the beginning of this tutorial.) 
For example:" ] }, { "cell_type": "code", "execution_count": null, "id": "ecf6d0b4-5595-4093-a53e-c912deecdf53", "metadata": {}, "outputs": [], "source": [ "train_df_from_saved_coco = from_coco(\"./df_converted_to_coco.json\", root=\"./\")" ] }, { "cell_type": "markdown", "id": "0232e107-cf87-4a55-9608-b9696785ff51", "metadata": {}, "source": [ "### Training with `pd.DataFrame` format\n", "\n", "To start, let's import MultiModalPredictor:" ] }, { "cell_type": "code", "execution_count": null, "id": "b7c208c1-fa35-4203-85e8-c007deef208c", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal import MultiModalPredictor" ] }, { "attachments": {}, "cell_type": "markdown", "id": "97ed55c0-c052-4e42-80ab-a75e1c73be96", "metadata": {}, "source": [ "Make sure `mmcv` and `mmdet` are installed:" ] }, { "cell_type": "code", "execution_count": null, "id": "19e48f7e-a30d-449b-b8b4-e2c612e4a8e4", "metadata": {}, "outputs": [], "source": [ "!mim install mmcv\n", "!pip install \"mmdet==3.1.0\"" ] }, { "cell_type": "markdown", "id": "798372e3-9489-45eb-8956-fffca6458659", "metadata": {}, "source": [ "Again, we follow the model setup as in [AutoMM Detection - Quick Start on a Tiny COCO Format Dataset](../quick_start/quick_start_coco.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "id": "390c1311-deed-47f4-b014-b9188bb68165", "metadata": {}, "outputs": [], "source": [ "checkpoint_name = \"yolov3_mobilenetv2_320_300e_coco\"\n", "num_gpus = -1 # use all GPUs\n", "\n", "import uuid\n", "\n", "model_path = f\"./tmp/{uuid.uuid4().hex}-df_train_temp_save\"\n", "predictor_df = MultiModalPredictor(\n", " hyperparameters={\n", " \"model.mmdet_image.checkpoint_name\": checkpoint_name,\n", " \"env.num_gpus\": num_gpus,\n", " },\n", " problem_type=\"object_detection\",\n", " sample_data_path=train_df, # we specify train_df here as the sample_data_path in order to get the num_classes\n", " path=model_path,\n", ")\n", "\n", "predictor_df.fit(\n", " train_df,\n", " hyperparameters={\n", " \"optimization.learning_rate\": 2e-4, # we use two stage and detection head has 100x lr\n", " \"optimization.max_epochs\": 30,\n", " \"env.per_gpu_batch_size\": 32, # decrease it when model is large\n", " },\n", ")" ] }, { "cell_type": "markdown", "id": "e956bbbc-c43c-43b7-84f9-e51593d0eccf", "metadata": {}, "source": [ "### Evaluation with `pd.DataFrame` format\n", "We follow the evaluation setup as in :ref:`sec_automm_detection_quick_start_coco`. We encourage you to check it out for further details. \n", "\n", "To evaluate the model with `pd.DataFrame` format, run following code." ] }, { "cell_type": "code", "execution_count": null, "id": "43838766-dcad-4f83-940a-afd9b5447fb7", "metadata": {}, "outputs": [], "source": [ "test_df = from_coco(test_path)\n", "predictor_df.evaluate(test_df)" ] }, { "cell_type": "markdown", "id": "ad78bf52-7e16-441f-aacd-e4565b11f46d", "metadata": {}, "source": [ "### Other Examples\n", "\n", "You may go to [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) to explore other examples about AutoMM." ] }, { "cell_type": "markdown", "id": "96e0bb8b-2f72-4413-8758-39033ac4278d", "metadata": {}, "source": [ "## Customization\n", "To learn how to customize AutoMM, please refer to [Customize AutoMM](../../advanced_topics/customization.ipynb)." 
] }, { "cell_type": "markdown", "id": "b659da81-ee63-4ebf-a634-291f6a7a4b18", "metadata": {}, "source": [ "### Citation\n", "```\n", "@misc{redmon2018yolov3,\n", " title={YOLOv3: An Incremental Improvement},\n", " author={Joseph Redmon and Ali Farhadi},\n", " year={2018},\n", " eprint={1804.02767},\n", " archivePrefix={arXiv},\n", " primaryClass={cs.CV}\n", "}\n", "```\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }