{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "83628e0a", "metadata": {}, "source": [ "# AutoMM Detection - Finetune on COCO Format Dataset with Customized Settings\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/stable/docs/tutorials/multimodal/object_detection/finetune_coco.ipynb)\n", "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/stable/docs/tutorials/multimodal/object_detection/finetune_coco.ipynb)\n", "\n", "\n", "\n", "![Pothole Dataset](https://automl-mm-bench.s3.amazonaws.com/object_detection/example_image/pothole144_gt.jpg)\n", "\n", "\n", "In this section, our goal is to fast finetune and evaluate a pretrained model \n", "on [Pothole dataset](https://www.kaggle.com/datasets/andrewmvd/pothole-detection) in COCO format with customized setting.\n", "Pothole is a single object, i.e. `pothole`, detection dataset, containing 665 images with bounding box annotations\n", "for the creation of detection models and can work as POC/POV for the maintenance of roads.\n", "See [AutoMM Detection - Prepare Pothole Dataset](../data_preparation/prepare_pothole.ipynb) for how to prepare Pothole dataset.\n", "\n", "To start, let's import MultiModalPredictor and make sure `mmcv` and `mmdet` are installed:" ] }, { "cell_type": "code", "execution_count": null, "id": "aa00faab-252f-44c9-b8f7-57131aa8251c", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "!pip install autogluon.multimodal" ] }, { "cell_type": "code", "execution_count": null, "id": "d9130df8", "metadata": { "tags": [ "hide-output" ] }, "outputs": [], "source": [ "#!pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 # To use object detection, downgrade the torch version if it's >=2.2\n", "!mim install \"mmcv==2.1.0\" # For Google Colab, use the line below instead to install mmcv\n", "#!pip install \"mmcv==2.1.0\" -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/index.html\n", "!pip install \"mmdet==3.2.0\"" ] }, { "cell_type": "code", "execution_count": null, "id": "5fc19708", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal import MultiModalPredictor" ] }, { "cell_type": "markdown", "id": "bcae4d09", "metadata": {}, "source": [ "And also import some other packages that will be used in this tutorial:" ] }, { "cell_type": "code", "execution_count": null, "id": "83311ccd", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from autogluon.core.utils.loaders import load_zip" ] }, { "attachments": {}, "cell_type": "markdown", "id": "00b5f2d0", "metadata": {}, "source": [ "We have the sample dataset ready in the cloud. 
{ "attachments": {}, "cell_type": "markdown", "id": "00b5f2d0", "metadata": {}, "source": [ "We have the sample dataset ready in the cloud.\n", "Let's download it and store the paths for each data split:" ] }, { "cell_type": "code", "execution_count": null, "id": "bd2766cb", "metadata": {}, "outputs": [], "source": [ "zip_file = \"https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip\"\n", "download_dir = \"./pothole\"\n", "\n", "load_zip.unzip(zip_file, unzip_dir=download_dir)\n", "data_dir = os.path.join(download_dir, \"pothole\")\n", "train_path = os.path.join(data_dir, \"Annotations\", \"usersplit_train_cocoformat.json\")\n", "val_path = os.path.join(data_dir, \"Annotations\", \"usersplit_val_cocoformat.json\")\n", "test_path = os.path.join(data_dir, \"Annotations\", \"usersplit_test_cocoformat.json\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a7e6868f", "metadata": {}, "source": [ "When using a COCO format dataset, the input is the JSON annotation file of the dataset split.\n", "In this example, `usersplit_train_cocoformat.json` is the annotation file of the train split,\n", "`usersplit_val_cocoformat.json` is the annotation file of the validation split,\n", "and `usersplit_test_cocoformat.json` is the annotation file of the test split.\n", "\n", "We select the YOLOX-small model pretrained on the COCO dataset. With this setting, finetuning and inference are fast,\n", "and the model is easy to deploy. Note that you can use a larger model by setting `checkpoint_name` to the corresponding checkpoint name for better performance (but usually at slower speed).\n", "You may also need to change the `learning_rate` and `per_gpu_batch_size` for a different model.\n", "An easier way is to use our predefined presets `\"medium_quality\"`, `\"high_quality\"`, or `\"best_quality\"`.\n", "For more about using presets, see [Quick Start Coco](../quick_start/quick_start_coco).\n" ] }, { "cell_type": "code", "execution_count": null, "id": "45a454a8", "metadata": {}, "outputs": [], "source": [ "checkpoint_name = \"yolox_s\"\n", "num_gpus = 1  # only use one GPU" ] }, { "cell_type": "markdown", "id": "cb6e4c72", "metadata": {}, "source": [ "We create the MultiModalPredictor with the selected checkpoint name and number of GPUs.\n", "We need to set the `problem_type` to `\"object_detection\"`,\n", "and also provide a `sample_data_path` for the predictor to infer the categories of the dataset.\n", "Here we provide the `train_path`, but any other split of this dataset would also work." ] }, { "cell_type": "code", "execution_count": null, "id": "b94ac7e1", "metadata": {}, "outputs": [], "source": [ "predictor = MultiModalPredictor(\n", "    hyperparameters={\n", "        \"model.mmdet_image.checkpoint_name\": checkpoint_name,\n", "        \"env.num_gpus\": num_gpus,\n", "    },\n", "    problem_type=\"object_detection\",\n", "    sample_data_path=train_path,\n", ")" ] }, { "cell_type": "markdown", "id": "1d5dfac9", "metadata": {}, "source": [ "We set the learning rate to `1e-4`.\n", "Note that we use a two-stage learning rate option during finetuning by default,\n", "in which the model head gets 100x the base learning rate.\n", "Using a two-stage learning rate, with a high learning rate only on the head layers, makes\n", "the model converge faster during finetuning.\n", "It usually gives better performance as well,\n", "especially on small datasets with hundreds or thousands of images.\n",
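"\n", "If you want to make this behavior explicit rather than relying on the defaults, the relevant knobs are exposed as `optimization.*` hyperparameters.\n", "The sketch below assumes the key names `optimization.lr_choice` and `optimization.lr_mult` as listed in [Customize AutoMM](../../advanced_topics/customization.ipynb); double-check them for your AutoGluon version:\n", "\n", "```python\n", "# Sketch only (assumed hyperparameter names; verify against Customize AutoMM):\n", "# make the default two-stage learning rate explicit when calling predictor.fit().\n", "two_stage_lr_overrides = {\n", "    \"optimization.learning_rate\": 1e-4,       # base lr for the backbone layers\n", "    \"optimization.lr_choice\": \"two_stages\",   # assumed key: use a separate lr for the head layers\n", "    \"optimization.lr_mult\": 100,              # assumed key: head lr = 1e-4 * 100 = 1e-2\n", "}\n", "```\n", "\n",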
"We set the batch size to 16; you can increase or decrease it based on your available GPU memory.\n", "For fast finetuning, we set the maximum number of epochs to 30, the validation check interval to 1.0 (one validation pass per validation epoch),\n", "and check validation only once every 3 epochs.\n", "We also time the fit process here to better understand the speed." ] }, { "cell_type": "code", "execution_count": null, "id": "9c1607b9", "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "start = time.time()\n", "predictor.fit(\n", "    train_path,\n", "    tuning_data=val_path,\n", "    hyperparameters={\n", "        \"optimization.learning_rate\": 1e-4,  # we use a two-stage lr and the detection head has 100x lr\n", "        \"env.per_gpu_batch_size\": 16,  # decrease it when the model is large or GPU memory is small\n", "        \"optimization.max_epochs\": 30,  # max number of training epochs; we may early stop before this based on the validation setting\n", "        \"optimization.val_check_interval\": 1.0,  # do 1 validation pass per validation epoch\n", "        \"optimization.check_val_every_n_epoch\": 3,  # run validation every 3 epochs\n", "        \"optimization.patience\": 3,  # early stop after 3 consecutive validations without improvement\n", "    },\n", ")\n", "end = time.time()\n", "print(\"This finetuning takes %.2f seconds.\" % (end - start))" ] }, { "cell_type": "markdown", "id": "a91aec35", "metadata": {}, "source": [ "To evaluate the model we just trained, run:" ] }, { "cell_type": "code", "execution_count": null, "id": "51a4c4f4", "metadata": {}, "outputs": [], "source": [ "predictor.evaluate(test_path)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2acf0231", "metadata": {}, "source": [ "Note that it's always recommended to use our predefined presets to save customization time, as in the following code:\n", "\n", "```python\n", "predictor = MultiModalPredictor(\n", "    problem_type=\"object_detection\",\n", "    sample_data_path=train_path,\n", "    presets=\"medium_quality\",\n", ")\n", "predictor.fit(train_path, tuning_data=val_path)\n", "predictor.evaluate(test_path)\n", "```\n", "\n", "For more about using presets, see [Quick Start Coco](../quick_start/quick_start_coco).\n", "\n", "\n", "The evaluation results are shown in the command line output.\n", "The first value is the mAP in the COCO standard, and the second one is the mAP in the VOC standard (also called mAP50).\n",
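"\n", "If you prefer to work with these numbers programmatically instead of reading them from the log, `evaluate` also returns them as a dictionary; a minimal sketch (the exact metric keys can vary between AutoGluon versions, so inspect the returned dict):\n", "\n", "```python\n", "# Sketch: capture the evaluation metrics as a dict rather than only printing them.\n", "results = predictor.evaluate(test_path)\n", "print(results)  # contains COCO-style mAP entries, e.g. mAP and mAP50 (key names may vary)\n", "```\n", "\n",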
\n", "For more details about these metrics, see [COCO's evaluation guideline](https://cocodataset.org/#detection-eval).\n", "\n", "We can get the prediction on test set:" ] }, { "cell_type": "code", "execution_count": null, "id": "f83f7cb0", "metadata": {}, "outputs": [], "source": [ "pred = predictor.predict(test_path)" ] }, { "cell_type": "markdown", "id": "226dd2c2", "metadata": {}, "source": [ "Let's also visualize the prediction result:" ] }, { "cell_type": "code", "execution_count": null, "id": "d8f8c8d1", "metadata": {}, "outputs": [], "source": [ "!pip install opencv-python" ] }, { "cell_type": "code", "execution_count": null, "id": "abe88f8b", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal.utils import visualize_detection\n", "conf_threshold = 0.25 # Specify a confidence threshold to filter out unwanted boxes\n", "visualization_result_dir = \"./\" # Use the pwd as result dir to save the visualized image\n", "visualized = visualize_detection(\n", " pred=pred[12:13],\n", " detection_classes=predictor.classes,\n", " conf_threshold=conf_threshold,\n", " visualization_result_dir=visualization_result_dir,\n", ")\n", "from PIL import Image\n", "from IPython.display import display\n", "img = Image.fromarray(visualized[0][:, :, ::-1], 'RGB')\n", "display(img)" ] }, { "cell_type": "markdown", "id": "d8e7d78e", "metadata": {}, "source": [ "Under this fast finetune setting, we reached a good mAP number on a new dataset with a few hundred seconds!\n", "For how to finetune with higher performance,\n", "see [AutoMM Detection - High Performance Finetune on COCO Format Dataset](../finetune/detection_high_performance_finetune_coco.ipynb), where we finetuned a VFNet model with \n", "5 hours and reached `mAP = 0.450, mAP50 = 0.718` on this dataset.\n", "\n", "## Other Examples\n", "\n", "You may go to [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) to explore other examples about AutoMM.\n", "\n", "## Customization\n", "To learn how to customize AutoMM, please refer to [Customize AutoMM](../../advanced_topics/customization.ipynb).\n", "\n", "## Citation" ] }, { "cell_type": "markdown", "id": "8feba8cd", "metadata": {}, "source": [ "```\n", "@article{DBLP:journals/corr/abs-2107-08430,\n", " author = {Zheng Ge and\n", " Songtao Liu and\n", " Feng Wang and\n", " Zeming Li and\n", " Jian Sun},\n", " title = {{YOLOX:} Exceeding {YOLO} Series in 2021},\n", " journal = {CoRR},\n", " volume = {abs/2107.08430},\n", " year = {2021},\n", " url = {https://arxiv.org/abs/2107.08430},\n", " eprinttype = {arXiv},\n", " eprint = {2107.08430},\n", " timestamp = {Tue, 05 Apr 2022 14:09:44 +0200},\n", " biburl = {https://dblp.org/rec/journals/corr/abs-2107-08430.bib},\n", " bibsource = {dblp computer science bibliography, https://dblp.org},\n", "}\n", "```\n" ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython" } }, "nbformat": 4, "nbformat_minor": 5 }