{ "cells": [ { "cell_type": "markdown", "id": "9687a1f2-0610-4131-804a-119f45150846", "metadata": {}, "source": [ "# AutoMM Detection - Object detection data formats\n", "\n", "In this section, we introduce the two major data formats that AutoMM Detection supports, which are COCO format and DataFrame format.\n", "\n", "## COCO Format\n", "See section [Convert Data to COCO Format](convert_data_to_coco_format.ipynb) for a detailed introduction on the COCO dataset format. \n", "Essentially you will need a `.json` file that holds data information for your dataset. \n", "For example, you could prepare your data in the following format:\n", "\n", "```python\n", "data = {\n", " # list of dictionaries containing all the category information\n", " \"categories\": [\n", " {\"supercategory\": \"none\", \"id\": 1, \"name\": \"person\"},\n", " {\"supercategory\": \"none\", \"id\": 2, \"name\": \"bicycle\"},\n", " {\"supercategory\": \"none\", \"id\": 3, \"name\": \"car\"},\n", " {\"supercategory\": \"none\", \"id\": 4, \"name\": \"motorcycle\"},\n", " # ...\n", " ],\n", "\n", " # list of dictionaries containing image info\n", " \"images\": [\n", " {\n", " \"file_name\": \".\",\n", " \"height\": 427,\n", " \"width\": 640,\n", " \"id\": 1\n", " },\n", " {\n", " \"file_name\": \".\",\n", " \"height\": 427,\n", " \"width\": 640,\n", " \"id\": 2\n", " },\n", " # ...\n", " ],\n", " # list of dictionaries containing bounding box annotation info\n", " \"annotations\": [\n", " {\n", " 'area': 33453, # area of the bounding box\n", " 'iscrowd': 0, # if the bounding box contains multiple objects, usually this is 0 since we are dealing with single box -> single object \n", " 'bbox': [181, 133, 177, 189], # the [x, y, width, height] format annotation of bounding box\n", " 'category_id': 8, # the \"id\" field of the corresponding category, not the \"name\" field\n", " 'ignore': 0, # set to 1 to ignore this annotation\n", " 'segmentation': [], # always empty since this tutorial is not for segmentation\n", " 'image_id': 1617, # the \"id\" field of the corresponding image\n", " 'id': 1 # the \"id\" of this particular annotation\n", " },\n", " {\n", " 'area': 25740, \n", " 'iscrowd': 0,\n", " 'bbox': [192, 100, 156, 165],\n", " 'category_id': 9,\n", " 'ignore': 0,\n", " 'segmentation': [],\n", " 'image_id': 1617,\n", " 'id': 2\n", " },\n", " # ...\n", " ],\n", " \n", " \"type\": \"instances\"\n", "}\n", "```\n", "\n", "\n", "## `pd.DataFrame` Format\n", "The AutoMM detection also supports the `pd.DataFrame` format. Your `pd.DataFrame` should contain 3 columns. \n", "\n", "- `image`: the path to the image file\n", "- `rois`: a list of arrays containing bounding box annotation `[x1, y1, x2, y2, class_label]`\n", "- `label`: a copy column of `rois`\n", "\n", "An example can be seen below:\n", "```\n", " image \\\n", "0 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "1 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "2 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "3 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "4 /home/ubuntu/autogluon-dev/docs/tutorials/mult... \n", "\n", " rois \\\n", "0 [[352.0, 138.0, 374.0, 373.0, 7], [105.0, 1.0,... \n", "1 [[40.0, 71.0, 331.0, 332.0, 7], [33.0, 42.0, 3... \n", "2 [[52.0, 22.0, 306.0, 326.0, 8], [26.0, 108.0, ... \n", "3 [[114.0, 154.0, 367.0, 346.0, 7], [292.0, 49.0... \n", "4 [[279.0, 225.0, 374.0, 338.0, 3], [245.0, 230.... \n", "\n", " label \n", "0 [[352.0, 138.0, 374.0, 373.0, 7], [105.0, 1.0,... 
\n", "1 [[40.0, 71.0, 331.0, 332.0, 7], [33.0, 42.0, 3... \n", "2 [[52.0, 22.0, 306.0, 326.0, 8], [26.0, 108.0, ... \n", "3 [[114.0, 154.0, 367.0, 346.0, 7], [292.0, 49.0... \n", "4 [[279.0, 225.0, 374.0, 338.0, 3], [245.0, 230.... \n", "```\n", "\n", "## Using the data formats to train and evaluate models\n", "\n", "### Download data\n", "We have the sample dataset ready in the cloud. Let's download it:" ] }, { "cell_type": "code", "execution_count": null, "id": "aa00faab-252f-44c9-b8f7-57131aa8251c", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "!pip install autogluon.multimodal\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b7eccabb-ccc5-423c-8650-5de3fdeba460", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from autogluon.core.utils.loaders import load_zip\n", "\n", "zip_file = \"https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip\"\n", "download_dir = \"./tiny_motorbike_coco\"\n", "\n", "load_zip.unzip(zip_file, unzip_dir=download_dir)\n", "data_dir = os.path.join(download_dir, \"tiny_motorbike\")\n", "train_path = os.path.join(data_dir, \"Annotations\", \"trainval_cocoformat.json\")\n", "test_path = os.path.join(data_dir, \"Annotations\", \"test_cocoformat.json\")" ] }, { "cell_type": "markdown", "id": "400d3fe1-d67c-495e-a410-b6837a32a54b", "metadata": {}, "source": [ "We provide useful util functions to convert from COCO format to `pd.DataFrame` format and vice versa.\n", "\n", "### From COCO format to `pd.DataFrame`\n", "Now we first introduce converting from COCO to `pd.DataFrame`" ] }, { "cell_type": "code", "execution_count": null, "id": "79d47c7e-682c-40c1-83aa-ee6412ed00ba", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal.utils.object_detection import from_coco\n", "train_df = from_coco(train_path)\n", "print(train_df)" ] }, { "cell_type": "markdown", "id": "a7841226-1b87-400d-b6f0-02530c64929b", "metadata": {}, "source": [ "### From `pd.DataFrame` to COCO format" ] }, { "cell_type": "code", "execution_count": null, "id": "c9ff8957-0f72-4b5f-83ae-a6e9be2746f5", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal.utils.object_detection import object_detection_df_to_coco\n", "\n", "train_coco = object_detection_df_to_coco(train_df)\n", "print(train_coco)" ] }, { "cell_type": "markdown", "id": "a263af27-a204-4415-95a4-a6d953e77e58", "metadata": {}, "source": [ "You can save the `train_coco`, which is a dictionary, to a `.json` file by specifying the `save_path` when calling `object_detection_df_to_coco`." ] }, { "cell_type": "code", "execution_count": null, "id": "9e0d73d1-9e97-4af8-b415-f87cd8c2b6e7", "metadata": {}, "outputs": [], "source": [ "train_coco = object_detection_df_to_coco(train_df, save_path=\"./df_converted_to_coco.json\")" ] }, { "cell_type": "markdown", "id": "c52a6054-cfc1-4e1f-accd-66fed756855f", "metadata": {}, "source": [ "The next time when loading from the `.json` file by calling `from_coco`, make sure to supply the right `root` such that `/` is a valid image path.\n", "(Note: `file_name` is under the `\"images\"` subfield in `data` defined at the beginning of this tutorial.) 
For example:" ] }, { "cell_type": "code", "execution_count": null, "id": "ecf6d0b4-5595-4093-a53e-c912deecdf53", "metadata": {}, "outputs": [], "source": [ "train_df_from_saved_coco = from_coco(\"./df_converted_to_coco.json\", root=\"./\")" ] }, { "cell_type": "markdown", "id": "0232e107-cf87-4a55-9608-b9696785ff51", "metadata": {}, "source": [ "### Training with `pd.DataFrame` format\n", "\n", "To start, let's import MultiModalPredictor:" ] }, { "cell_type": "code", "execution_count": null, "id": "b7c208c1-fa35-4203-85e8-c007deef208c", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal import MultiModalPredictor" ] }, { "attachments": {}, "cell_type": "markdown", "id": "97ed55c0-c052-4e42-80ab-a75e1c73be96", "metadata": {}, "source": [ "Make sure `mmcv` and `mmdet` are installed:" ] }, { "cell_type": "code", "execution_count": null, "id": "19e48f7e-a30d-449b-b8b4-e2c612e4a8e4", "metadata": {}, "outputs": [], "source": [ "!mim install mmcv\n", "!pip install \"mmdet==3.1.0\"" ] }, { "cell_type": "markdown", "id": "798372e3-9489-45eb-8956-fffca6458659", "metadata": {}, "source": [ "Again, we follow the model setup as in [AutoMM Detection - Quick Start on a Tiny COCO Format Dataset](../quick_start/quick_start_coco.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "id": "390c1311-deed-47f4-b014-b9188bb68165", "metadata": {}, "outputs": [], "source": [ "checkpoint_name = \"yolov3_mobilenetv2_320_300e_coco\"\n", "num_gpus = -1 # use all GPUs\n", "\n", "import uuid\n", "\n", "model_path = f\"./tmp/{uuid.uuid4().hex}-df_train_temp_save\"\n", "predictor_df = MultiModalPredictor(\n", " hyperparameters={\n", " \"model.mmdet_image.checkpoint_name\": checkpoint_name,\n", " \"env.num_gpus\": num_gpus,\n", " },\n", " problem_type=\"object_detection\",\n", " sample_data_path=train_df, # we specify train_df here as the sample_data_path in order to get the num_classes\n", " path=model_path,\n", ")\n", "\n", "predictor_df.fit(\n", " train_df,\n", " hyperparameters={\n", " \"optimization.learning_rate\": 2e-4, # we use two stage and detection head has 100x lr\n", " \"optimization.max_epochs\": 30,\n", " \"env.per_gpu_batch_size\": 32, # decrease it when model is large\n", " },\n", ")" ] }, { "cell_type": "markdown", "id": "e956bbbc-c43c-43b7-84f9-e51593d0eccf", "metadata": {}, "source": [ "### Evaluation with `pd.DataFrame` format\n", "We follow the evaluation setup as in :ref:`sec_automm_detection_quick_start_coco`. We encourage you to check it out for further details. \n", "\n", "To evaluate the model with `pd.DataFrame` format, run following code." ] }, { "cell_type": "code", "execution_count": null, "id": "43838766-dcad-4f83-940a-afd9b5447fb7", "metadata": {}, "outputs": [], "source": [ "test_df = from_coco(test_path)\n", "predictor_df.evaluate(test_df)" ] }, { "cell_type": "markdown", "id": "ad78bf52-7e16-441f-aacd-e4565b11f46d", "metadata": {}, "source": [ "### Other Examples\n", "\n", "You may go to [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) to explore other examples about AutoMM." ] }, { "cell_type": "markdown", "id": "96e0bb8b-2f72-4413-8758-39033ac4278d", "metadata": {}, "source": [ "## Customization\n", "To learn how to customize AutoMM, please refer to [Customize AutoMM](../../advanced_topics/customization.ipynb)." 
] }, { "cell_type": "markdown", "id": "b659da81-ee63-4ebf-a634-291f6a7a4b18", "metadata": {}, "source": [ "### Citation\n", "```\n", "@misc{redmon2018yolov3,\n", " title={YOLOv3: An Incremental Improvement},\n", " author={Joseph Redmon and Ali Farhadi},\n", " year={2018},\n", " eprint={1804.02767},\n", " archivePrefix={arXiv},\n", " primaryClass={cs.CV}\n", "}\n", "```\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }