{ "cells": [ { "cell_type": "markdown", "id": "5c0fd970", "metadata": {}, "source": [ "# AutoMM Detection - Convert VOC Format Dataset to COCO Format\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/object_detection/data_preparation/voc_to_coco.ipynb)\n", "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/object_detection/data_preparation/voc_to_coco.ipynb)\n", "\n", "\n", "\n", "[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) is a collection of datasets for object detection. \n", "And VOC format refers to the specific format (in `.xml` file) the Pascal VOC dataset is using.\n", "\n", "In this tutorial, we will convert VOC2007 dataset from VOC format to COCO format. See [AutoMM Detection - Prepare Pascal VOC Dataset](prepare_voc.ipynb) for how to download it.\n", "We will use our tool `voc2coco`. This python script is in our code: \n", "[voc2coco.py](https://raw.githubusercontent.com/autogluon/autogluon/master/multimodal/src/autogluon/multimodal/cli/voc2coco.py),\n", "and you can also run it as a cli: `python3 -m autogluon.multimodal.cli.voc2coco`.\n", "\n", "**Note: In Autogluon MultiModalPredictor, we strongly recommend using COCO as your data format.** However, for fast proof testing we also have limit support for VOC format.\n", "\n", "## Convert Existing Splits\n", "\n", "Under VOC format root path, we have the following folders:" ] }, { "cell_type": "markdown", "id": "c1bfc08d", "metadata": {}, "source": [ "```\n", "Annotations ImageSets JPEGImages\n", "```\n" ] }, { "cell_type": "markdown", "id": "6a5f2954", "metadata": {}, "source": [ "And normally there are some pre-defined split files under `ImageSets/Main/`:" ] }, { "cell_type": "markdown", "id": "0695ea82", "metadata": {}, "source": [ "```\n", "train.txt\n", "val.txt\n", "test.txt\n", "...\n", "```\n" ] }, { "cell_type": "markdown", "id": "40693bf2", "metadata": {}, "source": [ "We can convert those splits into COCO format by simply running given the root directory, e.g. `./VOCdevkit/VOC2007`:" ] }, { "cell_type": "markdown", "id": "13cb6dee", "metadata": {}, "source": [ "```\n", "python3 -m autogluon.multimodal.cli.voc2coco --root_dir ./VOCdevkit/VOC2007\n", "```\n" ] }, { "cell_type": "markdown", "id": "9a2974e5", "metadata": {}, "source": [ "The command line output will show the progress:" ] }, { "cell_type": "markdown", "id": "db2c3894", "metadata": {}, "source": [ "```\n", "Start converting !\n", " 17%|█████████████████▍ | 841/4952 [00:00<00:00, 15571.88it/s\n", "```\n" ] }, { "cell_type": "markdown", "id": "c2ea7c5c", "metadata": {}, "source": [ "Now those splits are converted to COCO format in `Annotations` folder under the root directory:" ] }, { "cell_type": "markdown", "id": "d3100e56", "metadata": {}, "source": [ "```\n", "train_cocoformat.json\n", "val_cocoformat.json\n", "test_cocoformat.json\n", "...\n", "```\n" ] }, { "cell_type": "markdown", "id": "5f0bf98e", "metadata": {}, "source": [ "## Convert Existing Splits\n", "\n", "Instead of using predefined splits, you can also split the data with the train/validation/test ratio you want.\n", "Note that this does not require any pre-existing split files. To split train/validation/test by 0.6/0.2/0.2, run:" ] }, { "cell_type": "markdown", "id": "eb44551a", "metadata": {}, "source": [ "```\n", "python3 -m autogluon.multimodal.cli.voc2coco --root_dir ./VOCdevkit/VOC2007 --train_ratio 0.6 --val_ratio 0.2\n", "```\n" ] }, { "cell_type": "markdown", "id": "532b2e4a", "metadata": {}, "source": [ "The command line output will show the progress:" ] }, { "cell_type": "markdown", "id": "9bc17618", "metadata": {}, "source": [ "```\n", "Start converting !\n", " 17%|█████████████████▍ | 841/4952 [00:00<00:00, 15571.88it/s\n", "```\n" ] }, { "cell_type": "markdown", "id": "e94db502", "metadata": {}, "source": [ "And this will generate user splited COCO format in `Annotations` folder under the root directory:" ] }, { "cell_type": "markdown", "id": "ec50cf96", "metadata": {}, "source": [ "```\n", "usersplit_train_cocoformat.json\n", "usersplit_val_cocoformat.json\n", "usersplit_test_cocoformat.json\n", "```\n" ] }, { "cell_type": "markdown", "id": "fbd8d1ac", "metadata": {}, "source": [ "## Other Examples\n", "\n", "You may go to [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) to explore other examples about AutoMM.\n", "\n", "## Customization\n", "To learn how to customize AutoMM, please refer to [Customize AutoMM](../../advanced_topics/customization.ipynb)." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }