{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "84a147df", "metadata": {}, "source": [ "# AutoMM Detection - Quick Start on a Tiny COCO Format Dataset\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/stable/docs/tutorials/multimodal/object_detection/quick_start/quick_start_coco.ipynb)\n", "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/stable/docs/tutorials/multimodal/object_detection/quick_start/quick_start_coco.ipynb)\n", "\n", "\n", "\n", "In this section, our goal is to fast finetune a pretrained model on a small dataset in COCO format, \n", "and evaluate on its test set. Both training and test sets are in COCO format.\n", "See [Convert Data to COCO Format](../data_preparation/convert_data_to_coco_format.ipynb) for how to convert other datasets to COCO format.\n", "\n", "## Setting up the imports" ] }, { "cell_type": "markdown", "id": "45f33969", "metadata": {}, "source": [ "Make sure `mmcv` and `mmdet` are installed:" ] }, { "cell_type": "code", "execution_count": null, "id": "aa00faab-252f-44c9-b8f7-57131aa8251c", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "!pip install autogluon.multimodal" ] }, { "cell_type": "code", "execution_count": null, "id": "3d1cf87b", "metadata": { "tags": [ "hide-output" ] }, "outputs": [], "source": [ "#!pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 # To use object detection, downgrade the torch version if it's >=2.2\n", "!mim install \"mmcv==2.1.0\" # For Google Colab, use the line below instead to install mmcv\n", "#!pip install \"mmcv==2.1.0\" -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/index.html\n", "!pip install \"mmdet==3.2.0\"" ] }, { "cell_type": "markdown", "id": "f6b50410", "metadata": {}, "source": [ "To start, let's import MultiModalPredictor:" ] }, { "cell_type": "code", "execution_count": null, "id": "6122a63c", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal import MultiModalPredictor" ] }, { "cell_type": "markdown", "id": "8960c4cf", "metadata": {}, "source": [ "And also import some other packages that will be used in this tutorial:" ] }, { "cell_type": "code", "execution_count": null, "id": "82704cb8", "metadata": {}, "outputs": [], "source": [ "import os\n", "import time\n", "\n", "from autogluon.core.utils.loaders import load_zip" ] }, { "cell_type": "markdown", "id": "5481f3b2", "metadata": {}, "source": [ "## Downloading Data\n", "We have the sample dataset ready in the cloud. Let's download it:" ] }, { "cell_type": "code", "execution_count": null, "id": "ff4dd355", "metadata": {}, "outputs": [], "source": [ "zip_file = \"https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip\"\n", "download_dir = \"./tiny_motorbike_coco\"\n", "\n", "load_zip.unzip(zip_file, unzip_dir=download_dir)\n", "data_dir = os.path.join(download_dir, \"tiny_motorbike\")\n", "train_path = os.path.join(data_dir, \"Annotations\", \"trainval_cocoformat.json\")\n", "test_path = os.path.join(data_dir, \"Annotations\", \"test_cocoformat.json\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "dfc32ec3", "metadata": {}, "source": [ "While using COCO format dataset, the input is the json annotation file of the dataset split.\n", "In this example, `trainval_cocoformat.json` is the annotation file of the train-and-validate split,\n", "and `test_cocoformat.json` is the annotation file of the test split.\n", "\n", "## Creating the MultiModalPredictor\n", "We select the `\"medium_quality\"` presets, which uses a YOLOX-large model pretrained on COCO dataset. This preset is fast to finetune or inference,\n", "and easy to deploy. We also provide presets `\"high_quality\"` with a DINO-Resnet50 model and `\"best quality\"` with a DINO-SwinL model, with much higher performance but also slower and with higher GPU memory usage." ] }, { "cell_type": "code", "execution_count": null, "id": "3eae8201", "metadata": {}, "outputs": [], "source": [ "presets = \"medium_quality\"" ] }, { "cell_type": "markdown", "id": "5467f27e", "metadata": {}, "source": [ "We create the MultiModalPredictor with selected presets.\n", "We need to specify the problem_type to `\"object_detection\"`,\n", "and also provide a `sample_data_path` for the predictor to infer the catgories of the dataset.\n", "Here we provide the `train_path`, and it also works using any other split of this dataset.\n", "And we also provide a `path` to save the predictor. \n", "It will be saved to a automatically generated directory with timestamp under `AutogluonModels` if `path` is not specified." ] }, { "cell_type": "code", "execution_count": null, "id": "228bd148", "metadata": {}, "outputs": [], "source": [ "# Init predictor\n", "import uuid\n", "\n", "model_path = f\"./tmp/{uuid.uuid4().hex}-quick_start_tutorial_temp_save\"\n", "\n", "predictor = MultiModalPredictor(\n", " problem_type=\"object_detection\",\n", " sample_data_path=train_path,\n", " presets=presets,\n", " path=model_path,\n", ")" ] }, { "cell_type": "markdown", "id": "02a7af8f", "metadata": {}, "source": [ "## Finetuning the Model\n", "\n", "Learning rate, number of epochs, and batch_size are included in the presets, and thus no need to specify.\n", "Note that we use a two-stage learning rate option during finetuning by default,\n", "and the model head will have 100x learning rate.\n", "Using a two-stage learning rate with high learning rate only on head layers makes\n", "the model converge faster during finetuning. It usually gives better performance as well,\n", "especially on small datasets with hundreds or thousands of images.\n", "We also compute the time of the fit process here for better understanding the speed.\n", "We run it on a g4.2xlarge EC2 machine on AWS,\n", "and part of the command outputs are shown below:" ] }, { "cell_type": "code", "execution_count": null, "id": "421776a2", "metadata": {}, "outputs": [], "source": [ "start = time.time()\n", "predictor.fit(train_path) # Fit\n", "train_end = time.time()" ] }, { "cell_type": "markdown", "id": "21c2f9c3", "metadata": {}, "source": [ "Notice that at the end of each progress bar, if the checkpoint at current stage is saved,\n", "it prints the model's save path.\n", "In this example, it's `./quick_start_tutorial_temp_save`.\n", "\n", "Print out the time and we can see that it's fast!" ] }, { "cell_type": "code", "execution_count": null, "id": "1b4d02d9", "metadata": {}, "outputs": [], "source": [ "print(\"This finetuning takes %.2f seconds.\" % (train_end - start))" ] }, { "cell_type": "markdown", "id": "4dcc501d", "metadata": {}, "source": [ "## Evaluation\n", "\n", "To evaluate the model we just trained, run following code.\n", "\n", "And the evaluation results are shown in command line output. \n", "The first line is mAP in COCO standard, and the second line is mAP in VOC standard (or mAP50). \n", "For more details about these metrics, see [COCO's evaluation guideline](https://cocodataset.org/#detection-eval).\n", "Note that for presenting a fast finetuning we use presets \"medium_quality\", \n", "you could get better result on this dataset by simply using \"high_quality\" or \"best_quality\" presets, \n", "or customize your own model and hyperparameter settings: [Customization](../../advanced_topics/customization.ipynb), and some other examples at [Fast Fine-tune Coco](../finetune/detection_fast_finetune_coco) or [High Performance Fine-tune Coco](../finetune/detection_high_performance_finetune_coco)." ] }, { "cell_type": "code", "execution_count": null, "id": "8face95d", "metadata": {}, "outputs": [], "source": [ "predictor.evaluate(test_path)\n", "eval_end = time.time()" ] }, { "cell_type": "markdown", "id": "9b9793cc", "metadata": {}, "source": [ "Print out the evaluation time:" ] }, { "cell_type": "code", "execution_count": null, "id": "060144e3", "metadata": {}, "outputs": [], "source": [ "print(\"The evaluation takes %.2f seconds.\" % (eval_end - train_end))" ] }, { "cell_type": "markdown", "id": "ae1e5665", "metadata": {}, "source": [ "We can load a new predictor with previous save_path,\n", "and we can also reset the number of GPUs to use if not all the devices are available:" ] }, { "cell_type": "code", "execution_count": null, "id": "c8bf5a84", "metadata": {}, "outputs": [], "source": [ "# Load and reset num_gpus\n", "new_predictor = MultiModalPredictor.load(model_path)\n", "new_predictor.set_num_gpus(1)" ] }, { "cell_type": "markdown", "id": "e6c906c0", "metadata": {}, "source": [ "Evaluating the new predictor gives us exactly the same result:" ] }, { "cell_type": "code", "execution_count": null, "id": "bc23738a", "metadata": {}, "outputs": [], "source": [ "# Evaluate new predictor\n", "new_predictor.evaluate(test_path)" ] }, { "cell_type": "markdown", "id": "1de5ce66", "metadata": {}, "source": [ "For how to set the hyperparameters and finetune the model with higher performance, \n", "see [AutoMM Detection - High Performance Finetune on COCO Format Dataset](../finetune/detection_high_performance_finetune_coco.ipynb).\n", "\n", "## Inference\n", "Now that we have gone through the model setup, finetuning, and evaluation, this section details the inference. \n", "Specifically, we layout the steps for using the model to make predictions and visualize the results.\n", "\n", "To run inference on the entire test set, perform:" ] }, { "cell_type": "code", "execution_count": null, "id": "8ce6e910", "metadata": {}, "outputs": [], "source": [ "pred = predictor.predict(test_path)\n", "print(pred)" ] }, { "cell_type": "markdown", "id": "db52a92f", "metadata": {}, "source": [ "The output `pred` is a `pandas` `DataFrame` that has two columns, `image` and `bboxes`.\n", "\n", "In `image`, each row contains the image path\n", "\n", "In `bboxes`, each row is a list of dictionaries, each one representing a bounding box: `{\"class\": , \"bbox\": [x1, y1, x2, y2], \"score\": }`\n", "\n", "Note that, by default, the `predictor.predict` does not save the detection results into a file.\n", "\n", "To run inference and save results, run the following:" ] }, { "cell_type": "code", "execution_count": null, "id": "19bd5152", "metadata": {}, "outputs": [], "source": [ "pred = predictor.predict(test_path, save_results=True)" ] }, { "cell_type": "markdown", "id": "a987bc2d", "metadata": {}, "source": [ "Here, we save `pred` into a `.txt` file, which exactly follows the same layout as in `pred`.\n", "You can use a predictor initialized in any way (i.e. finetuned predictor, predictor with pretrained model, etc.).\n", "\n", "## Visualizing Results\n", "To run visualizations, ensure that you have `opencv` installed. If you haven't already, install `opencv` by running" ] }, { "cell_type": "code", "execution_count": null, "id": "d3c26f03", "metadata": {}, "outputs": [], "source": [ "!pip install opencv-python" ] }, { "cell_type": "markdown", "id": "8221cc2b", "metadata": {}, "source": [ "To visualize the detection bounding boxes, run the following:" ] }, { "cell_type": "code", "execution_count": null, "id": "3c2c5c04", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal.utils import ObjectDetectionVisualizer\n", "\n", "conf_threshold = 0.4 # Specify a confidence threshold to filter out unwanted boxes\n", "image_result = pred.iloc[30]\n", "\n", "img_path = image_result.image # Select an image to visualize\n", "\n", "visualizer = ObjectDetectionVisualizer(img_path) # Initialize the Visualizer\n", "out = visualizer.draw_instance_predictions(image_result, conf_threshold=conf_threshold) # Draw detections\n", "visualized = out.get_image() # Get the visualized image\n", "\n", "from PIL import Image\n", "from IPython.display import display\n", "img = Image.fromarray(visualized, 'RGB')\n", "display(img)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "df6f645e", "metadata": {}, "source": [ "## Testing on Your Own Data\n", "You can also predict on your own images with various input format. The follow is an example:\n", "\n", "Download the example image:" ] }, { "cell_type": "code", "execution_count": null, "id": "104ab1d4", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal import download\n", "image_url = \"https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/detection/street_small.jpg\"\n", "test_image = download(image_url)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c516aabd", "metadata": {}, "source": [ "Run inference on data in a json file of COCO format (See [Convert Data to COCO Format](../data_preparation/convert_data_to_coco_format.ipynb) for more details about COCO format). Note that since the root is by default the parent folder of the annotation file, here we put the annotation file in a folder:" ] }, { "cell_type": "code", "execution_count": null, "id": "16100ac3", "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "# create a input file for demo\n", "data = {\"images\": [{\"id\": 0, \"width\": -1, \"height\": -1, \"file_name\": test_image}], \"categories\": []}\n", "os.mkdir(\"input_data_for_demo\")\n", "input_file = \"input_data_for_demo/demo_annotation.json\"\n", "with open(input_file, \"w+\") as f:\n", " json.dump(data, f)\n", "\n", "pred_test_image = predictor.predict(input_file)\n", "print(pred_test_image)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "af490ddc", "metadata": {}, "source": [ "Run inference on data in a list of image file names:" ] }, { "cell_type": "code", "execution_count": null, "id": "b2466021", "metadata": {}, "outputs": [], "source": [ "pred_test_image = predictor.predict([test_image])\n", "print(pred_test_image)" ] }, { "cell_type": "markdown", "id": "aa58b518", "metadata": {}, "source": [ "## Other Examples\n", "\n", "You may go to [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) to explore other examples about AutoMM.\n", "\n", "## Customization\n", "To learn how to customize AutoMM, please refer to [Customize AutoMM](../../advanced_topics/customization.ipynb).\n", "\n", "## Citation" ] }, { "cell_type": "markdown", "id": "0c8dfcc8", "metadata": {}, "source": [ "```\n", "@article{DBLP:journals/corr/abs-2107-08430,\n", " author = {Zheng Ge and\n", " Songtao Liu and\n", " Feng Wang and\n", " Zeming Li and\n", " Jian Sun},\n", " title = {{YOLOX:} Exceeding {YOLO} Series in 2021},\n", " journal = {CoRR},\n", " volume = {abs/2107.08430},\n", " year = {2021},\n", " url = {https://arxiv.org/abs/2107.08430},\n", " eprinttype = {arXiv},\n", " eprint = {2107.08430},\n", " timestamp = {Tue, 05 Apr 2022 14:09:44 +0200},\n", " biburl = {https://dblp.org/rec/journals/corr/abs-2107-08430.bib},\n", " bibsource = {dblp computer science bibliography, https://dblp.org},\n", "}\n", "```\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 5 }