{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "84a147df", "metadata": {}, "source": [ "# AutoMM Detection - Quick Start on a Tiny COCO Format Dataset\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/object_detection/quick_start/quick_start_coco.ipynb)\n", "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/object_detection/quick_start/quick_start_coco.ipynb)\n", "\n", "\n", "\n", "In this section, our goal is to fast finetune a pretrained model on a small dataset in COCO format, \n", "and evaluate on its test set. Both training and test sets are in COCO format.\n", "See [Convert Data to COCO Format](../data_preparation/convert_data_to_coco_format.ipynb) for how to convert other datasets to COCO format.\n", "\n", "## Setting up the imports" ] }, { "cell_type": "markdown", "id": "45f33969", "metadata": {}, "source": [ "To start, make sure `mmcv` and `mmdet` are installed.\n", "**Note:** MMDet is no longer actively maintained and is only compatible with MMCV version 2.1.0. Installation can be problematic due to CUDA version compatibility issues. For best results:\n", "1. Use CUDA 12.4 with PyTorch 2.5\n", "2. Before installation, run:\n", " ```bash\n", " pip install -U pip setuptools wheel\n", " sudo apt-get install -y ninja-build gcc g++\n", " ```\n", " This will help prevent MMCV installation from hanging during wheel building.\n", "3. After installation in Jupyter notebook, restart the kernel for changes to take effect.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "aa00faab-252f-44c9-b8f7-57131aa8251c", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "!pip install autogluon.multimodal" ] }, { "cell_type": "code", "execution_count": null, "id": "3d1cf87b", "metadata": { "tags": [ "hide-output" ] }, "outputs": [], "source": [ "# Update package tools and install build dependencies\n", "!pip install -U pip setuptools wheel\n", "!sudo apt-get install -y ninja-build gcc g++\n", "\n", "# Install MMCV\n", "!python3 -m mim install \"mmcv==2.1.0\"\n", "\n", "# For Google Colab users: If the above fails, use this alternative MMCV installation\n", "# pip install \"mmcv==2.1.0\" -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/index.html\n", "\n", "# Install MMDet\n", "!python3 -m pip install \"mmdet==3.2.0\"\n", "\n", "# Install MMEngine (version >=0.10.6 for PyTorch 2.5 compatibility)\n", "!python3 -m pip install \"mmengine>=0.10.6\"" ] }, { "cell_type": "markdown", "id": "f6b50410", "metadata": {}, "source": [ "To start, let's import MultiModalPredictor:" ] }, { "cell_type": "code", "execution_count": null, "id": "6122a63c", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal import MultiModalPredictor" ] }, { "cell_type": "markdown", "id": "8960c4cf", "metadata": {}, "source": [ "And also import some other packages that will be used in this tutorial:" ] }, { "cell_type": "code", "execution_count": null, "id": "82704cb8", "metadata": {}, "outputs": [], "source": [ "import os\n", "import time\n", "\n", "from autogluon.core.utils.loaders import load_zip" ] }, { "cell_type": "markdown", "id": "5481f3b2", "metadata": {}, "source": [ "## Downloading Data\n", "We have the sample dataset ready in the cloud. 
Let's download it:" ] }, { "cell_type": "code", "execution_count": null, "id": "ff4dd355", "metadata": {}, "outputs": [], "source": [ "zip_file = \"https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip\"\n", "download_dir = \"./tiny_motorbike_coco\"\n", "\n", "load_zip.unzip(zip_file, unzip_dir=download_dir)\n", "data_dir = os.path.join(download_dir, \"tiny_motorbike\")\n", "train_path = os.path.join(data_dir, \"Annotations\", \"trainval_cocoformat.json\")\n", "test_path = os.path.join(data_dir, \"Annotations\", \"test_cocoformat.json\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "dfc32ec3", "metadata": {}, "source": [ "### Dataset Format\n", "\n", "For COCO format datasets, provide JSON annotation files for each split:\n", "\n", "- `trainval_cocoformat.json`: train and validation data\n", "- `test_cocoformat.json`: test data\n", "\n", "### Model Selection\n", "\n", "We use the `medium_quality` preset which features:\n", "\n", "- Base model: YOLOX-large (pretrained on COCO)\n", "- Benefits: Fast finetuning, quick inference, easy deployment\n", "\n", "Alternative presets available:\n", "\n", "- `high_quality`: DINO-ResNet50 model\n", "- `best_quality`: DINO-SwinL model\n", "\n", "Both alternatives offer improved performance at the cost of slower processing and higher GPU memory requirements." ] }, { "cell_type": "code", "execution_count": null, "id": "3eae8201", "metadata": {}, "outputs": [], "source": [ "presets = \"medium_quality\"" ] }, { "cell_type": "markdown", "id": "5467f27e", "metadata": {}, "source": [ "When creating the MultiModalPredictor, specify these essential parameters:\n", "\n", "- `problem_type=\"object_detection\"` to define the task\n", "- `presets=\"medium_quality\"` to select the preset\n", "- `sample_data_path` pointing to any dataset split (typically `train_path`) to infer object categories\n", "- `path` (optional) to set a custom save location\n", "\n", "If no path is specified, the model will be automatically saved to a timestamped directory under `AutogluonModels/`." ] }, { "cell_type": "code", "execution_count": null, "id": "228bd148", "metadata": {}, "outputs": [], "source": [ "# Init predictor\n", "import uuid\n", "\n", "model_path = f\"./tmp/{uuid.uuid4().hex}-quick_start_tutorial_temp_save\"\n", "\n", "predictor = MultiModalPredictor(\n", " problem_type=\"object_detection\",\n", " sample_data_path=train_path,\n", " presets=presets,\n", " path=model_path,\n", ")" ] },
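{ "cell_type": "markdown", "id": "9f2c1e7a", "metadata": {}, "source": [ "The object categories the predictor will learn are inferred from the COCO annotation file passed as `sample_data_path`. If you want to double-check them before training, the category names can be read directly from the annotation file. This is a plain-Python sketch that only relies on the standard COCO format:\n", "```python\n", "import json\n", "\n", "with open(train_path) as f:\n", "    category_names = [c[\"name\"] for c in json.load(f)[\"categories\"]]\n", "print(category_names)\n", "```" ] },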
{ "cell_type": "markdown", "id": "02a7af8f", "metadata": {}, "source": [ "## Finetuning the Model\n", "The model uses optimized preset configurations for learning rate, epochs, and batch size. By default, it employs a two-stage learning rate strategy:\n", "\n", "- The model head layers use a 100x higher learning rate than the rest of the network\n", "\n", "This approach accelerates convergence and typically improves performance, especially for small datasets (hundreds to thousands of images).\n", "\n", "Timing results below are from a test run on an AWS g4.2xlarge EC2 instance:" ] }, { "cell_type": "code", "execution_count": null, "id": "421776a2", "metadata": {}, "outputs": [], "source": [ "start = time.time()\n", "predictor.fit(train_path) # Fit\n", "train_end = time.time()" ] }, { "cell_type": "markdown", "id": "21c2f9c3", "metadata": {}, "source": [ "Notice that at the end of each progress bar, if the checkpoint at the current stage is saved,\n", "it prints the model's save path.\n", "In this example, it's the `model_path` folder we specified when creating the predictor (under `./tmp/`).\n", "\n", "Print out the time and we can see that it's fast!" ] }, { "cell_type": "code", "execution_count": null, "id": "1b4d02d9", "metadata": {}, "outputs": [], "source": [ "print(\"This finetuning takes %.2f seconds.\" % (train_end - start))" ] }, { "cell_type": "markdown", "id": "4dcc501d", "metadata": {}, "source": [ "## Evaluation\n", "\n", "To evaluate the model we just trained, run the following code.\n", "\n", "The evaluation results are shown in the command line output. \n", "The first line is mAP in the COCO standard, and the second line is mAP in the VOC standard (or mAP50). \n", "For more details about these metrics, see [COCO's evaluation guideline](https://cocodataset.org/#detection-eval).\n", "Note that to demonstrate a fast finetuning we use the \"medium_quality\" presets; \n", "you could get better results on this dataset by simply using the \"high_quality\" or \"best_quality\" presets, \n", "or by customizing your own model and hyperparameter settings: [Customization](../../advanced_topics/customization.ipynb), and some other examples at [Fast Fine-tune Coco](../finetune/detection_fast_finetune_coco.ipynb) or [High Performance Fine-tune Coco](../finetune/detection_high_performance_finetune_coco.ipynb)."
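, "\n", "In addition to the printed log, `predictor.evaluate()` returns the metrics programmatically (as a dictionary in recent AutoGluon versions), so you can store them for later comparison. A minimal sketch (the exact metric keys may differ between versions):\n", "```python\n", "metrics = predictor.evaluate(test_path)\n", "print(metrics)  # e.g. a dict mapping metric names such as mAP to scores\n", "```"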
] }, { "cell_type": "code", "execution_count": null, "id": "8face95d", "metadata": {}, "outputs": [], "source": [ "predictor.evaluate(test_path)\n", "eval_end = time.time()" ] }, { "cell_type": "markdown", "id": "9b9793cc", "metadata": {}, "source": [ "Print out the evaluation time:" ] }, { "cell_type": "code", "execution_count": null, "id": "060144e3", "metadata": {}, "outputs": [], "source": [ "print(\"The evaluation takes %.2f seconds.\" % (eval_end - train_end))" ] }, { "cell_type": "markdown", "id": "ae1e5665", "metadata": {}, "source": [ "We can load a new predictor with previous save path,\n", "and we can also reset the number of used GPUs if not all the devices are available:" ] }, { "cell_type": "code", "execution_count": null, "id": "c8bf5a84", "metadata": {}, "outputs": [], "source": [ "# Load and reset num_gpus\n", "new_predictor = MultiModalPredictor.load(model_path)\n", "new_predictor.set_num_gpus(1)" ] }, { "cell_type": "markdown", "id": "e6c906c0", "metadata": {}, "source": [ "Evaluating the new predictor gives us exactly the same result:" ] }, { "cell_type": "code", "execution_count": null, "id": "bc23738a", "metadata": {}, "outputs": [], "source": [ "# Evaluate new predictor\n", "new_predictor.evaluate(test_path)" ] }, { "cell_type": "markdown", "id": "1de5ce66", "metadata": {}, "source": [ "For how to set the hyperparameters and finetune the model with higher performance, \n", "see [AutoMM Detection - High Performance Finetune on COCO Format Dataset](../finetune/detection_high_performance_finetune_coco.ipynb).\n", "\n", "## Inference\n", "Let's perform predictions using our finetuned model. The predictor can process the entire test set with a single command:" ] }, { "cell_type": "code", "execution_count": null, "id": "8ce6e910", "metadata": {}, "outputs": [], "source": [ "pred = predictor.predict(test_path)\n", "print(len(pred)) # Number of predictions\n", "print(pred[:3]) # Sample of first 3 predictions" ] }, { "cell_type": "markdown", "id": "db52a92f", "metadata": {}, "source": [ "The predictor returns predictions as a pandas DataFrame with two columns:\n", "- `image`: Contains path to each input image\n", "- `bboxes`: Contains list of detected objects, where each object is a dictionary:\n", " ```python\n", " {\n", " \"class\": \"predicted_class_name\",\n", " \"bbox\": [x1, y1, x2, y2], # Coordinates of Upper Left and Bottom Right corners\n", " \"score\": confidence_score\n", " }\n", " ```\n", "\n", "By default, predictions are returned but not saved. To save detection results, use the save parameter in your predict call." ] }, { "cell_type": "code", "execution_count": null, "id": "19bd5152", "metadata": {}, "outputs": [], "source": [ "pred = predictor.predict(test_path, save_results=True, as_coco=False)" ] }, { "cell_type": "markdown", "id": "a987bc2d", "metadata": {}, "source": [ "The predictions can be saved in two formats:\n", "\n", "- CSV file: Matches the DataFrame structure with image and bboxes columns\n", "- COCO JSON: Standard COCO format annotation file\n", "\n", "This works with any predictor configuration (pretrained or finetuned models).\n", "\n", "## Visualizing Results\n", "To run visualizations, ensure that you have `opencv` installed. 
] }, { "cell_type": "code", "execution_count": null, "id": "19bd5152", "metadata": {}, "outputs": [], "source": [ "pred = predictor.predict(test_path, save_results=True, as_coco=False)" ] }, { "cell_type": "markdown", "id": "a987bc2d", "metadata": {}, "source": [ "The predictions can be saved in two formats:\n", "\n", "- CSV file: Matches the DataFrame structure with `image` and `bboxes` columns\n", "- COCO JSON: Standard COCO format annotation file\n", "\n", "This works with any predictor configuration (pretrained or finetuned models).\n", "\n", "## Visualizing Results\n", "To run visualizations, ensure that you have `opencv` installed. If you haven't already, install `opencv` by running" ] }, { "cell_type": "code", "execution_count": null, "id": "d3c26f03", "metadata": {}, "outputs": [], "source": [ "!pip install opencv-python" ] }, { "cell_type": "markdown", "id": "8221cc2b", "metadata": {}, "source": [ "To visualize the detection bounding boxes, run the following:" ] }, { "cell_type": "code", "execution_count": null, "id": "3c2c5c04", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal.utils import ObjectDetectionVisualizer\n", "\n", "conf_threshold = 0.4 # Specify a confidence threshold to filter out unwanted boxes\n", "image_result = pred.iloc[30]\n", "\n", "img_path = image_result.image # Select an image to visualize\n", "\n", "visualizer = ObjectDetectionVisualizer(img_path) # Initialize the Visualizer\n", "out = visualizer.draw_instance_predictions(image_result, conf_threshold=conf_threshold) # Draw detections\n", "visualized = out.get_image() # Get the visualized image\n", "\n", "from PIL import Image\n", "from IPython.display import display\n", "img = Image.fromarray(visualized, 'RGB')\n", "display(img)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "df6f645e", "metadata": {}, "source": [ "## Testing on Your Own Data\n", "You can also predict on your own images in various input formats. The following is an example:\n", "\n", "Download the example image:" ] }, { "cell_type": "code", "execution_count": null, "id": "104ab1d4", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal import download\n", "image_url = \"https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/detection/street_small.jpg\"\n", "test_image = download(image_url)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c516aabd", "metadata": {}, "source": [ "Run inference on data in a json file of COCO format (See [Convert Data to COCO Format](../data_preparation/convert_data_to_coco_format.ipynb) for more details about COCO format). 
Note that since the root is by default the parent folder of the annotation file, here we put the annotation file in its own folder:" ] }, { "cell_type": "code", "execution_count": null, "id": "16100ac3", "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "# Create an input file for the demo\n", "data = {\"images\": [{\"id\": 0, \"width\": -1, \"height\": -1, \"file_name\": test_image}], \"categories\": []}\n", "os.makedirs(\"input_data_for_demo\", exist_ok=True)\n", "input_file = \"input_data_for_demo/demo_annotation.json\"\n", "with open(input_file, \"w+\") as f:\n", " json.dump(data, f)\n", "\n", "pred_test_image = predictor.predict(input_file)\n", "print(pred_test_image)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "af490ddc", "metadata": {}, "source": [ "Run inference on data in a list of image file names:" ] }, { "cell_type": "code", "execution_count": null, "id": "b2466021", "metadata": {}, "outputs": [], "source": [ "pred_test_image = predictor.predict([test_image])\n", "print(pred_test_image)" ] }, { "cell_type": "markdown", "id": "aa58b518", "metadata": {}, "source": [ "## Other Examples\n", "\n", "You may go to [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) to explore other examples of AutoMM.\n", "\n", "## Customization\n", "To learn how to customize AutoMM, please refer to [Customize AutoMM](../../advanced_topics/customization.ipynb).\n", "\n", "## Citation" ] }, { "cell_type": "markdown", "id": "0c8dfcc8", "metadata": {}, "source": [ "```\n", "@article{DBLP:journals/corr/abs-2107-08430,\n", " author = {Zheng Ge and\n", " Songtao Liu and\n", " Feng Wang and\n", " Zeming Li and\n", " Jian Sun},\n", " title = {{YOLOX:} Exceeding {YOLO} Series in 2021},\n", " journal = {CoRR},\n", " volume = {abs/2107.08430},\n", " year = {2021},\n", " url = {https://arxiv.org/abs/2107.08430},\n", " eprinttype = {arXiv},\n", " eprint = {2107.08430},\n", " timestamp = {Tue, 05 Apr 2022 14:09:44 +0200},\n", " biburl = {https://dblp.org/rec/journals/corr/abs-2107-08430.bib},\n", " bibsource = {dblp computer science bibliography, https://dblp.org},\n", "}\n", "```\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }