{ "cells": [ { "cell_type": "markdown", "id": "db27159a", "metadata": {}, "source": [ "# AutoMM for Image + Text + Tabular - Quick Start\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/multimodal_prediction/beginner_multimodal.ipynb)\n", "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/multimodal_prediction/beginner_multimodal.ipynb)\n", "\n", "\n", "\n", "AutoMM is a deep learning \"model zoo\" of model zoos. It can automatically build deep learning models that are suitable for multimodal datasets. You will only need to convert the data into the multimodal dataframe format\n", "and AutoMM can predict the values of one column conditioned on the features from the other columns including images, text, and tabular data." ] }, { "cell_type": "code", "execution_count": null, "id": "aa00faab-252f-44c9-b8f7-57131aa8251c", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "!pip install autogluon.multimodal\n" ] }, { "cell_type": "code", "execution_count": null, "id": "48f913fd", "metadata": {}, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "np.random.seed(123)" ] }, { "cell_type": "markdown", "id": "7f47025d", "metadata": {}, "source": [ "## Dataset\n", "\n", "For demonstration, we use a simplified and subsampled version of [PetFinder dataset](https://www.kaggle.com/c/petfinder-adoption-prediction). The task is to predict the animals' adoption rates based on their adoption profile information. In this simplified version, the adoption speed is grouped into two categories: 0 (slow) and 1 (fast).\n", "\n", "To get started, let's download and prepare the dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "ab751256", "metadata": {}, "outputs": [], "source": [ "download_dir = './ag_automm_tutorial'\n", "zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip'\n", "from autogluon.core.utils.loaders import load_zip\n", "load_zip.unzip(zip_file, unzip_dir=download_dir)" ] }, { "cell_type": "markdown", "id": "5a3b23a2", "metadata": {}, "source": [ "Next, we will load the CSV files." ] }, { "cell_type": "code", "execution_count": null, "id": "d0ec7147", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "dataset_path = download_dir + '/petfinder_for_tutorial'\n", "train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)\n", "test_data = pd.read_csv(f'{dataset_path}/test.csv', index_col=0)\n", "label_col = 'AdoptionSpeed'" ] }, { "cell_type": "markdown", "id": "d20978f6", "metadata": {}, "source": [ "We need to expand the image paths to load them in training." ] }, { "cell_type": "code", "execution_count": null, "id": "c84d6811", "metadata": {}, "outputs": [], "source": [ "image_col = 'Images'\n", "train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0]) # Use the first image for a quick tutorial\n", "test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])\n", "\n", "\n", "def path_expander(path, base_folder):\n", " path_l = path.split(';')\n", " return ';'.join([os.path.abspath(os.path.join(base_folder, path)) for path in path_l])\n", "\n", "train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))\n", "test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))\n", "\n", "train_data[image_col].iloc[0]" ] }, { "cell_type": "markdown", "id": "81b87ba9", "metadata": {}, "source": [ "Each animal's adoption profile includes pictures, a text description, and various tabular features such as age, breed, name, color, and more. Let's look at an example row of data and display the text description and a picture." ] }, { "cell_type": "code", "execution_count": null, "id": "9a367dc3", "metadata": {}, "outputs": [], "source": [ "example_row = train_data.iloc[0]\n", "\n", "example_row" ] }, { "cell_type": "code", "execution_count": null, "id": "ab5e5ae7", "metadata": {}, "outputs": [], "source": [ "example_row['Description']" ] }, { "cell_type": "code", "execution_count": null, "id": "4c7ea867", "metadata": {}, "outputs": [], "source": [ "example_image = example_row[image_col]\n", "\n", "from IPython.display import Image, display\n", "pil_img = Image(filename=example_image)\n", "display(pil_img)" ] }, { "cell_type": "markdown", "id": "39566d91", "metadata": {}, "source": [ "## Training\n", "Now let's fit the predictor with the training data. Here we set a tight time budget for a quick demo." ] }, { "cell_type": "code", "execution_count": null, "id": "e4de7118", "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal import MultiModalPredictor\n", "predictor = MultiModalPredictor(label=label_col)\n", "predictor.fit(\n", " train_data=train_data,\n", " time_limit=120, # seconds\n", ")" ] }, { "cell_type": "markdown", "id": "0a278b95", "metadata": {}, "source": [ "Under the hood, AutoMM automatically infers the problem type (classification or regression), detects the data modalities, selects the related models from the multimodal model pools, and trains the selected models. If multiple backbones are available, AutoMM appends a late-fusion model (MLP or transformer) on top of them.\n", "\n", "\n", "## Evaluation\n", "Then we can evaluate the predictor on the test data." ] }, { "cell_type": "code", "execution_count": null, "id": "07f88dd6", "metadata": {}, "outputs": [], "source": [ "scores = predictor.evaluate(test_data, metrics=[\"roc_auc\"])\n", "scores" ] }, { "cell_type": "markdown", "id": "e8efcf9d", "metadata": {}, "source": [ "## Prediction\n", "Given a multimodal dataframe without the label column, we can predict the labels." ] }, { "cell_type": "code", "execution_count": null, "id": "ad6f0ab9", "metadata": {}, "outputs": [], "source": [ "predictions = predictor.predict(test_data.drop(columns=label_col))\n", "predictions[:5]" ] }, { "cell_type": "markdown", "id": "e277d050", "metadata": {}, "source": [ "For classification tasks, we can get the probabilities of all classes." ] }, { "cell_type": "code", "execution_count": null, "id": "6ca45069", "metadata": {}, "outputs": [], "source": [ "probas = predictor.predict_proba(test_data.drop(columns=label_col))\n", "probas[:5]" ] }, { "cell_type": "markdown", "id": "2bfe15c2", "metadata": {}, "source": [ "Note that calling `.predict_proba()` on one regression task will throw an exception.\n", "\n", "\n", "## Extract Embeddings\n", "\n", "Extracting embeddings can also be useful in many cases, where we want to convert each sample (per row in the dataframe) into an embedding vector." ] }, { "cell_type": "code", "execution_count": null, "id": "6b364412", "metadata": {}, "outputs": [], "source": [ "embeddings = predictor.extract_embedding(test_data.drop(columns=label_col))\n", "embeddings.shape" ] }, { "cell_type": "markdown", "id": "189fc2aa", "metadata": {}, "source": [ "## Save and Load\n", "It is also convenient to save a predictor and re-load it.\n", "\n", "```{warning}\n", "\n", "`MultiModalPredictor.load()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source, or that could have been tampered with. **Only load data you trust.**\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "4c924b34", "metadata": {}, "outputs": [], "source": [ "import uuid\n", "\n", "model_path = f\"./tmp/{uuid.uuid4().hex}-saved_model\"\n", "predictor.save(model_path)\n", "loaded_predictor = MultiModalPredictor.load(model_path)\n", "scores2 = loaded_predictor.evaluate(test_data, metrics=[\"roc_auc\"])\n", "scores2" ] }, { "cell_type": "markdown", "id": "6585bd6a", "metadata": {}, "source": [ "## Other Examples\n", "\n", "You may go to [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) to explore other examples about AutoMM.\n", "\n", "## Customization\n", "To learn how to customize AutoMM, please refer to [Customize AutoMM](../advanced_topics/customization.ipynb)." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }