.. _sec_tabularprediction_multimodal:

Multimodal Data Tables: Tabular, Text, and Image
================================================

**Tip**: Before reading this tutorial, it is recommended to have a basic understanding of the TabularPredictor API covered in :ref:`sec_tabularquick`.

In this tutorial, we will train a multimodal ensemble using data that contains image, text, and tabular features.

Note: A GPU is required for this tutorial in order to train the image and text models. Additionally, GPU-enabled installations of MXNet and Torch with the appropriate CUDA versions are required.

The PetFinder Dataset
---------------------

We will be using the `PetFinder dataset `__. The PetFinder dataset provides information about shelter animals, as it appears on their adoption profiles, with the goal of predicting the adoption rate of each animal. The end goal is for rescue shelters to use the predicted adoption rate to identify animals whose profiles could be improved so that they can find a home.

Each animal’s adoption profile contains a variety of information, such as pictures of the animal, a text description of the animal, and various tabular features such as age, breed, name, color, and more.

To get started, we first need to download the dataset. Datasets that contain images require more than a CSV file, so the dataset is packaged in a zip file in S3. We will first download it and unzip the contents:

.. code:: python

    download_dir = './ag_petfinder_tutorial'
    zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_kaggle.zip'

.. code:: python

    from autogluon.core.utils.loaders import load_zip
    load_zip.unzip(zip_file, unzip_dir=download_dir)

.. parsed-literal::
    :class: output

    Downloading ./ag_petfinder_tutorial/file.zip from https://automl-mm-bench.s3.amazonaws.com/petfinder_kaggle.zip...

.. parsed-literal::
    :class: output

    100%|██████████| 2.00G/2.00G [00:49<00:00, 40.5MiB/s]

Now that the data is downloaded and unzipped, let’s take a look at the contents:

.. code:: python

    import os
    os.listdir(download_dir)

.. parsed-literal::
    :class: output

    ['file.zip', 'petfinder_processed']

‘file.zip’ is the original zip file we downloaded, and ‘petfinder_processed’ is a directory containing the dataset files.

.. code:: python

    dataset_path = download_dir + '/petfinder_processed'
    os.listdir(dataset_path)

.. parsed-literal::
    :class: output

    ['train.csv', 'train_images', 'test.csv', 'test_images', 'dev.csv']

Here we can see the train, test, and dev CSV files, as well as two directories, ‘test_images’ and ‘train_images’, which contain the image JPG files.

Note: We will use the dev data as our test data, because dev contains the ground-truth labels needed to compute scores with ``predictor.leaderboard``.

Let’s take a peek at 10 of the files inside the ‘train_images’ directory (``os.listdir`` returns entries in arbitrary order):

.. code:: python

    os.listdir(dataset_path + '/train_images')[:10]

.. parsed-literal::
    :class: output

    ['d765ae877-1.jpg',
     '756025f7c-2.jpg',
     'e1a2d9477-4.jpg',
     '6d18707ee-2.jpg',
     '96607bca0-5.jpg',
     'fde58f7fa-10.jpg',
     'be7b65c23-3.jpg',
     'dd36ab692-3.jpg',
     '2d8db1c19-2.jpg',
     '53037f091-2.jpg']

As expected, these are the images we will be training with alongside the other features.

Next, we will load the train and dev CSV files:

.. code:: python

    import pandas as pd

    train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
    test_data = pd.read_csv(f'{dataset_path}/dev.csv', index_col=0)

.. code:: python

    train_data.head(3)

.. csv-table::
    :header: "", "Type", "Name", "Age", "Breed1", "Breed2", "Gender", "Color1", "Color2", "Color3", "MaturitySize", "...", "Quantity", "Fee", "State", "RescuerID", "VideoAmt", "Description", "PetID", "PhotoAmt", "AdoptionSpeed", "Images"

    10721, 1, Elbi, 2, 307, 307, 2, 5, 0, 0, 3, ..., 1, 0, 41336, e9a86209c54f589ba72c345364cf01aa, 0, "I'm looking for people to adopt my dog", e4b90955c, 4.0, 4, "train_images/e4b90955c-1.jpg;train_images/e4b9..."
    13114, 2, Darling, 4, 266, 0, 1, 1, 0, 0, 2, ..., 1, 0, 41401, 01f954cdf61526daf3fbeb8a074be742, 0, "Darling was born at the back lane of Jalan Alo...", a0c1384d1, 5.0, 3, "train_images/a0c1384d1-1.jpg;train_images/a0c1..."
    13194, 1, Wolf, 3, 307, 0, 1, 1, 2, 0, 2, ..., 1, 0, 41332, 6e19409f2847326ce3b6d0cec7e42f81, 0, "I found Wolf about a month ago stuck in a drai...", cf357f057, 7.0, 4, "train_images/cf357f057-1.jpg;train_images/cf35..."

3 rows × 25 columns
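
The download step above relies on AutoGluon’s ``load_zip.unzip`` helper. If you would rather use only the Python standard library, the following is a minimal sketch of the same download-and-extract step. ``download_and_extract`` and ``extract_zip`` are hypothetical helpers introduced here (they are not part of AutoGluon), and the sketch performs no retries, checksum verification, or progress reporting.

.. code:: python

    import os
    import urllib.request
    import zipfile


    def download_and_extract(url, target_dir, archive_name='file.zip'):
        """Download a zip archive into target_dir (if not already there) and extract it.

        Sketch only: no retries, checksum verification, or progress bar.
        """
        os.makedirs(target_dir, exist_ok=True)
        archive_path = os.path.join(target_dir, archive_name)
        if not os.path.exists(archive_path):  # skip the large download if it is already present
            urllib.request.urlretrieve(url, archive_path)
        return extract_zip(archive_path, target_dir)


    def extract_zip(archive_path, target_dir):
        """Extract archive_path into target_dir and return its sorted top-level entries."""
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(target_dir)
            top_level = {name.split('/', 1)[0] for name in zf.namelist()}
        return sorted(top_level)

Under these assumptions, ``download_and_extract(zip_file, download_dir)`` should leave the same ‘petfinder_processed’ directory next to ‘file.zip’ as the AutoGluon helper does.
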

Looking at the first 3 examples, we can tell that there is a variety of tabular features, a text description (‘Description’), and an image path (‘Images’).

For the PetFinder dataset, we will try to predict the speed of adoption for the animal (‘AdoptionSpeed’), grouped into 5 categories. This means that we are dealing with a multi-class classification problem.

.. code:: python

    label = 'AdoptionSpeed'
    image_col = 'Images'

Preparing the image column
--------------------------

Let’s take a look at what a value in the image column looks like:

.. code:: python

    train_data[image_col].iloc[0]

.. parsed-literal::
    :class: output

    'train_images/e4b90955c-1.jpg;train_images/e4b90955c-2.jpg;train_images/e4b90955c-3.jpg;train_images/e4b90955c-4.jpg'

Currently, AutoGluon only supports one image per row. Since the PetFinder dataset contains one or more images per row, we first need to preprocess the image column so that it only contains the first image of each row.

.. code:: python

    # Keep only the first path from the ';'-separated list of image paths
    train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0])
    test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])

    train_data[image_col].iloc[0]

.. parsed-literal::
    :class: output

    'train_images/e4b90955c-1.jpg'

AutoGluon loads images based on the file path provided by the image column. Here we convert the relative paths into absolute paths that point to the correct location on disk:

.. code:: python

    def path_expander(path, base_folder):
        # Turn each ';'-separated relative path into an absolute path under base_folder
        path_l = path.split(';')
        return ';'.join([os.path.abspath(os.path.join(base_folder, p)) for p in path_l])

    train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
    test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))

    train_data[image_col].iloc[0]

.. parsed-literal::
    :class: output

    '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/ag_petfinder_tutorial/petfinder_processed/train_images/e4b90955c-1.jpg'

.. code:: python

    train_data.head(3)

.. csv-table::
    :header: "", "Type", "Name", "Age", "Breed1", "Breed2", "Gender", "Color1", "Color2", "Color3", "MaturitySize", "...", "Quantity", "Fee", "State", "RescuerID", "VideoAmt", "Description", "PetID", "PhotoAmt", "AdoptionSpeed", "Images"

    10721, 1, Elbi, 2, 307, 307, 2, 5, 0, 0, 3, ..., 1, 0, 41336, e9a86209c54f589ba72c345364cf01aa, 0, "I'm looking for people to adopt my dog", e4b90955c, 4.0, 4, "/home/ci/autogluon/docs/_build/eval/tutorials/..."
    13114, 2, Darling, 4, 266, 0, 1, 1, 0, 0, 2, ..., 1, 0, 41401, 01f954cdf61526daf3fbeb8a074be742, 0, "Darling was born at the back lane of Jalan Alo...", a0c1384d1, 5.0, 3, "/home/ci/autogluon/docs/_build/eval/tutorials/..."
    13194, 1, Wolf, 3, 307, 0, 1, 1, 2, 0, 2, ..., 1, 0, 41332, 6e19409f2847326ce3b6d0cec7e42f81, 0, "I found Wolf about a month ago stuck in a drai...", cf357f057, 7.0, 4, "/home/ci/autogluon/docs/_build/eval/tutorials/..."

3 rows × 25 columns
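
Both preprocessing steps above call ``ele.split(';')`` directly, which raises ``AttributeError`` if a cell is not a string (for example, a float ``NaN`` for a pet without photos, if the CSV contains such rows). The following is a minimal sketch of a more defensive variant that combines the two steps and lists image paths that do not exist on disk. ``expand_first_image`` and ``find_missing_images`` are hypothetical helpers, not AutoGluon functions.

.. code:: python

    import os


    def expand_first_image(value, base_folder, sep=';'):
        """Return the absolute path of the first image in a sep-separated cell, or None.

        Non-string cells (e.g. a float NaN) and empty strings map to None instead of
        raising, unlike ``ele.split(';')[0]``.
        """
        if not isinstance(value, str):
            return None
        first = value.split(sep)[0].strip()
        if not first:
            return None
        return os.path.abspath(os.path.join(base_folder, first))


    def find_missing_images(paths):
        """Return the entries that are not paths to existing files (None counts as missing)."""
        return [p for p in paths if not (isinstance(p, str) and os.path.isfile(p))]

Applied to the raw column, ``train_data[image_col].apply(expand_first_image, base_folder=dataset_path)`` would stand in for both steps, and ``find_missing_images(train_data[image_col])`` lists the entries to inspect before fitting. How AutoGluon treats rows whose image path is missing is not covered in this tutorial; dropping those rows is one conservative option.
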

Analyzing an example row
------------------------

Now that we have preprocessed the image column, let’s take a look at an example row of data and display the text description and the picture.

.. code:: python

    example_row = train_data.iloc[1]

    example_row

.. parsed-literal::
    :class: output

    Type             2
    Name             Darling
    Age              4
    Breed1           266
    Breed2           0
    Gender           1
    Color1           1
    Color2           0
    Color3           0
    MaturitySize     2
    FurLength        1
    Vaccinated       2
    Dewormed         2
    Sterilized       2
    Health           1
    Quantity         1
    Fee              0
    State            41401
    RescuerID        01f954cdf61526daf3fbeb8a074be742
    VideoAmt         0
    Description      Darling was born at the back lane of Jalan Alo...
    PetID            a0c1384d1
    PhotoAmt         5.0
    AdoptionSpeed    3
    Images           /home/ci/autogluon/docs/_build/eval/tutorials/...
    Name: 13114, dtype: object

.. code:: python

    example_row['Description']

.. parsed-literal::
    :class: output

    'Darling was born at the back lane of Jalan Alor and was foster by a feeder. All his siblings had died of accident. His mother and grandmother had just been spayed. Darling make a great condo/apartment cat. He love to play a lot. He would make a great companion for someone looking for a cat to love.'

.. code:: python

    from IPython.display import Image, display

    example_image = example_row['Images']
    # IPython's Image (not a PIL image) renders the file inline in the notebook
    img = Image(filename=example_image)
    display(img)

.. figure:: output_tabular-multimodal_e625cb_24_0.jpg

The PetFinder dataset is fairly large. For the purposes of this tutorial, we will sample 500 rows for training.

Training on large multimodal datasets can be very computationally intensive, especially when using the ``best_quality`` preset in AutoGluon. When prototyping, it is recommended to sample your data to get an idea of which models are worth training, and then gradually train with larger amounts of data and longer time limits, as you would with any other machine learning algorithm.

.. code:: python

    train_data = train_data.sample(500, random_state=0)

Constructing the FeatureMetadata
--------------------------------

Next, let’s see which feature types AutoGluon infers by constructing a FeatureMetadata object from the training data:

.. code:: python

    from autogluon.tabular import FeatureMetadata
    feature_metadata = FeatureMetadata.from_df(train_data)

    print(feature_metadata)

.. parsed-literal::
    :class: output

    ('float', []) : 1 | ['PhotoAmt']
    ('int', []) : 19 | ['Type', 'Age', 'Breed1', 'Breed2', 'Gender', ...]
    ('object', []) : 4 | ['Name', 'RescuerID', 'PetID', 'Images']
    ('object', ['text']) : 1 | ['Description']

Notice that FeatureMetadata automatically identified the column ‘Description’ as text, so we don’t need to manually specify that it is text.

In order to leverage images, we need to tell AutoGluon which column contains the image path. We can do this by adding the ‘image_path’ special type to the image column of the FeatureMetadata object. We later pass this custom FeatureMetadata to ``TabularPredictor.fit``.

.. code:: python

    feature_metadata = feature_metadata.add_special_types({image_col: ['image_path']})

    print(feature_metadata)

.. parsed-literal::
    :class: output

    ('float', []) : 1 | ['PhotoAmt']
    ('int', []) : 19 | ['Type', 'Age', 'Breed1', 'Breed2', 'Gender', ...]
    ('object', []) : 3 | ['Name', 'RescuerID', 'PetID']
    ('object', ['image_path']) : 1 | ['Images']
    ('object', ['text']) : 1 | ['Description']

Specifying the hyperparameters
------------------------------

Next, we need to specify the models we want to train. This is done via the ``hyperparameters`` argument to ``TabularPredictor.fit``.

AutoGluon has a predefined config called ‘multimodal’ that works well for multimodal datasets. We can access it via:

.. code:: python

    from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
    hyperparameters = get_hyperparameter_config('multimodal')

    hyperparameters

.. parsed-literal::
    :class: output

    {'NN_TORCH': {},
     'GBM': [{}, {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, 'GBMLarge'],
     'CAT': {},
     'XGB': {},
     'AG_TEXT_NN': {'presets': 'medium_quality_faster_train'},
     'AG_IMAGE_NN': {},
     'VW': {}}

This hyperparameter config will train a variety of tabular models, and will also fine-tune an ELECTRA (BERT-style) text model and a ResNet image model.

Fitting with TabularPredictor
-----------------------------

Now we will train a TabularPredictor on the dataset, using the feature metadata and hyperparameters we defined above. This TabularPredictor will leverage tabular, text, and image features all at once.

.. code:: python

    from autogluon.tabular import TabularPredictor
    predictor = TabularPredictor(label=label).fit(
        train_data=train_data,
        hyperparameters=hyperparameters,
        feature_metadata=feature_metadata,
        time_limit=900,
    )

.. parsed-literal::
    :class: output

    No path specified. Models will be saved in: "AutogluonModels/ag-20221213_015251/"
    Beginning AutoGluon training ... Time limit = 900s
    AutoGluon will save models to "AutogluonModels/ag-20221213_015251/"
    AutoGluon Version: 0.6.1b20221213
    Python Version: 3.8.10
    Operating System: Linux
    Platform Machine: x86_64
    Platform Version: #1 SMP Tue Nov 30 00:17:50 UTC 2021
    Train Data Rows: 500
    Train Data Columns: 24
    Label Column: AdoptionSpeed
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
        5 unique label values: [2, 3, 4, 0, 1]
        If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Train Data Class Count: 5
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
        Available Memory: 31478.96 MB
        Train Data (Original) Memory Usage: 0.49 MB (0.0% of available memory)
        Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
            Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
        Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
        Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting IdentityFeatureGenerator...
            Fitting RenameFeatureGenerator...
            Fitting CategoryFeatureGenerator...
            Fitting CategoryMemoryMinimizeFeatureGenerator...
            Fitting TextSpecialFeatureGenerator...
            Fitting BinnedFeatureGenerator...
            Fitting DropDuplicatesFeatureGenerator...
            Fitting TextNgramFeatureGenerator...
            Fitting CountVectorizer for text features: ['Description']
            CountVectorizer fit with vocabulary size = 170
            Fitting IdentityFeatureGenerator...
            Fitting IsNanFeatureGenerator...
        Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
        Unused Original Features (Count: 1): ['PetID']
            These features were not used to generate any of the output features. Add a feature generator compatible with these features to utilize them.
            Features can also be unused if they carry very little information, such as being categorical but having almost entirely unique values or being duplicates of other features.
            These features do not need to be present at inference time.
            ('object', []) : 1 | ['PetID']
        Types of features in original data (raw dtype, special dtypes):
            ('float', []) : 1 | ['PhotoAmt']
            ('int', []) : 18 | ['Type', 'Age', 'Breed1', 'Breed2', 'Gender', ...]
            ('object', []) : 2 | ['Name', 'RescuerID']
            ('object', ['image_path']) : 1 | ['Images']
            ('object', ['text']) : 1 | ['Description']
        Types of features in processed data (raw dtype, special dtypes):
            ('category', []) : 2 | ['Name', 'RescuerID']
            ('category', ['text_as_category']) : 1 | ['Description']
            ('float', []) : 1 | ['PhotoAmt']
            ('int', []) : 17 | ['Age', 'Breed1', 'Breed2', 'Gender', 'Color1', ...]
            ('int', ['binned', 'text_special']) : 24 | ['Description.char_count', 'Description.word_count', 'Description.capital_ratio', 'Description.lower_ratio', 'Description.digit_ratio', ...]
            ('int', ['bool']) : 1 | ['Type']
            ('int', ['text_ngram']) : 171 | ['__nlp__.about', '__nlp__.active', '__nlp__.active and', '__nlp__.adopt', '__nlp__.adopted', ...]
            ('object', ['image_path']) : 1 | ['Images']
            ('object', ['text']) : 1 | ['Description_raw_text']
        0.5s = Fit runtime
        23 features in original data used to generate 219 features in processed data.
        Train Data (Processed) Memory Usage: 0.56 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 0.57s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
        To change this, specify the eval_metric parameter of Predictor()
    Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
    Fitting 9 L1 models ...
    Fitting model: LightGBM ... Training model for up to 899.43s of the 899.43s of remaining time.
        0.34 = Validation score (accuracy)
        1.84s = Training runtime
        0.01s = Validation runtime
    Fitting model: LightGBMXT ... Training model for up to 897.58s of the 897.58s of remaining time.
        0.34 = Validation score (accuracy)
        1.47s = Training runtime
        0.01s = Validation runtime
    Fitting model: CatBoost ... Training model for up to 896.09s of the 896.09s of remaining time.
        0.3 = Validation score (accuracy)
        3.33s = Training runtime
        0.01s = Validation runtime
    Fitting model: XGBoost ... Training model for up to 892.74s of the 892.74s of remaining time.
        0.35 = Validation score (accuracy)
        2.07s = Training runtime
        0.01s = Validation runtime
    Fitting model: NeuralNetTorch ... Training model for up to 890.65s of the 890.64s of remaining time.
        0.33 = Validation score (accuracy)
        1.52s = Training runtime
        0.03s = Validation runtime
    Fitting model: VowpalWabbit ... Training model for up to 889.09s of the 889.09s of remaining time.
        0.24 = Validation score (accuracy)
        0.75s = Training runtime
        0.03s = Validation runtime
    Fitting model: LightGBMLarge ... Training model for up to 888.02s of the 888.02s of remaining time.
        0.37 = Validation score (accuracy)
        2.76s = Training runtime
        0.01s = Validation runtime
    Fitting model: TextPredictor ... Training model for up to 885.25s of the 885.25s of remaining time.
    The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.

.. parsed-literal::
    :class: output

    Moving 0 files to the new cache system

.. parsed-literal::
    :class: output

    0it [00:00, ?it/s]

.. parsed-literal::
    :class: output

    INFO:pytorch_lightning.utilities.seed:Global seed set to 0
    INFO:pytorch_lightning.trainer.connectors.accelerator_connector:Auto select gpus: [0]
    INFO:pytorch_lightning.utilities.rank_zero:Using 16bit native Automatic Mixed Precision (AMP)
    INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
    INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
    INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
    INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
    INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    INFO:pytorch_lightning.callbacks.model_summary:
      | Name              | Type                | Params
    ----------------------------------------------------------
    0 | model             | MultimodalFusionMLP | 13.7 M
    1 | validation_metric | Accuracy            | 0
    2 | loss_func         | CrossEntropyLoss    | 0
    ----------------------------------------------------------
    13.7 M    Trainable params
    0         Non-trainable params
    13.7 M    Total params
    27.305    Total estimated model params size (MB)
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 0, global step 1: 'val_accuracy' reached 0.24000 (best 0.24000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=0-step=1.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 0, global step 4: 'val_accuracy' reached 0.23000 (best 0.24000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=0-step=4.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 1, global step 5: 'val_accuracy' reached 0.23000 (best 0.24000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=1-step=5.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 1, global step 8: 'val_accuracy' was not in top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 2, global step 9: 'val_accuracy' was not in top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 2, global step 12: 'val_accuracy' reached 0.30000 (best 0.30000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=2-step=12.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 3, global step 13: 'val_accuracy' reached 0.29000 (best 0.30000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=3-step=13.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 3, global step 16: 'val_accuracy' reached 0.34000 (best 0.34000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=3-step=16.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 4, global step 17: 'val_accuracy' reached 0.34000 (best 0.34000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=4-step=17.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 4, global step 20: 'val_accuracy' reached 0.31000 (best 0.34000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=4-step=20.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 5, global step 21: 'val_accuracy' was not in top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 5, global step 24: 'val_accuracy' was not in top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 6, global step 25: 'val_accuracy' was not in top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 6, global step 28: 'val_accuracy' reached 0.34000 (best 0.34000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=6-step=28.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 7, global step 29: 'val_accuracy' reached 0.35000 (best 0.35000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=7-step=29.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 7, global step 32: 'val_accuracy' reached 0.35000 (best 0.35000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=7-step=32.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 8, global step 33: 'val_accuracy' reached 0.35000 (best 0.35000), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/epoch=8-step=33.ckpt' as top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 8, global step 36: 'val_accuracy' was not in top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 9, global step 37: 'val_accuracy' was not in top 3
    INFO:pytorch_lightning.utilities.rank_zero:Epoch 9, global step 40: 'val_accuracy' was not in top 3
    INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=10` reached.
    Configuration saved in AutogluonModels/ag-20221213_015251/models/TextPredictor/text_nn/hf_text/config.json
    tokenizer config file saved in AutogluonModels/ag-20221213_015251/models/TextPredictor/text_nn/hf_text/tokenizer_config.json
    Special tokens file saved in AutogluonModels/ag-20221213_015251/models/TextPredictor/text_nn/hf_text/special_tokens_map.json
        0.35 = Validation score (accuracy)
        80.54s = Training runtime
        1.28s = Validation runtime
    Fitting model: ImagePredictor ... Training model for up to 803.21s of the 803.21s of remaining time.
    AutoGluon ImagePredictor will be deprecated in v0.7. Please use AutoGluon MultiModalPredictor instead for more functionalities and better support. Visit https://auto.gluon.ai/stable/tutorials/multimodal/index.html for more details!
    ImagePredictor sets accuracy as default eval_metric for classification problems.
    The number of requested GPUs is greater than the number of available GPUs.Reduce the number to 1
    INFO:TorchImageClassificationEstimator:modified configs( != ): {
    INFO:TorchImageClassificationEstimator:root.train.early_stop_baseline 0.0 != -inf
    INFO:TorchImageClassificationEstimator:root.train.early_stop_max_value 1.0 != inf
    INFO:TorchImageClassificationEstimator:root.train.epochs 200 != 15
    INFO:TorchImageClassificationEstimator:root.train.batch_size 32 != 16
    INFO:TorchImageClassificationEstimator:root.train.early_stop_patience -1 != 10
    INFO:TorchImageClassificationEstimator:root.misc.seed 42 != 542
    INFO:TorchImageClassificationEstimator:root.misc.num_workers 4 != 8
    INFO:TorchImageClassificationEstimator:root.img_cls.model resnet101 != resnet50
    INFO:TorchImageClassificationEstimator:}
    INFO:TorchImageClassificationEstimator:Saved config to /home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/ImagePredictor/645c8044/.trial_0/config.yaml
    Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1_0-14fe96d1.pth" to /home/ci/.cache/torch/hub/checkpoints/resnet50_a1_0-14fe96d1.pth
    INFO:TorchImageClassificationEstimator:Model resnet50 created, param count: 23518277
    INFO:TorchImageClassificationEstimator:AMP not enabled. Training in float32.
    INFO:TorchImageClassificationEstimator:Disable EMA as it is not supported for now.
    INFO:TorchImageClassificationEstimator:Start training from [Epoch 0]
    INFO:TorchImageClassificationEstimator:[Epoch 0] training: accuracy=0.212500
    INFO:TorchImageClassificationEstimator:[Epoch 0] speed: 82 samples/sec time cost: 4.664190
    INFO:TorchImageClassificationEstimator:[Epoch 0] validation: top1=0.220000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 0] Current best top-1: 0.220000 vs previous -inf, saved to /home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/ImagePredictor/645c8044/.trial_0/best_checkpoint.pkl
    INFO:TorchImageClassificationEstimator:[Epoch 1] training: accuracy=0.280000
    INFO:TorchImageClassificationEstimator:[Epoch 1] speed: 90 samples/sec time cost: 4.237421
    INFO:TorchImageClassificationEstimator:[Epoch 1] validation: top1=0.260000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 1] Current best top-1: 0.260000 vs previous 0.220000, saved to /home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/ImagePredictor/645c8044/.trial_0/best_checkpoint.pkl
    INFO:TorchImageClassificationEstimator:[Epoch 2] training: accuracy=0.282500
    INFO:TorchImageClassificationEstimator:[Epoch 2] speed: 90 samples/sec time cost: 4.254888
    INFO:TorchImageClassificationEstimator:[Epoch 2] validation: top1=0.290000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 2] Current best top-1: 0.290000 vs previous 0.260000, saved to /home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/ImagePredictor/645c8044/.trial_0/best_checkpoint.pkl
    INFO:TorchImageClassificationEstimator:[Epoch 3] training: accuracy=0.352500
    INFO:TorchImageClassificationEstimator:[Epoch 3] speed: 90 samples/sec time cost: 4.264510
    INFO:TorchImageClassificationEstimator:[Epoch 3] validation: top1=0.200000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 4] training: accuracy=0.337500
    INFO:TorchImageClassificationEstimator:[Epoch 4] speed: 90 samples/sec time cost: 4.265185
    INFO:TorchImageClassificationEstimator:[Epoch 4] validation: top1=0.220000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 5] training: accuracy=0.382500
    INFO:TorchImageClassificationEstimator:[Epoch 5] speed: 89 samples/sec time cost: 4.287378
    INFO:TorchImageClassificationEstimator:[Epoch 5] validation: top1=0.230000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 6] training: accuracy=0.400000
    INFO:TorchImageClassificationEstimator:[Epoch 6] speed: 89 samples/sec time cost: 4.282075
    INFO:TorchImageClassificationEstimator:[Epoch 6] validation: top1=0.240000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 7] training: accuracy=0.387500
    INFO:TorchImageClassificationEstimator:[Epoch 7] speed: 88 samples/sec time cost: 4.318741
    INFO:TorchImageClassificationEstimator:[Epoch 7] validation: top1=0.260000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 8] training: accuracy=0.380000
    INFO:TorchImageClassificationEstimator:[Epoch 8] speed: 89 samples/sec time cost: 4.291041
    INFO:TorchImageClassificationEstimator:[Epoch 8] validation: top1=0.210000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 9] training: accuracy=0.365000
    INFO:TorchImageClassificationEstimator:[Epoch 9] speed: 89 samples/sec time cost: 4.313218
    INFO:TorchImageClassificationEstimator:[Epoch 9] validation: top1=0.210000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 10] training: accuracy=0.390000
    INFO:TorchImageClassificationEstimator:[Epoch 10] speed: 88 samples/sec time cost: 4.328169
    INFO:TorchImageClassificationEstimator:[Epoch 10] validation: top1=0.200000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 11] training: accuracy=0.402500
    INFO:TorchImageClassificationEstimator:[Epoch 11] speed: 88 samples/sec time cost: 4.330627
    INFO:TorchImageClassificationEstimator:[Epoch 11] validation: top1=0.250000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 12] training: accuracy=0.447500
    INFO:TorchImageClassificationEstimator:[Epoch 12] speed: 88 samples/sec time cost: 4.334503
    INFO:TorchImageClassificationEstimator:[Epoch 12] validation: top1=0.200000 top5=1.000000
    INFO:TorchImageClassificationEstimator:[Epoch 13] EarlyStop after 10 epochs: no better than 0.29
    INFO:TorchImageClassificationEstimator:Applying the state from the best checkpoint...
        0.29 = Validation score (accuracy)
        65.91s = Training runtime
        0.88s = Validation runtime
    Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 735.98s of remaining time.
        0.37 = Validation score (accuracy)
        0.22s = Training runtime
        0.0s = Validation runtime
    AutoGluon training complete, total runtime = 164.26s ... Best model: "WeightedEnsemble_L2"
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221213_015251/")

After the predictor is fit, we can take a look at the leaderboard and see the performance of the various models:

.. code:: python

    leaderboard = predictor.leaderboard(test_data)

.. parsed-literal::
    :class: output

    loading file vocab.txt
    loading file tokenizer.json
    loading file added_tokens.json
    loading file special_tokens_map.json
    loading file tokenizer_config.json
    loading configuration file /home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/text_nn/hf_text/config.json
    Model config ElectraConfig {
      "_name_or_path": "/home/ci/autogluon/docs/_build/eval/tutorials/tabular_prediction/AutogluonModels/ag-20221213_015251/models/TextPredictor/text_nn/hf_text",
      "architectures": [
        "ElectraForPreTraining"
      ],
      "attention_probs_dropout_prob": 0.1,
      "classifier_dropout": null,
      "embedding_size": 128,
      "hidden_act": "gelu",
      "hidden_dropout_prob": 0.1,
      "hidden_size": 256,
      "initializer_range": 0.02,
      "intermediate_size": 1024,
      "layer_norm_eps": 1e-12,
      "max_position_embeddings": 512,
      "model_type": "electra",
      "num_attention_heads": 4,
      "num_hidden_layers": 12,
      "pad_token_id": 0,
      "position_embedding_type": "absolute",
      "summary_activation": "gelu",
      "summary_last_dropout": 0.1,
      "summary_type": "first",
      "summary_use_proj": true,
      "transformers_version": "4.23.1",
      "type_vocab_size": 2,
      "use_cache": true,
      "vocab_size": 30522
    }

    Token indices sequence length is longer than the specified maximum sequence length for this model (587 > 512). Running this sequence through the model will result in indexing errors
    Token indices sequence length is longer than the specified maximum sequence length for this model (1394 > 512). Running this sequence through the model will result in indexing errors

.. parsed-literal::
    :class: output

                     model  score_test  score_val  pred_time_test  pred_time_val   fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
    0        TextPredictor    0.328443       0.35        9.930616       1.280958  80.535329                 9.930616                1.280958          80.535329            1       True          8
    1        LightGBMLarge    0.323775       0.37        0.397190       0.006300   2.758249                 0.397190                0.006300           2.758249            1       True          7
    2  WeightedEnsemble_L2    0.323775       0.37        0.400923       0.006765   2.976480                 0.003733                0.000465           0.218231            2       True         10
    3             CatBoost    0.319106       0.30        0.022767       0.012185   3.331276                 0.022767                0.012185           3.331276            1       True          3
    4           LightGBMXT    0.315772       0.34        0.037119       0.006674   1.465299                 0.037119                0.006674           1.465299            1       True          2
    5       ImagePredictor    0.309770       0.29       11.333637       0.882369  65.905820                11.333637                0.882369          65.905820            1       True          9
    6       NeuralNetTorch    0.306435       0.33        0.052176       0.027912   1.519180                 0.052176                0.027912           1.519180            1       True          5
    7              XGBoost    0.292431       0.35        0.054872       0.006682   2.066890                 0.054872                0.006682           2.066890            1       True          4
    8             LightGBM    0.289763       0.34        0.015358       0.006625   1.837485                 0.015358                0.006625           1.837485            1       True          1
    9         VowpalWabbit    0.278760       0.24        0.780789       0.034062   0.753597                 0.780789                0.034062           0.753597            1       True          6

That’s all it takes to train with image, text, and tabular data at the same time using AutoGluon!

For an in-depth tutorial on text + tabular multimodal functionality, refer to :ref:`sec_tabularprediction_text_multimodal`.

For more tutorials, refer to :ref:`sec_tabularquick` and :ref:`sec_tabularadvanced`.
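
To put the test accuracies in the leaderboard into context, it can help to compare them with a trivial baseline on the same data. The following is a minimal sketch that computes the accuracy of always predicting the most frequent training label. ``majority_baseline_accuracy`` is a hypothetical helper written for illustration, not part of the AutoGluon API.

.. code:: python

    from collections import Counter


    def majority_baseline_accuracy(train_labels, test_labels):
        """Accuracy of always predicting the most frequent label seen in training."""
        train_labels = list(train_labels)
        test_labels = list(test_labels)
        if not train_labels or not test_labels:
            raise ValueError('label sequences must be non-empty')
        # Ties are broken by first occurrence in train_labels (Counter.most_common order)
        majority_label, _ = Counter(train_labels).most_common(1)[0]
        hits = sum(1 for y in test_labels if y == majority_label)
        return hits / len(test_labels)

With the objects from this tutorial, ``majority_baseline_accuracy(train_data[label], test_data[label])`` gives a reference point for the ``score_test`` column; a model that does not clearly beat it is probably not learning much from the features.
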
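
The value returned by ``get_hyperparameter_config('multimodal')`` is a plain dict keyed by model type, so it can be edited before it is passed to ``fit``. Because the image and text models are the ones that need a GPU, one option on a CPU-only machine (an assumption on our part, not something this tutorial tests) is to drop the ``AG_TEXT_NN`` and ``AG_IMAGE_NN`` entries. The sketch below copies the config printed in the hyperparameters section into a literal dict so that it runs without AutoGluon installed; ``without_models`` is a hypothetical helper.

.. code:: python

    import copy

    # Copied from the 'multimodal' config printed in the hyperparameters section above.
    MULTIMODAL_HYPERPARAMETERS = {
        'NN_TORCH': {},
        'GBM': [{}, {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, 'GBMLarge'],
        'CAT': {},
        'XGB': {},
        'AG_TEXT_NN': {'presets': 'medium_quality_faster_train'},
        'AG_IMAGE_NN': {},
        'VW': {},
    }


    def without_models(hyperparameters, drop_keys):
        """Return a deep copy of a hyperparameters dict with the given model keys removed."""
        trimmed = copy.deepcopy(hyperparameters)  # never mutate the shared config
        for key in drop_keys:
            trimmed.pop(key, None)
        return trimmed


    cpu_hyperparameters = without_models(MULTIMODAL_HYPERPARAMETERS, ['AG_TEXT_NN', 'AG_IMAGE_NN'])

With AutoGluon installed, ``without_models(get_hyperparameter_config('multimodal'), ['AG_TEXT_NN', 'AG_IMAGE_NN'])`` would produce the same trimmed dict. Judging by the training log, the ‘Description’ column would presumably still reach the tabular models through its n-gram and text-statistics features, while the image column would likely go unused.
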