Multimodal Data Tables: Tabular, Text, and Image¶
Tip: Prior to reading this tutorial, it is recommended to have a basic understanding of the TabularPredictor API covered in Predicting Columns in a Table - Quick Start.
In this tutorial, we will train a multi-modal ensemble using data that contains image, text, and tabular features.
Note: A GPU is required for this tutorial in order to train the image and text models. Additionally, you need a GPU-enabled installation of Torch with the appropriate CUDA version.
The PetFinder Dataset¶
We will be using the PetFinder dataset. The PetFinder dataset provides information about shelter animals and their adoption profiles, with the goal of predicting each animal's rate of adoption. The end goal is for rescue shelters to use the predicted adoption rate to identify animals whose profiles could be improved so that they can find a home.
Each animal’s adoption profile contains a variety of information, such as pictures of the animal, a text description of the animal, and various tabular features such as age, breed, name, color, and more.
To get started, we first need to download the dataset. Datasets that contain images require more than a CSV file, so the dataset is packaged in a zip file in S3. We will first download it and unzip the contents:
download_dir = './ag_petfinder_tutorial'
zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_kaggle.zip'
from autogluon.core.utils.loaders import load_zip
load_zip.unzip(zip_file, unzip_dir=download_dir)
Downloading ./ag_petfinder_tutorial/file.zip from https://automl-mm-bench.s3.amazonaws.com/petfinder_kaggle.zip...
100%|██████████| 2.00G/2.00G [00:42<00:00, 47.4MiB/s]
Now that the data is downloaded and unzipped, let’s take a look at the contents:
import os
os.listdir(download_dir)
['file.zip', 'petfinder_processed']
‘file.zip’ is the original zip file we downloaded, and ‘petfinder_processed’ is a directory containing the dataset files.
dataset_path = download_dir + '/petfinder_processed'
os.listdir(dataset_path)
['train.csv', 'train_images', 'test.csv', 'test_images', 'dev.csv']
Here we can see the train, test, and dev CSV files, as well as two directories, ‘test_images’ and ‘train_images’, which contain the JPG image files.
Note: We will be using the dev data as testing data, as dev contains the ground truth labels needed to show scores via predictor.leaderboard.
Let’s take a peek at the first 10 files inside of the ‘train_images’ directory:
os.listdir(dataset_path + '/train_images')[:10]
['d765ae877-1.jpg',
'756025f7c-2.jpg',
'e1a2d9477-4.jpg',
'6d18707ee-2.jpg',
'96607bca0-5.jpg',
'fde58f7fa-10.jpg',
'be7b65c23-3.jpg',
'dd36ab692-3.jpg',
'2d8db1c19-2.jpg',
'53037f091-2.jpg']
As expected, these are the images we will be training with alongside the other features.
Next, we will load the train and dev CSV files:
import pandas as pd
train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
test_data = pd.read_csv(f'{dataset_path}/dev.csv', index_col=0)
train_data.head(3)
Type | Name | Age | Breed1 | Breed2 | Gender | Color1 | Color2 | Color3 | MaturitySize | ... | Quantity | Fee | State | RescuerID | VideoAmt | Description | PetID | PhotoAmt | AdoptionSpeed | Images | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10721 | 1 | Elbi | 2 | 307 | 307 | 2 | 5 | 0 | 0 | 3 | ... | 1 | 0 | 41336 | e9a86209c54f589ba72c345364cf01aa | 0 | I'm looking for people to adopt my dog | e4b90955c | 4.0 | 4 | train_images/e4b90955c-1.jpg;train_images/e4b9... |
13114 | 2 | Darling | 4 | 266 | 0 | 1 | 1 | 0 | 0 | 2 | ... | 1 | 0 | 41401 | 01f954cdf61526daf3fbeb8a074be742 | 0 | Darling was born at the back lane of Jalan Alo... | a0c1384d1 | 5.0 | 3 | train_images/a0c1384d1-1.jpg;train_images/a0c1... |
13194 | 1 | Wolf | 3 | 307 | 0 | 1 | 1 | 2 | 0 | 2 | ... | 1 | 0 | 41332 | 6e19409f2847326ce3b6d0cec7e42f81 | 0 | I found Wolf about a month ago stuck in a drai... | cf357f057 | 7.0 | 4 | train_images/cf357f057-1.jpg;train_images/cf35... |
3 rows × 25 columns
Looking at the first 3 examples, we can tell that there are a variety of tabular features, a text description (‘Description’), and an image path (‘Images’).
For the PetFinder dataset, we will try to predict the speed of adoption for the animal (‘AdoptionSpeed’), grouped into 5 categories. This means that we are dealing with a multi-class classification problem.
label = 'AdoptionSpeed'
image_col = 'Images'
Preparing the image column¶
Let’s take a look at what a value in the image column looks like:
train_data[image_col].iloc[0]
'train_images/e4b90955c-1.jpg;train_images/e4b90955c-2.jpg;train_images/e4b90955c-3.jpg;train_images/e4b90955c-4.jpg'
Currently, AutoGluon only supports one image per row. Since the PetFinder dataset contains one or more images per row, we first need to preprocess the image column to only contain the first image of each row.
train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0])
test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])
train_data[image_col].iloc[0]
'train_images/e4b90955c-1.jpg'
AutoGluon loads images based on the file path provided by the image column.
Here we update the path to point to the correct location on disk:
def path_expander(path, base_folder):
path_l = path.split(';')
return ';'.join([os.path.abspath(os.path.join(base_folder, path)) for path in path_l])
train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
train_data[image_col].iloc[0]
'/home/ci/autogluon/docs/tutorials/tabular/ag_petfinder_tutorial/petfinder_processed/train_images/e4b90955c-1.jpg'
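As a quick sanity check, the helper can also be exercised in isolation. The base folder below is hypothetical, purely for illustration:

```python
import os

def path_expander(path, base_folder):
    # Expand every ';'-separated relative path to an absolute path
    # rooted at base_folder (same logic as in the tutorial above).
    path_l = path.split(';')
    return ';'.join(os.path.abspath(os.path.join(base_folder, p)) for p in path_l)

# '/data/petfinder' is a made-up base folder for this example.
expanded = path_expander('train_images/e4b90955c-1.jpg', base_folder='/data/petfinder')
print(expanded)
```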
train_data.head(3)
Type | Name | Age | Breed1 | Breed2 | Gender | Color1 | Color2 | Color3 | MaturitySize | ... | Quantity | Fee | State | RescuerID | VideoAmt | Description | PetID | PhotoAmt | AdoptionSpeed | Images | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10721 | 1 | Elbi | 2 | 307 | 307 | 2 | 5 | 0 | 0 | 3 | ... | 1 | 0 | 41336 | e9a86209c54f589ba72c345364cf01aa | 0 | I'm looking for people to adopt my dog | e4b90955c | 4.0 | 4 | /home/ci/autogluon/docs/tutorials/tabular/ag_p... |
13114 | 2 | Darling | 4 | 266 | 0 | 1 | 1 | 0 | 0 | 2 | ... | 1 | 0 | 41401 | 01f954cdf61526daf3fbeb8a074be742 | 0 | Darling was born at the back lane of Jalan Alo... | a0c1384d1 | 5.0 | 3 | /home/ci/autogluon/docs/tutorials/tabular/ag_p... |
13194 | 1 | Wolf | 3 | 307 | 0 | 1 | 1 | 2 | 0 | 2 | ... | 1 | 0 | 41332 | 6e19409f2847326ce3b6d0cec7e42f81 | 0 | I found Wolf about a month ago stuck in a drai... | cf357f057 | 7.0 | 4 | /home/ci/autogluon/docs/tutorials/tabular/ag_p... |
3 rows × 25 columns
Analyzing an example row¶
Now that we have preprocessed the image column, let’s take a look at an example row of data and display the text description and the picture.
example_row = train_data.iloc[1]
example_row
Type 2
Name Darling
Age 4
Breed1 266
Breed2 0
Gender 1
Color1 1
Color2 0
Color3 0
MaturitySize 2
FurLength 1
Vaccinated 2
Dewormed 2
Sterilized 2
Health 1
Quantity 1
Fee 0
State 41401
RescuerID 01f954cdf61526daf3fbeb8a074be742
VideoAmt 0
Description Darling was born at the back lane of Jalan Alo...
PetID a0c1384d1
PhotoAmt 5.0
AdoptionSpeed 3
Images /home/ci/autogluon/docs/tutorials/tabular/ag_p...
Name: 13114, dtype: object
example_row['Description']
'Darling was born at the back lane of Jalan Alor and was foster by a feeder. All his siblings had died of accident. His mother and grandmother had just been spayed. Darling make a great condo/apartment cat. He love to play a lot. He would make a great companion for someone looking for a cat to love.'
example_image = example_row['Images']
from IPython.display import Image, display
pil_img = Image(filename=example_image)
display(pil_img)

The PetFinder dataset is fairly large. For the purposes of the tutorial, we will sample 500 rows for training.
Training on large multi-modal datasets can be very computationally intensive, especially if using the best_quality
preset in AutoGluon. When prototyping, it is recommended to sample your data to get an idea of which models are worth training, then gradually train with larger amounts of data and longer time limits as you would with any other machine learning algorithm.
train_data = train_data.sample(500, random_state=0)
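A plain random sample like the one above can slightly skew the class balance in a small subsample. If that matters for your prototype, a stratified sample keeps the per-class proportions. The sketch below uses a synthetic imbalanced frame rather than the PetFinder data:

```python
import pandas as pd

# Synthetic frame with an imbalanced 3-class label, standing in for a
# larger training set.
df = pd.DataFrame({'AdoptionSpeed': [0] * 10 + [1] * 30 + [2] * 60,
                   'Age': range(100)})

# Sampling within each label group preserves the class ratios exactly.
sampled = df.groupby('AdoptionSpeed', group_keys=False).sample(frac=0.5, random_state=0)
print(sampled['AdoptionSpeed'].value_counts().sort_index())
```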
Constructing the FeatureMetadata¶
Next, let’s see what AutoGluon infers the feature types to be by constructing a FeatureMetadata object from the training data:
from autogluon.tabular import FeatureMetadata
feature_metadata = FeatureMetadata.from_df(train_data)
print(feature_metadata)
('float', []) : 1 | ['PhotoAmt']
('int', []) : 19 | ['Type', 'Age', 'Breed1', 'Breed2', 'Gender', ...]
('object', []) : 4 | ['Name', 'RescuerID', 'PetID', 'Images']
('object', ['text']) : 1 | ['Description']
Notice that FeatureMetadata automatically identified the column ‘Description’ as text, so we don’t need to manually specify that it is text.
In order to leverage images, we need to tell AutoGluon which column contains the image path. We can do this by specifying a FeatureMetadata object and adding the ‘image_path’ special type to the image column. We later pass this custom FeatureMetadata to TabularPredictor.fit.
feature_metadata = feature_metadata.add_special_types({image_col: ['image_path']})
print(feature_metadata)
('float', []) : 1 | ['PhotoAmt']
('int', []) : 19 | ['Type', 'Age', 'Breed1', 'Breed2', 'Gender', ...]
('object', []) : 3 | ['Name', 'RescuerID', 'PetID']
('object', ['image_path']) : 1 | ['Images']
('object', ['text']) : 1 | ['Description']
Specifying the hyperparameters¶
Next, we need to specify the models we want to train with. This is done via the hyperparameters
argument to TabularPredictor.fit.
AutoGluon has a predefined config that works well for multimodal datasets called ‘multimodal’. We can access it via:
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
hyperparameters = get_hyperparameter_config('multimodal')
hyperparameters
{'NN_TORCH': {},
'GBM': [{},
{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}},
{'learning_rate': 0.03,
'num_leaves': 128,
'feature_fraction': 0.9,
'min_data_in_leaf': 3,
'ag_args': {'name_suffix': 'Large',
'priority': 0,
'hyperparameter_tune_kwargs': None}}],
'CAT': {},
'XGB': {},
'AG_AUTOMM': {},
'VW': {}}
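The config is a plain dictionary keyed by model family, so excluding a model is just a key removal on a copy. The dict below is a hand-written subset of the config shown above, so the snippet runs without AutoGluon installed:

```python
from copy import deepcopy

# Hand-written subset of the 'multimodal' hyperparameter config.
hyperparameters = {
    'NN_TORCH': {},
    'GBM': [{}, {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}],
    'CAT': {},
    'XGB': {},
    'AG_AUTOMM': {},
    'VW': {},
}

# Drop VowpalWabbit (for example, if it is not installed) without
# mutating the original config.
custom_hyperparameters = deepcopy(hyperparameters)
custom_hyperparameters.pop('VW', None)
print(sorted(custom_hyperparameters))
```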
This hyperparameter config will train a variety of tabular models and fine-tune an Electra BERT text model as well as a ResNet image model.
Fitting with TabularPredictor¶
Now we will train a TabularPredictor on the dataset, using the feature metadata and hyperparameters we defined earlier. This TabularPredictor will leverage tabular, text, and image features all at once.
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label=label).fit(
train_data=train_data,
hyperparameters=hyperparameters,
feature_metadata=feature_metadata,
time_limit=900,
)
No path specified. Models will be saved in: "AutogluonModels/ag-20250107_023254"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.2b20250107
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Tue Sep 24 10:00:37 UTC 2024
CPU Count: 8
Memory Avail: 28.67 GB / 30.95 GB (92.6%)
Disk Space Avail: 207.77 GB / 255.99 GB (81.2%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='high' : Strong accuracy with fast inference speed.
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ... Time limit = 900s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254"
Train Data Rows: 500
Train Data Columns: 24
Label Column: AdoptionSpeed
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
5 unique label values: [2, 3, 4, 0, 1]
If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: multiclass
Preprocessing data ...
Train Data Class Count: 5
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 29352.95 MB
Train Data (Original) Memory Usage: 0.45 MB (0.0% of available memory)
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting IdentityFeatureGenerator...
Fitting RenameFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Fitting TextSpecialFeatureGenerator...
Fitting BinnedFeatureGenerator...
Fitting DropDuplicatesFeatureGenerator...
Fitting TextNgramFeatureGenerator...
Fitting CountVectorizer for text features: ['Description']
CountVectorizer fit with vocabulary size = 170
Fitting IdentityFeatureGenerator...
Fitting IsNanFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Unused Original Features (Count: 1): ['PetID']
These features were not used to generate any of the output features. Add a feature generator compatible with these features to utilize them.
Features can also be unused if they carry very little information, such as being categorical but having almost entirely unique values or being duplicates of other features.
These features do not need to be present at inference time.
('object', []) : 1 | ['PetID']
Types of features in original data (raw dtype, special dtypes):
('float', []) : 1 | ['PhotoAmt']
('int', []) : 18 | ['Type', 'Age', 'Breed1', 'Breed2', 'Gender', ...]
('object', []) : 2 | ['Name', 'RescuerID']
('object', ['image_path']) : 1 | ['Images']
('object', ['text']) : 1 | ['Description']
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 2 | ['Name', 'RescuerID']
('category', ['text_as_category']) : 1 | ['Description']
('float', []) : 1 | ['PhotoAmt']
('int', []) : 17 | ['Age', 'Breed1', 'Breed2', 'Gender', 'Color1', ...]
('int', ['binned', 'text_special']) : 24 | ['Description.char_count', 'Description.word_count', 'Description.capital_ratio', 'Description.lower_ratio', 'Description.digit_ratio', ...]
('int', ['bool']) : 1 | ['Type']
('int', ['text_ngram']) : 171 | ['__nlp__.about', '__nlp__.active', '__nlp__.active and', '__nlp__.adopt', '__nlp__.adopted', ...]
('object', ['image_path']) : 1 | ['Images']
('object', ['text']) : 1 | ['Description_raw_text']
1.7s = Fit runtime
23 features in original data used to generate 219 features in processed data.
Train Data (Processed) Memory Usage: 0.52 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 1.73s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
User-specified model hyperparameters to be fit:
{
'NN_TORCH': [{}],
'GBM': [{}, {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
'CAT': [{}],
'XGB': [{}],
'AG_AUTOMM': [{}],
'VW': [{}],
}
Fitting 8 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM ... Training model for up to 898.27s of the 898.27s of remaining time.
0.34 = Validation score (accuracy)
0.71s = Training runtime
0.0s = Validation runtime
Fitting model: LightGBMXT ... Training model for up to 897.55s of the 897.55s of remaining time.
0.34 = Validation score (accuracy)
0.69s = Training runtime
0.0s = Validation runtime
Fitting model: CatBoost ... Training model for up to 896.84s of the 896.84s of remaining time.
0.31 = Validation score (accuracy)
2.66s = Training runtime
0.01s = Validation runtime
Fitting model: XGBoost ... Training model for up to 894.16s of the 894.16s of remaining time.
0.35 = Validation score (accuracy)
0.95s = Training runtime
0.01s = Validation runtime
Fitting model: NeuralNetTorch ... Training model for up to 893.19s of the 893.19s of remaining time.
0.34 = Validation score (accuracy)
4.06s = Training runtime
0.02s = Validation runtime
Fitting model: VowpalWabbit ... Training model for up to 889.10s of the 889.10s of remaining time.
Warning: Exception caused VowpalWabbit to fail during training (ImportError)... Skipping this model.
`import vowpalwabbit` failed.
A quick tip is to install via `pip install vowpalwabbit>=9,<9.10`.
Fitting model: LightGBMLarge ... Training model for up to 889.09s of the 889.09s of remaining time.
0.37 = Validation score (accuracy)
2.27s = Training runtime
0.0s = Validation runtime
Fitting model: MultiModalPredictor ... Training model for up to 886.81s of the 886.81s of remaining time.
/home/ci/opt/venv/lib/python3.11/site-packages/mmengine/optim/optimizer/zero_optimizer.py:11: DeprecationWarning: `TorchScript` support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the `torch.compile` optimizer instead.
from torch.distributed.optim import \
INFO: Seed set to 0
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:
| Name | Type | Params | Mode
------------------------------------------------------------------
0 | model | MultimodalFusionMLP | 207 M | train
1 | validation_metric | MulticlassAccuracy | 0 | train
2 | loss_func | CrossEntropyLoss | 0 | train
------------------------------------------------------------------
207 M Trainable params
0 Non-trainable params
207 M Total params
828.189 Total estimated model params size (MB)
943 Modules in train mode
225 Modules in eval mode
INFO: Epoch 0, global step 1: 'val_accuracy' reached 0.26000 (best 0.26000), saving model to '/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254/models/MultiModalPredictor/automm_model/epoch=0-step=1.ckpt' as top 3
INFO: Epoch 0, global step 4: 'val_accuracy' reached 0.26000 (best 0.26000), saving model to '/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254/models/MultiModalPredictor/automm_model/epoch=0-step=4.ckpt' as top 3
INFO: Epoch 1, global step 5: 'val_accuracy' reached 0.26000 (best 0.26000), saving model to '/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254/models/MultiModalPredictor/automm_model/epoch=1-step=5.ckpt' as top 3
INFO: Epoch 1, global step 8: 'val_accuracy' reached 0.31000 (best 0.31000), saving model to '/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254/models/MultiModalPredictor/automm_model/epoch=1-step=8.ckpt' as top 3
INFO: Epoch 2, global step 9: 'val_accuracy' reached 0.32000 (best 0.32000), saving model to '/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254/models/MultiModalPredictor/automm_model/epoch=2-step=9.ckpt' as top 3
INFO: Epoch 2, global step 12: 'val_accuracy' was not in top 3
INFO: Epoch 3, global step 13: 'val_accuracy' reached 0.27000 (best 0.32000), saving model to '/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254/models/MultiModalPredictor/automm_model/epoch=3-step=13.ckpt' as top 3
INFO: Epoch 3, global step 16: 'val_accuracy' reached 0.30000 (best 0.32000), saving model to '/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254/models/MultiModalPredictor/automm_model/epoch=3-step=16.ckpt' as top 3
INFO: Epoch 4, global step 17: 'val_accuracy' was not in top 3
INFO: Epoch 4, global step 20: 'val_accuracy' was not in top 3
INFO: Epoch 5, global step 21: 'val_accuracy' was not in top 3
INFO: Epoch 5, global step 24: 'val_accuracy' was not in top 3
INFO: Epoch 6, global step 25: 'val_accuracy' was not in top 3
INFO: Epoch 6, global step 28: 'val_accuracy' was not in top 3
INFO: Epoch 7, global step 29: 'val_accuracy' was not in top 3
/home/ci/autogluon/multimodal/src/autogluon/multimodal/learners/base.py:2117: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(path, map_location=torch.device("cpu"))["state_dict"] # nosec B614
/home/ci/autogluon/multimodal/src/autogluon/multimodal/utils/checkpoint.py:45: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(per_path, map_location=torch.device("cpu"))["state_dict"] # nosec B614
0.34 = Validation score (accuracy)
321.02s = Training runtime
3.04s = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.00s of the 559.79s of remaining time.
Ensemble Weights: {'LightGBMLarge': 1.0}
0.37 = Validation score (accuracy)
0.06s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 340.3s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 21567.9 rows/s (100 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254")
After the predictor is fit, we can take a look at the leaderboard and see the performance of the various models:
leaderboard = predictor.leaderboard(test_data)
Load pretrained checkpoint: /home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250107_023254/models/MultiModalPredictor/automm_model/model.ckpt
That’s all it takes to train with image, text, and tabular data (at the same time) using AutoGluon!
For more tutorials, refer to Predicting Columns in a Table - Quick Start and Predicting Columns in a Table - In Depth.
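Once the predictor is fit, generating predictions on new multimodal data works the same as for any other `TabularPredictor`. A minimal sketch, assuming the `predictor` and `test_data` objects from this tutorial are still in scope:

```python
# Predict adoption speed for each test-set animal; the predictor handles
# the image, text, and tabular features automatically, as during training.
predictions = predictor.predict(test_data)

# Class probabilities instead of hard labels, one column per class.
pred_probs = predictor.predict_proba(test_data)

# Evaluate the ensemble on the labeled test data using the eval_metric
# chosen during fit (accuracy in this tutorial).
scores = predictor.evaluate(test_data)
```

Note that inference with the `MultiModalPredictor` component also benefits from a GPU, for the same reason training does.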