Multimodal Prediction

For problems involving multimodal data tables that contain image, text, and tabular data, AutoGluon provides MultiModalPredictor (abbreviated as AutoMM), which automatically selects and fuses deep learning backbones from popular packages such as timm, huggingface/transformers, and CLIP. You can use it to build models for multimodal problems that involve image, text, and tabular features. Examples include predicting a product's price from its description, photo, and other metadata, or matching images with text descriptions.

Because AutoMM handles each of these modalities, you can also use it to solve standard single-modality NLP and vision tasks such as sentiment classification, intent detection, paraphrase detection, and image classification. AutoMM can also serve as a base model in the multi-layer stack ensemble of TabularPredictor.

The following tutorials show how to use AutoMM to solve problems that involve image, text, and tabular data.

AutoMM for Text - Quick Start (beginner_text.html)

How to train high-quality text prediction models with MultiModalPredictor in under 5 minutes.

AutoMM for Image Classification - Quick Start (beginner_image_cls.html)

How to train image classification models with MultiModalPredictor.

AutoMM for Text - Multilingual Problems (multilingual_text.html)

How to use MultiModalPredictor to build models on datasets with languages other than English.

AutoMM for Text + Tabular - Quick Start (multimodal_text_tabular.html)

How to apply MultiModalPredictor to multimodal data tables with a mix of text, numerical, and categorical columns. Here, we train a model to predict the price of books.

AutoMM for Multimodal - Quick Start (beginner_multimodal.html)

How to use MultiModalPredictor to train a model that predicts the adoption speed of pets.

CLIP in AutoMM - Zero-Shot Image Classification (clip_zeroshot.html)

How to use CLIP for zero-shot image classification.

CLIP in AutoMM - Extract Embeddings (clip_embedding.html)

How to use CLIP to extract embeddings for retrieval problems.

Customize AutoMM (customization.html)

How to customize AutoMM configurations.