Multimodal Prediction
=====================

For problems on multimodal data tables that contain image, text, and tabular data, AutoGluon provides
`MultiModalPredictor` (abbreviated as `AutoMM`), which automatically selects and fuses deep learning
backbones from popular packages such as `timm`, `huggingface/transformers`, and `CLIP`.
You can use it to build models for multimodal problems that involve image, text, and tabular features,
e.g., predicting a product's price from its description, photo, and other metadata,
or matching images with text descriptions.

In addition, a predictor that handles multimodal problems well is also expected to handle
**each specific modality** well. Thus, you can also use `AutoMM` to solve standard NLP/Vision tasks
such as sentiment classification, intent detection, paraphrase detection, and image classification.
Moreover, `AutoMM` can be used as a base model in the multi-layer stack-ensemble of `TabularPredictor`.

The following tutorials show how to use `AutoMM` to solve problems that involve image, text, and tabular data.

.. container:: cards

   .. card::
      :title: AutoMM for Text - Quick Start
      :link: beginner_text.html

      How to train high-quality text prediction models with MultiModalPredictor in under 5 minutes.

   .. card::
      :title: AutoMM for Image Classification - Quick Start
      :link: beginner_image_cls.html

      How to train image classification models with MultiModalPredictor.

   .. card::
      :title: AutoMM for Text - Multilingual Problems
      :link: multilingual_text.html

      How to use MultiModalPredictor to build models on datasets with languages other than English.

   .. card::
      :title: AutoMM for Text + Tabular - Quick Start
      :link: multimodal_text_tabular.html

      How MultiModalPredictor can be applied to multimodal data tables with a mix of text, numerical,
      and categorical columns. Here, we train a model to predict the price of books.

   .. card::
      :title: AutoMM for Multimodal - Quick Start
      :link: beginner_multimodal.html

      How to use MultiModalPredictor to train a model that predicts the adoption speed of pets.

   .. card::
      :title: CLIP in AutoMM - Zero-Shot Image Classification
      :link: clip_zeroshot.html

      How to use CLIP for zero-shot image classification.

   .. card::
      :title: CLIP in AutoMM - Extract Embeddings
      :link: clip_embedding.html

      How to use CLIP to extract embeddings for retrieval problems.

   .. card::
      :title: Customize AutoMM
      :link: customization.html

      How to customize AutoMM configurations.


.. toctree::
   :maxdepth: 1
   :hidden:

   beginner_text
   beginner_image_cls
   multilingual_text
   multimodal_text_tabular
   beginner_multimodal
   clip_zeroshot
   clip_embedding
   customization