.. _sec_automm_distillation_multilingual:

Knowledge Distillation in AutoMM
================================

Pretrained foundation models are becoming increasingly large. However, these models are difficult to deploy due to the limited resources available in deployment scenarios. To still benefit from large models under this constraint, you can transfer the knowledge from a large-scale teacher model to a smaller student model via knowledge distillation. In this way, the small student model can be practically deployed in real-world scenarios, and thanks to the teacher, its performance will be better than that of a student trained from scratch.

In this tutorial, we introduce how to adopt ``MultiModalPredictor`` for knowledge distillation. For the purpose of demonstration, we use the `Question-answering NLI `__ dataset, which comprises 104,743 question-sentence pairs sampled from question answering datasets. We will demonstrate how to use a large model to guide the learning and improve the performance of a small model in AutoGluon.

Load Dataset
------------

The `Question-answering NLI `__ dataset contains sentence pairs in English. In the label column, ``0`` means that the sentence contains the answer to the question (entailment) and ``1`` means that it does not (not entailment).

.. code:: python

    import datasets
    from datasets import load_dataset

    datasets.logging.disable_progress_bar()

    dataset = load_dataset("glue", "qnli")


.. parsed-literal::
    :class: output

    Downloading and preparing dataset glue/qnli to /home/ci/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...
    Dataset glue downloaded and prepared to /home/ci/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.


.. code:: python

    dataset['train']


.. parsed-literal::
    :class: output

    Dataset({
        features: ['question', 'sentence', 'label', 'idx'],
        num_rows: 104743
    })


.. code:: python

    from sklearn.model_selection import train_test_split

    train_valid_df = dataset["train"].to_pandas()[["question", "sentence", "label"]].sample(1000, random_state=123)
    train_df, valid_df = train_test_split(train_valid_df, test_size=0.2, random_state=123)
    test_df = dataset["validation"].to_pandas()[["question", "sentence", "label"]].sample(1000, random_state=123)
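Because we subsample only 1,000 rows for training and validation, it is worth a quick look at the split sizes and class balance before fitting anything. The snippet below is an optional sanity check added for illustration; it is not required by the distillation workflow.

.. code:: python

    # Optional sanity check (illustrative, not part of the original workflow):
    # confirm the split sizes and that both labels remain well represented
    # after subsampling.
    print(train_df.shape, valid_df.shape, test_df.shape)
    print(train_df["label"].value_counts(normalize=True))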
Load the Teacher Model
----------------------

In our example, we will directly load a teacher model with the `google/bert_uncased_L-12_H-768_A-12 `__ backbone that has been trained on QNLI and distill it into a student model with the `google/bert_uncased_L-6_H-768_A-12 `__ backbone.

.. code:: python

    !wget --quiet https://automl-mm-bench.s3.amazonaws.com/unit-tests/distillation_sample_teacher.zip -O distillation_sample_teacher.zip
    !unzip -q -o distillation_sample_teacher.zip -d .

.. code:: python

    from autogluon.multimodal import MultiModalPredictor

    teacher_predictor = MultiModalPredictor.load("ag_distillation_sample_teacher/")


.. parsed-literal::
    :class: output

    Start to upgrade the previous configuration trained by AutoMM version=0.5.3b20221108.
    /home/ci/opt/venv/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator LabelEncoder from version 1.0.2 when using version 1.1.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to: https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
      warnings.warn(
    /home/ci/opt/venv/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator StandardScaler from version 1.0.2 when using version 1.1.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to: https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
      warnings.warn(
    Load pretrained checkpoint: /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/ag_distillation_sample_teacher/model.ckpt
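Before distilling, you can optionally evaluate the loaded teacher on the test split to obtain a reference score to compare the student against later. The call below is a small optional sketch; the resulting number depends on your environment and is not reported in this tutorial.

.. code:: python

    # Optional: measure the teacher's test performance as a reference point for
    # the distilled student. The exact score will vary between environments.
    print(teacher_predictor.evaluate(data=test_df))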
Distill to Student
------------------

Training the student model is straightforward. You only need to add the ``teacher_predictor`` argument when calling ``.fit()``. Internally, the student will be trained by matching the predictions/feature maps of the teacher, which can perform better than directly finetuning the student.

.. code:: python

    student_predictor = MultiModalPredictor(label="label")
    student_predictor.fit(
        train_df,
        tuning_data=valid_df,
        teacher_predictor=teacher_predictor,
        hyperparameters={
            "model.hf_text.checkpoint_name": "google/bert_uncased_L-6_H-768_A-12",
            "optimization.max_epochs": 2,
        }
    )


.. parsed-literal::
    :class: output

    Global seed set to 123
    No path specified. Models will be saved in: "AutogluonModels/ag-20230222_235620/"
    AutoMM starts to create your model. ✨

    - Model will be saved to "/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235620".
    - Validation metric is "roc_auc".
    - To track the learning progress, you can open a terminal and launch Tensorboard:
        ```shell
        # Assume you have installed tensorboard
        tensorboard --logdir /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235620
        ```

    Enjoy your coffee, and let AutoMM do the job ☕☕☕ Learn more at https://auto.gluon.ai

    /home/ci/opt/venv/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `AUROC` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
      warnings.warn(*args, **kwargs)
    /home/ci/opt/venv/lib/python3.8/site-packages/pytorch_lightning/utilities/parsing.py:263: UserWarning: Attribute 'softmax_regression_loss_func' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['softmax_regression_loss_func'])`.
      rank_zero_warn(
    Using 16bit None Automatic Mixed Precision (AMP)
    GPU available: True (cuda), used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

      | Name                          | Type                         | Params
    ------------------------------------------------------------------------------
    0 | student_model                 | HFAutoModelForTextPrediction | 67.0 M
    1 | teacher_model                 | HFAutoModelForTextPrediction | 109 M
    2 | validation_metric             | AUROC                        | 0
    3 | hard_label_loss_func          | CrossEntropyLoss             | 0
    4 | soft_label_loss_func          | CrossEntropyLoss             | 0
    5 | softmax_regression_loss_func  | MSELoss                      | 0
    6 | output_feature_loss_func      | MSELoss                      | 0
    7 | output_feature_adaptor        | Identity                     | 0
    8 | rkd_loss_func                 | RKDLoss                      | 0
    ------------------------------------------------------------------------------
    176 M     Trainable params
    0         Non-trainable params
    176 M     Total params
    352.881   Total estimated model params size (MB)
    Epoch 0, global step 3: 'val_roc_auc' reached 0.63582 (best 0.63582), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235620/epoch=0-step=3.ckpt' as top 3
    Epoch 0, global step 7: 'val_roc_auc' reached 0.69993 (best 0.69993), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235620/epoch=0-step=7.ckpt' as top 3
    Epoch 1, global step 10: 'val_roc_auc' reached 0.71410 (best 0.71410), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235620/epoch=1-step=10.ckpt' as top 3
    Epoch 1, global step 14: 'val_roc_auc' reached 0.71611 (best 0.71611), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235620/epoch=1-step=14.ckpt' as top 3
    `Trainer.fit` stopped: `max_epochs=2` reached.
    Start to fuse 3 checkpoints via the greedy soup algorithm.
    AutoMM has created your model 🎉🎉🎉

    - To load the model, use the code below:
        ```python
        from autogluon.multimodal import MultiModalPredictor
        predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235620")
        ```

    - You can open a terminal and launch Tensorboard to visualize the training log:
        ```shell
        # Assume you have installed tensorboard
        tensorboard --logdir /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20230222_235620
        ```

    - If you are not satisfied with the model, try to increase the training time, adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html), or post issues on GitHub: https://github.com/autogluon/autogluon


.. code:: python

    print(student_predictor.evaluate(data=test_df))


.. parsed-literal::
    :class: output

    {'roc_auc': 0.7953558897543919}


More about Knowledge Distillation
---------------------------------

To learn how to customize distillation and how it compares with direct finetuning, see the distillation examples and README in `AutoMM Distillation Examples `__. In particular, the `multilingual distillation example `__ provides more details and customization options.

Other Examples
--------------

You may go to `AutoMM Examples `__ to explore other examples about AutoMM.

Customization
-------------

To learn how to customize AutoMM, please refer to :ref:`sec_automm_customization`.
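For distillation in particular, the strength and shape of the teacher signal can typically be tuned through ``distiller.*`` hyperparameters (for example, the temperature used to soften the teacher's outputs and the weights of the hard-label and soft-label losses). The sketch below is only an illustration under that assumption; the exact key names and sensible value ranges should be verified against the customization documentation and the distillation examples linked above.

.. code:: python

    # Hedged sketch: adjusting distillation-specific hyperparameters.
    # The "distiller.*" keys below are assumptions -- confirm the exact names in
    # the AutoMM customization docs before relying on them.
    tuned_student_predictor = MultiModalPredictor(label="label")
    tuned_student_predictor.fit(
        train_df,
        tuning_data=valid_df,
        teacher_predictor=teacher_predictor,
        hyperparameters={
            "model.hf_text.checkpoint_name": "google/bert_uncased_L-6_H-768_A-12",
            "optimization.max_epochs": 2,
            "distiller.temperature": 5.0,        # assumed key: softens teacher logits
            "distiller.hard_label_weight": 0.1,  # assumed key: weight of the ground-truth loss
            "distiller.soft_label_weight": 1.0,  # assumed key: weight of the teacher-matching loss
        },
    )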