{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Continuous Training with AutoMM\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/stable/docs/tutorials/multimodal/advanced_topics/continuous_training.ipynb)\n", "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/stable/docs/tutorials/multimodal/advanced_topics/continuous_training.ipynb)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Continuous training lets machine learning models refine their performance over time. It enables models to build upon previously acquired knowledge, thereby enhancing accuracy, facilitating knowledge transfer across tasks, and saving computational resources. In this tutorial, we will demonstrate three use cases of continuous training with AutoMM." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use Case 1: Expanding Training with Additional Data or Training Time\n", "\n", "If a model underfits, it may benefit from more training epochs or additional training time. With AutoMM, you can easily extend the training time of your model without starting from scratch.\n", "\n", "It is also common to need to incorporate more data into your model. AutoMM allows you to continue training on new data, provided it has the same problem type and, for multiclass problems, the same classes. This flexibility makes it easy to improve and adapt your models as your data grows." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the [Stanford Sentiment Treebank (SST)](https://nlp.stanford.edu/sentiment/) dataset as an example. It consists of movie reviews and their associated sentiment. 
Given a new movie review, the goal is to predict the sentiment reflected in the text (in this case a binary classification task, where reviews are labeled 1 if they convey a positive opinion and 0 otherwise). Let’s first load and look at the data, noting that the labels are stored in a column called `label`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from autogluon.core.utils.loaders import load_pd\n", "\n", "train_data = load_pd.load(\"https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/train.parquet\")\n", "test_data = load_pd.load(\"https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/dev.parquet\")\n", "subsample_size = 1000 # subsample data for faster demo, try setting this to larger values\n", "train_data_1 = train_data.sample(n=subsample_size, random_state=0)\n", "train_data_1.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's train the model. To ensure this tutorial runs quickly, we simply call `fit()` with a subset of 1000 training examples and limit its runtime to approximately 1 minute. To achieve reasonable performance in your applications, we recommend setting a much longer `time_limit` (e.g., 1 hour) or not specifying `time_limit` at all (`time_limit=None`)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from autogluon.multimodal import MultiModalPredictor\n", "import uuid\n", "\n", "model_path = f\"./tmp/{uuid.uuid4().hex}-automm_sst\"\n", "predictor = MultiModalPredictor(label=\"label\", eval_metric=\"acc\", path=model_path)\n", "predictor.fit(train_data_1, time_limit=60)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After training, we can evaluate our predictor on separate test data formatted similarly to our training data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_score = predictor.evaluate(test_data)\n", "print(test_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the training was completed successfully, `model.ckpt` can be found under `model_path`. If you think the model still underfits, you can continue training from this checkpoint by just running another `.fit()` with the same data. If you have some new data to add in and don't want to train from scratch, you can also run `.fit()` with the new combined dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor_2 = MultiModalPredictor.load(model_path) # you can also use the `predictor` we assigned above\n", "train_data_2 = train_data.drop(train_data_1.index).sample(n=subsample_size, random_state=0)\n", "predictor_2.fit(train_data_2, time_limit=60)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_score_2 = predictor_2.evaluate(test_data)\n", "print(test_score_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use Case 2: Resuming Training from the Last Checkpoint\n", "\n", "If your training process crashed for some reason, AutoMM allows you to resume training right from where you left off. In that case, `last.ckpt` will be saved under `model_path` instead of `model.ckpt`. 
To resume training, call `MultiModalPredictor.load()` with the `resume` option and then call `.fit()` on the returned predictor:\n", "\n", "\n", "```\n", "predictor_resume = MultiModalPredictor.load(path=model_path, resume=True)\n", "predictor_resume.fit(train_data, time_limit=60)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use Case 3: Applying Pre-Trained Models to New Tasks\n", "\n", "Often, you'll encounter situations where a new task is related but not identical to a task you've previously trained a model for (e.g., training a more fine-grained sentiment analysis model, or adding more classes to your multiclass model). If you wish to leverage the knowledge that the model has already learned from the old data to help it learn the new task more quickly and effectively, AutoMM supports dumping the weights of your trained models and using them as foundation models:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dump_model_path = f\"./tmp/{uuid.uuid4().hex}-automm_sst\"\n", "predictor.dump_model(save_path=dump_model_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can then load the weights of the trained model, and continue training / fine-tuning the model on the new data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is an example that applies the binary text classification model we trained previously to a regression task. We use the [Semantic Textual Similarity Benchmark dataset](https://paperswithcode.com/dataset/sts-benchmark?t) for illustration only, so you might want to apply this feature to more relevant datasets. In this data, the column named `score` contains numerical values (which we would like to predict) that are human-annotated similarity scores for each given pair of sentences." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sts_train_data = load_pd.load(\"https://autogluon-text.s3-accelerate.amazonaws.com/glue/sts/train.parquet\")[\n", " [\"sentence1\", \"sentence2\", \"score\"]\n", "]\n", "sts_test_data = load_pd.load(\"https://autogluon-text.s3-accelerate.amazonaws.com/glue/sts/dev.parquet\")[\n", " [\"sentence1\", \"sentence2\", \"score\"]\n", "]\n", "sts_train_data.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To specify a custom model that you created, use the `hyperparameters` option in `.fit()`, pointing the checkpoint name at the `hf_text` subdirectory of the dumped model:\n", "\n", "```\n", "hyperparameters={\n", " \"model.hf_text.checkpoint_name\": f\"{dump_model_path}/hf_text\"\n", "}\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sts_model_path = f\"./tmp/{uuid.uuid4().hex}-automm_sts\"\n", "predictor_sts = MultiModalPredictor(label=\"score\", path=sts_model_path)\n", "predictor_sts.fit(\n", " sts_train_data, hyperparameters={\"model.hf_text.checkpoint_name\": f\"{dump_model_path}/hf_text\"}, time_limit=30\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_score = predictor_sts.evaluate(sts_test_data, metrics=[\"rmse\", \"pearsonr\", \"spearmanr\"])\n", "print(\"RMSE = {:.2f}\".format(test_score[\"rmse\"]))\n", "print(\"PEARSONR = {:.4f}\".format(test_score[\"pearsonr\"]))\n", "print(\"SPEARMANR = {:.4f}\".format(test_score[\"spearmanr\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We currently support dumping `timm` image models, `MMDetection` image models, `HuggingFace` text models, and any fusion models that comprise the aforementioned models. 
Similarly, you can load a custom-trained `timm` image model with:\n", "```\n", "{\"model.timm_image.checkpoint_name\": timm_image_model_path}\n", "```\n", "and a custom-trained `MMDetection` model with:\n", "```\n", "{\"model.mmdet_image.checkpoint_name\": mmdet_image_model_path}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This feature lets you transfer knowledge from a previously trained task to a new task, which saves time and computational resources. We will not go into details in this tutorial, but do keep in mind that we have not addressed a big challenge in this use case, namely [Catastrophic Forgetting](https://en.wikipedia.org/wiki/Catastrophic_interference#:~:text=Catastrophic%20interference%2C%20also%20known%20as,information%20upon%20learning%20new%20information.)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 2 }