{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# AutoGluon Tabular - Quick Start\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/master/docs/tutorials/tabular/tabular-quick-start.ipynb)\n", "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/master/docs/tutorials/tabular/tabular-quick-start.ipynb)\n", "\n", "In this tutorial, we will see how to use AutoGluon's `TabularPredictor` to predict the values of a target column based on the other columns in a tabular dataset.\n", "\n", "Begin by making sure AutoGluon is installed, and then import AutoGluon's `TabularDataset` and `TabularPredictor`. We will use the former to load data and the latter to train models and make predictions. " ], "id": "998885f294556807" }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-output" ] }, "outputs": [], "source": [ "!python -m pip install --upgrade pip\n", "!python -m pip install autogluon" ], "id": "f4d1edc3d2f610f6" }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from autogluon.tabular import TabularDataset, TabularPredictor" ], "id": "ff904c9d1af0ac39" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example Data" ], "id": "e42b07bc64929c80" }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this tutorial we will use a dataset from the cover story of [Nature issue 7887](https://www.nature.com/nature/volumes/600/issues/7887): [AI-guided intuition for math theorems](https://www.nature.com/articles/s41586-021-04086-x.pdf). The goal is to predict a knot's signature based on its properties. We sampled 10K training and 5K test examples from the [original data](https://github.com/deepmind/mathematics_conjectures/blob/main/knot_theory.ipynb). 
The sampled dataset makes this tutorial run quickly, but AutoGluon can handle the full dataset if desired.\n", "\n", "We load this dataset directly from a URL. AutoGluon's `TabularDataset` is a subclass of pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), so any `DataFrame` methods can be used on `TabularDataset` as well." ], "id": "f247a5c20c9be613" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_url = 'https://raw.githubusercontent.com/mli/ag-docs/main/knot_theory/'\n", "train_data = TabularDataset(f'{data_url}train.csv')\n", "train_data.head()" ], "id": "bfda6620a2f2637" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our targets are stored in the \"signature\" column, which has 18 unique integers. Even though pandas didn't correctly recognize this data type as categorical, AutoGluon will fix this issue.\n" ], "id": "c810125a2b8aa286" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "label = 'signature'\n", "train_data[label].describe()" ], "id": "735d0a050b701f31" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training\n", "\n", "We now construct a `TabularPredictor` by specifying the label column name and then train on the dataset with `TabularPredictor.fit()`. We don't need to specify any other parameters. AutoGluon will recognize this is a multi-class classification task, perform automatic feature engineering, train multiple models, and then ensemble the models to create the final predictor. " ], "id": "ec8a61ef4291bc39" }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-output" ] }, "outputs": [], "source": [ "predictor = TabularPredictor(label=label).fit(train_data)" ], "id": "362ff589bb29d77d" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Model fitting should take a few minutes or less depending on your CPU. 
You can make training faster by specifying the `time_limit` argument. For example, `fit(..., time_limit=60)` will stop training after 60 seconds. Higher time limits will generally result in better prediction performance, and excessively low time limits will prevent AutoGluon from training and ensembling a reasonable set of models.\n", "\n" ], "id": "1a0c76c9931ef02a" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prediction\n", "\n", "Once we have a predictor that is fit on the training dataset, we can load a separate set of data to use for prediction and evaluation." ], "id": "a14b3b77951c8885" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_data = TabularDataset(f'{data_url}test.csv')\n", "\n", "y_pred = predictor.predict(test_data.drop(columns=[label]))\n", "y_pred.head()" ], "id": "71c5ca4d79e46793" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation\n", "\n", "We can evaluate the predictor on the test dataset using the `evaluate()` function, which measures how well our predictor performs on data that was not used for fitting the models." ], "id": "a07dc2dd8e3225a" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.evaluate(test_data, silent=True)" ], "id": "95d51e36939dcc95" }, { "cell_type": "markdown", "metadata": {}, "source": [ "AutoGluon's `TabularPredictor` also provides the `leaderboard()` function, which allows us to evaluate the performance of each individual trained model on the test data." ], "id": "23ad005fd976a13e" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.leaderboard(test_data)" ], "id": "43a52983e6d38da1" }, { "cell_type": "markdown", "metadata": { "id": "I-da0PXvpD96" }, "source": [ "## Conclusion\n", "\n", "In this quickstart tutorial we saw AutoGluon's basic fit and predict functionality using `TabularDataset` and `TabularPredictor`. 
AutoGluon simplifies the model training process by not requiring feature engineering or model hyperparameter tuning. Check out the in-depth tutorials to learn more about AutoGluon's other features like customizing the training and prediction steps or extending AutoGluon with custom feature generators, models, or metrics." ], "id": "79eb2f75ce0e5eed" } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython" } }, "nbformat": 4, "nbformat_minor": 5 }