{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AutoGluon Tabular - Quick Start\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/master/docs/tutorials/tabular/tabular-quick-start.ipynb)\n",
    "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/master/docs/tutorials/tabular/tabular-quick-start.ipynb)\n",
    "\n",
    "In this tutorial, we will see how to use AutoGluon's `TabularPredictor` to predict the values of a target column based on the other columns in a tabular dataset.\n",
    "\n",
    "Begin by making sure AutoGluon is installed, and then import AutoGluon's `TabularDataset` and `TabularPredictor`. We will use the former to load data and the latter to train models and make predictions. "
   ],
   "id": "998885f294556807"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-output"
    ]
   },
   "outputs": [],
   "source": [
    "!python -m pip install --upgrade pip\n",
    "!python -m pip install autogluon"
   ],
   "id": "f4d1edc3d2f610f6"
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogluon.tabular import TabularDataset, TabularPredictor"
   ],
   "id": "ff904c9d1af0ac39"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example Data"
   ],
   "id": "e42b07bc64929c80"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this tutorial we will use a dataset from the cover story of [Nature issue 7887](https://www.nature.com/nature/volumes/600/issues/7887): [AI-guided intuition for math theorems](https://www.nature.com/articles/s41586-021-04086-x.pdf). The goal is to predict a knot's signature based on its properties. We sampled 10K training and 5K test examples from the [original data](https://github.com/deepmind/mathematics_conjectures/blob/main/knot_theory.ipynb). The sampled dataset make this tutorial run quickly, but AutoGluon can handle the full dataset if desired.\n",
    "\n",
    "We load this dataset directly from a URL. AutoGluon's `TabularDataset` is a subclass of pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), so any `DataFrame` methods can be used on `TabularDataset` as well."
   ],
   "id": "f247a5c20c9be613"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_url = 'https://raw.githubusercontent.com/mli/ag-docs/main/knot_theory/'\n",
    "train_data = TabularDataset(f'{data_url}train.csv')\n",
    "train_data.head()"
   ],
   "id": "bfda6620a2f2637"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our targets are stored in the \"signature\" column, which has 18 unique integers. Even though pandas didn't correctly recognize this data type as categorical, AutoGluon will fix this issue.\n"
   ],
   "id": "c810125a2b8aa286"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "label = 'signature'\n",
    "train_data[label].describe()"
   ],
   "id": "735d0a050b701f31"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "We now construct a `TabularPredictor` by specifying the label column name and then train on the dataset with `TabularPredictor.fit()`. We don't need to specify any other parameters. AutoGluon will recognize this is a multi-class classification task, perform automatic feature engineering, train multiple models, and then ensemble the models to create the final predictor. "
   ],
   "id": "ec8a61ef4291bc39"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-output"
    ]
   },
   "outputs": [],
   "source": [
    "predictor = TabularPredictor(label=label).fit(train_data)"
   ],
   "id": "362ff589bb29d77d"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Model fitting should take a few minutes or less depending on your CPU. You can make training faster by specifying the `time_limit` argument. For example, `fit(..., time_limit=60)` will stop training after 60 seconds. Higher time limits will generally result in better prediction performance, and excessively low time limits will prevent AutoGluon from training and ensembling a reasonable set of models.\n",
    "\n"
   ],
   "id": "1a0c76c9931ef02a"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prediction\n",
    "\n",
    "Once we have a predictor that is fit on the training dataset, we can load a separate set of data to use for prediction and evaulation."
   ],
   "id": "a14b3b77951c8885"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data = TabularDataset(f'{data_url}test.csv')\n",
    "\n",
    "y_pred = predictor.predict(test_data.drop(columns=[label]))\n",
    "y_pred.head()"
   ],
   "id": "71c5ca4d79e46793"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation\n",
    "\n",
    "We can evaluate the predictor on the test dataset using the `evaluate()` function, which measures how well our predictor performs on data that was not used for fitting the models."
   ],
   "id": "a07dc2dd8e3225a"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.evaluate(test_data, silent=True)"
   ],
   "id": "95d51e36939dcc95"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "AutoGluon's `TabularPredictor` also provides the `leaderboard()` function, which allows us to evaluate the performance of each individual trained model on the test data."
   ],
   "id": "23ad005fd976a13e"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.leaderboard(test_data)"
   ],
   "id": "43a52983e6d38da1"
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "I-da0PXvpD96"
   },
   "source": [
    "## Conclusion\n",
    "\n",
    "In this quickstart tutorial we saw AutoGluon's basic fit and predict functionality using `TabularDataset` and `TabularPredictor`. AutoGluon simplifies the model training process by not requiring feature engineering or model hyperparameter tuning. Check out the in-depth tutorials to learn more about AutoGluon's other features like customizing the training and prediction steps or extending AutoGluon with custom feature generators, models, or metrics."
   ],
   "id": "79eb2f75ce0e5eed"
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}