{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "e415414b-5da4-460b-a7e4-617fe22063db", "metadata": {}, "source": [ "# Forecasting Time Series - In Depth\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/master/docs/tutorials/timeseries/forecasting-indepth.ipynb)\n", "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/master/docs/tutorials/timeseries/forecasting-indepth.ipynb)\n", "\n", "\n", "This tutorial provides an in-depth overview of the time series forecasting capabilities in AutoGluon.\n", "Specifically, we will cover:\n", "\n", "- What is probabilistic time series forecasting?\n", "- Forecasting time series with additional information\n", "- What data format is expected by `TimeSeriesPredictor`?\n", "- How to evaluate forecast accuracy?\n", "- Which forecasting models are available in AutoGluon?\n", "- What functionality does `TimeSeriesPredictor` offer?\n", " - Basic configuration with `presets` and `time_limit`\n", " - Manually selecting what models to train\n", " - Hyperparameter tuning\n", "\n", "This tutorial assumes that you are familiar with the contents of [Forecasting Time Series - Quick Start](forecasting-quick-start.ipynb).\n", "\n", "## What is probabilistic time series forecasting?\n", "A time series is a sequence of measurements made at regular intervals.\n", "The main objective of time series forecasting is to predict the future values of a time series given the past observations.\n", "A typical example of this task is demand forecasting.\n", "For example, we can represent the number of daily purchases of a certain product as a time series.\n", "The goal in this case could be predicting the demand for each of the next 14 days (i.e., the forecast horizon) given the historical purchase data.\n", "In AutoGluon, the 
`prediction_length` argument of the `TimeSeriesPredictor` determines the length of the forecast horizon.\n", "\n", "![Main goal of forecasting is to predict the future values of a time series given the past observations.](https://autogluon-timeseries-datasets.s3.us-west-2.amazonaws.com/public/figures/forecasting-indepth1.png)\n", "\n", "The objective of forecasting could be to predict future values of a given time series, as well as establishing prediction intervals within which the future values will likely lie.\n", "In AutoGluon, the `TimeSeriesPredictor` generates two types of forecasts:\n", "\n", "- **mean forecast** represents the expected value of the time series at each time step in the forecast horizon.\n", "- **quantile forecast** represents the quantiles of the forecast distribution.\n", "For example, if the `0.1` quantile (also known as P10, or the 10th percentile) is equal to `x`, it means that the time series value is predicted to be below `x` 10% of the time. As another example, the `0.5` quantile (P50) corresponds to the median forecast.\n", "Quantiles can be used to reason about the range of possible outcomes.\n", "For instance, by the definition of the quantiles, the time series is predicted to be between the P10 and P90 values with 80% probability.\n", "\n", "\n", "![Mean and quantile (P10 and P90) forecasts.](https://autogluon-timeseries-datasets.s3.us-west-2.amazonaws.com/public/figures/forecasting-indepth2.png)\n", "\n", "By default, the `TimeSeriesPredictor` outputs the quantiles `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]`. 
Custom quantiles can be provided with the `quantile_levels` argument\n", "\n", "```python\n", "predictor = TimeSeriesPredictor(quantile_levels=[0.05, 0.5, 0.95])\n", "```\n", "\n", "## Forecasting time series with additional information\n", "In real-world forecasting problems we often have access to additional information, beyond just the raw time series values.\n", "AutoGluon supports two types of such additional information: static features and time-varying covariates.\n", "\n", "```{note}\n", "Not all models available in AutoGluon support all types of features & covariates. For an overview, see [Forecasting Model Zoo / Additional features](forecasting-model-zoo.md#additional-features).\n", "```\n", "\n", "### Static features\n", "Static features are the time-independent attributes (metadata) of a time series.\n", "These may include information such as:\n", "\n", "- location, where the time series was recorded (country, state, city)\n", "- fixed properties of a product (brand name, color, size, weight)\n", "- store ID or product ID\n", "\n", "Providing this information may, for instance, help forecasting models generate similar demand forecasts for stores located in the same city.\n", "\n", "In AutoGluon, static features are stored as an attribute of a `TimeSeriesDataFrame` object.\n", "As an example, let's have a look at the M4 Daily dataset." 
] }, { "cell_type": "code", "execution_count": null, "id": "aa00faab-252f-44c9-b8f7-57131aa8251c", "metadata": { "tags": [ "remove-cell", "skip-execution" ] }, "outputs": [], "source": [ "# We use uv for faster installation\n", "!pip install uv\n", "!uv pip install -q autogluon.timeseries --system\n", "!uv pip uninstall -q torchaudio torchvision torchtext --system # fix incompatible package versions on Colab" ] }, { "cell_type": "code", "execution_count": null, "id": "9e18ec03-e804-4bba-9cc2-9923eaf2cce7", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings(action=\"ignore\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b75c8851-1fb8-4464-9e13-e8124e4b7fa2", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2aaa2954-42ee-466f-8db2-2c0bf3678efd", "metadata": {}, "source": [ "We download a subset of 100 time series from the M4 Daily dataset." 
] }, { "cell_type": "code", "execution_count": null, "id": "f4f1d173-1fe6-4616-a55a-09a914e4ae57", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_daily_subset/train.csv\")\n", "df.head()\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "94f6bac4-3eb3-4ba6-a051-46077dcc1e17", "metadata": {}, "source": [ "We also load the corresponding static features.\n", "In the M4 Daily dataset, there is a single categorical static feature that denotes the domain of origin for each time series.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "191776df-83a3-47a6-b6c2-6bd7c2383d6b", "metadata": {}, "outputs": [], "source": [ "static_features_df = pd.read_csv(\"https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_daily_subset/metadata.csv\")\n", "static_features_df.head()\n" ] }, { "cell_type": "markdown", "id": "3c9c4533", "metadata": {}, "source": [ "AutoGluon expects static features as a pandas.DataFrame object. The `item_id` column indicates which item (=individual time series) in `df` each row of `static_features` corresponds to.\n", "\n", "We can now create a `TimeSeriesDataFrame` that contains both the time series values and the static features." 
] }, { "cell_type": "code", "execution_count": null, "id": "cd796511", "metadata": {}, "outputs": [], "source": [ "train_data = TimeSeriesDataFrame.from_data_frame(\n", " df,\n", " id_column=\"item_id\",\n", " timestamp_column=\"timestamp\",\n", " static_features_df=static_features_df,\n", ")\n", "train_data.head()\n" ] }, { "cell_type": "markdown", "id": "bfc8ba1a", "metadata": {}, "source": [ "We can validate that `train_data` now also includes the static features using the `.static_features` attribute" ] }, { "cell_type": "code", "execution_count": null, "id": "5c413a5e", "metadata": {}, "outputs": [], "source": [ "train_data.static_features.head()\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b7688edf-21ba-4466-afdf-beff0ef23fb7", "metadata": {}, "source": [ "Alternatively, we can attach static features to an existing `TimeSeriesDataFrame` by assigning the `.static_features` attribute" ] }, { "cell_type": "code", "execution_count": null, "id": "d57f22ac", "metadata": {}, "outputs": [], "source": [ "train_data.static_features = static_features_df\n" ] }, { "cell_type": "markdown", "id": "3461dd93", "metadata": {}, "source": [ "\n", "If `static_features` doesn't contain some `item_id`s that are present in `train_data`, an exception will be raised.\n", "\n", "Now, when we fit the predictor, all models that support static features will automatically use the static features included in `train_data`.\n", "\n", "```python\n", "predictor = TimeSeriesPredictor(prediction_length=14).fit(train_data)\n", "```\n", "\n", "```\n", "...\n", "Following types of static features have been inferred:\n", "\tcategorical: ['domain']\n", "\tcontinuous (float): []\n", "...\n", "```\n", "\n", "This message confirms that column `'domain'` was interpreted as a categorical feature.\n", "In general, AutoGluon-TimeSeries supports two types of static features:\n", "\n", "- `categorical`: columns of dtype `object`, `string` and `category` are interpreted as discrete 
categories\n", "- `continuous`: columns of dtype `int` and `float` are interpreted as continuous (real-valued) numbers\n", "- columns with other dtypes are ignored\n", "\n", "To override this logic, we need to manually change the column's dtype.\n", "For example, suppose the static features data frame contained an integer-valued column `\"store_id\"`.\n", "\n", "```python\n", "train_data.static_features[\"store_id\"] = list(range(len(train_data.item_ids)))\n", "```\n", "\n", "By default, this column will be interpreted as a continuous number.\n", "We can force AutoGluon to interpret it as a categorical feature by changing the dtype to `category`.\n", "\n", "```python\n", "train_data.static_features[\"store_id\"] = train_data.static_features[\"store_id\"].astype(\"category\")\n", "```\n", "\n", "**Note:** If training data contained static features, the predictor will expect that data passed to `predictor.predict()`, `predictor.leaderboard()`, and `predictor.evaluate()` also includes static features with the same column names and data types.\n", "\n", "\n", "### Time-varying covariates\n", "Covariates are the time-varying features that may influence the target time series.\n", "They are sometimes also referred to as dynamic features, exogenous regressors, or related time series.\n", "AutoGluon supports two types of covariates:\n", "\n", "- *known* covariates that are known for the entire forecast horizon, such as\n", "    - holidays\n", "    - day of the week, month, year\n", "    - promotions\n", "\n", "- *past* covariates that are only known up to the start of the forecast horizon, such as\n", "    - sales of other products\n", "    - temperature, precipitation\n", "    - transformed target time series\n", "\n", "\n", "![Target time series with one past covariate and one known covariate.](https://autogluon-timeseries-datasets.s3.us-west-2.amazonaws.com/public/figures/forecasting-indepth5.png)\n", "\n", "In AutoGluon, both `known_covariates` and `past_covariates` are stored as 
additional columns in the `TimeSeriesDataFrame`.\n", "\n", "We will again use the M4 Daily dataset as an example and generate both types of covariates:\n", "\n", "- a `past_covariate` equal to the logarithm of the target time series\n", "- a `known_covariate` that equals 1 if a given day is a weekend, and 0 otherwise." ] }, { "cell_type": "code", "execution_count": null, "id": "26472067-a3d3-44fb-be77-82b173bfac1c", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "train_data[\"log_target\"] = np.log(train_data[\"target\"])\n", "\n", "WEEKEND_INDICES = [5, 6]\n", "timestamps = train_data.index.get_level_values(\"timestamp\")\n", "train_data[\"weekend\"] = timestamps.weekday.isin(WEEKEND_INDICES).astype(float)\n", "\n", "train_data.head()\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b8a3dc09-c2b1-4845-a56b-66a7d8906909", "metadata": {}, "source": [ "When creating the `TimeSeriesPredictor`, we specify that the column `\"target\"` is our prediction target, and the\n", "column `\"weekend\"` contains a covariate that will be known at prediction time.\n", "\n", "```python\n", "predictor = TimeSeriesPredictor(\n", "    prediction_length=14,\n", "    target=\"target\",\n", "    known_covariates_names=[\"weekend\"],\n", ").fit(train_data)\n", "```\n", "\n", "The predictor will automatically interpret the remaining columns (except target and known covariates) as past covariates.\n", "This information is logged during fitting:\n", "\n", "```\n", "...\n", "Provided dataset contains following columns:\n", "\ttarget: 'target'\n", "\tknown covariates: ['weekend']\n", "\tpast covariates: ['log_target']\n", "...\n", "```\n", "\n", "Finally, to make predictions, we generate the known covariates for the forecast horizon" ] }, { "cell_type": "code", "execution_count": null, "id": "876427f0", "metadata": {}, "outputs": [], "source": [ "predictor = TimeSeriesPredictor(prediction_length=14, freq=train_data.freq)\n", "\n", "known_covariates = 
predictor.make_future_data_frame(train_data)\n", "known_covariates[\"weekend\"] = known_covariates[\"timestamp\"].dt.weekday.isin(WEEKEND_INDICES).astype(float)\n", "\n", "known_covariates.head()\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8b1cc93e-dc00-4475-9837-da8d0db5e2ef", "metadata": {}, "source": [ "Note that `known_covariates` must satisfy the following conditions:\n", "\n", "- The columns must include all columns listed in ``predictor.known_covariates_names``\n", "- The ``item_id`` index must include all item ids present in ``train_data``\n", "- The ``timestamp`` index must include the values for ``prediction_length`` many time steps into the future from the end of each time series in ``train_data``\n", "\n", "If `known_covariates` contain more information than necessary (e.g., contain additional columns, item_ids, or timestamps),\n", "AutoGluon will automatically select the necessary rows and columns.\n", "\n", "Finally, we pass the `known_covariates` to the `predict` function to generate predictions\n", "\n", "```python\n", "predictor.predict(train_data, known_covariates=known_covariates)\n", "```\n", "\n", "The list of models that support static features and covariates is available in [Forecasting Model Zoo](forecasting-model-zoo.md)." ] }, { "cell_type": "markdown", "id": "2e45f6ad", "metadata": {}, "source": [ "### Holidays\n", "Another popular example of `known_covariates` are holiday features. In this section we describe how to add holiday features to a time series dataset and use them in AutoGluon.\n", "\n", "First, we need to define a dictionary with dates in `datetime.date` format as keys and holiday names as values. \n", "We can easily generate such a dictionary using the [`holidays`](https://pypi.org/project/holidays/) Python package." 
] }, { "cell_type": "code", "execution_count": null, "id": "1f205c02", "metadata": {}, "outputs": [], "source": [ "!pip install -q holidays" ] }, { "cell_type": "markdown", "id": "147dd5c9", "metadata": {}, "source": [ "Here we use German holidays for demonstration purposes only. Make sure to define a holiday calendar that matches your country/region!" ] }, { "cell_type": "code", "execution_count": null, "id": "73d706de", "metadata": {}, "outputs": [], "source": [ "import holidays\n", "\n", "timestamps = train_data.index.get_level_values(\"timestamp\")\n", "country_holidays = holidays.country_holidays(\n", "    country=\"DE\",  # make sure to select the correct country/region!\n", "    # range() excludes its end, so + 2 also covers the year after the last timestamp (needed for the forecast horizon)\n", "    years=range(timestamps.min().year, timestamps.max().year + 2),\n", ")\n", "# Convert dict to pd.Series for pretty visualization\n", "pd.Series(country_holidays).sort_index().head()" ] }, { "cell_type": "markdown", "id": "49e5709d", "metadata": {}, "source": [ "Alternatively, we can manually define a dictionary with custom holidays." ] }, { "cell_type": "code", "execution_count": null, "id": "a82bc60d", "metadata": {}, "outputs": [], "source": [ "import datetime\n", "\n", "# must cover the full train time range + forecast horizon\n", "custom_holidays = {\n", "    datetime.date(1995, 1, 29): \"Superbowl\",\n", "    datetime.date(1995, 11, 24): \"Black Friday\",\n", "    datetime.date(1996, 1, 28): \"Superbowl\",\n", "    datetime.date(1996, 11, 29): \"Black Friday\",\n", "    # ...\n", "}" ] }, { "cell_type": "markdown", "id": "a4e6eca1", "metadata": {}, "source": [ "Next, we define a function that adds holiday features as columns to a `TimeSeriesDataFrame`." 
] }, { "cell_type": "code", "execution_count": null, "id": "c8533c18", "metadata": {}, "outputs": [], "source": [ "def add_holiday_features(\n", "    ts_df: TimeSeriesDataFrame,\n", "    country_holidays: dict,\n", "    include_individual_holidays: bool = True,\n", "    include_holiday_indicator: bool = True,\n", ") -> TimeSeriesDataFrame:\n", "    \"\"\"Add holiday indicator columns to a TimeSeriesDataFrame.\"\"\"\n", "    ts_df = ts_df.copy()\n", "    if not isinstance(ts_df, TimeSeriesDataFrame):\n", "        ts_df = TimeSeriesDataFrame(ts_df)\n", "    timestamps = ts_df.index.get_level_values(\"timestamp\")\n", "    country_holidays_df = pd.get_dummies(pd.Series(country_holidays)).astype(float)\n", "    holidays_df = country_holidays_df.reindex(timestamps.date).fillna(0)\n", "    if include_individual_holidays:\n", "        ts_df[holidays_df.columns] = holidays_df.values\n", "    if include_holiday_indicator:\n", "        ts_df[\"Holiday\"] = holidays_df.max(axis=1).values\n", "    return ts_df" ] }, { "cell_type": "markdown", "id": "2066d120", "metadata": {}, "source": [ "We can create a single indicator feature for all holidays." ] }, { "cell_type": "code", "execution_count": null, "id": "569141ab", "metadata": {}, "outputs": [], "source": [ "add_holiday_features(train_data, country_holidays, include_individual_holidays=False).head()" ] }, { "cell_type": "markdown", "id": "eeda2a50", "metadata": {}, "source": [ "Or represent each holiday with a separate feature." 
] }, { "cell_type": "code", "execution_count": null, "id": "5a813e80", "metadata": {}, "outputs": [], "source": [ "train_data_with_holidays = add_holiday_features(train_data, country_holidays)\n", "train_data_with_holidays.head()" ] }, { "cell_type": "markdown", "id": "b52f4e02", "metadata": {}, "source": [ "Remember to add the names of holiday features as `known_covariates_names` when creating the `TimeSeriesPredictor`.\n", "\n", "```python\n", "holiday_columns = train_data_with_holidays.columns.difference(train_data.columns)\n", "predictor = TimeSeriesPredictor(..., known_covariates_names=holiday_columns).fit(train_data_with_holidays, ...)\n", "```\n", "\n", "At prediction time, we need to provide future holiday values as `known_covariates`." ] }, { "cell_type": "code", "execution_count": null, "id": "5d63cbb8", "metadata": {}, "outputs": [], "source": [ "known_covariates = predictor.make_future_data_frame(train_data)\n", "known_covariates = add_holiday_features(known_covariates, country_holidays)\n", "known_covariates.head()" ] }, { "cell_type": "markdown", "id": "1c3d7e7d", "metadata": {}, "source": [ "```python\n", "predictions = predictor.predict(train_data_with_holidays, known_covariates=known_covariates)\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fd51f9a2", "metadata": {}, "source": [ "## What data format is expected by `TimeSeriesPredictor`?\n", "\n", "AutoGluon expects that at least some time series in the training data are long enough to generate an internal validation set.\n", "\n", "This means that at least some time series in `train_data` must have length `>= max(prediction_length + 1, 5) + prediction_length` when training with default settings\n", "```python\n", "predictor = TimeSeriesPredictor(prediction_length=prediction_length).fit(train_data)\n", "```\n", "\n", "If you use advanced configuration options, such as the following,\n", "```python\n", "predictor = TimeSeriesPredictor(prediction_length=prediction_length).fit(train_data, 
num_val_windows=num_val_windows, val_step_size=val_step_size)\n", "```\n", "then at least some time series in `train_data` must have length `>= max(prediction_length + 1, 5) + prediction_length + (num_val_windows - 1) * val_step_size`.\n", "\n", "Note that the time series in the dataset can have different lengths.\n", "\n", "\n", "### Handling irregular data and missing values\n", "In some applications, like finance, data often comes with irregular measurements (e.g., no stock price is available for weekends or holidays) or missing values.\n", "\n", "Here is an example of a dataset with an irregular time index:" ] }, { "cell_type": "code", "execution_count": null, "id": "1f7cbfd4-bdbd-4b4a-86b7-bba7d038ee51", "metadata": {}, "outputs": [], "source": [ "df_irregular = TimeSeriesDataFrame(\n", "    pd.DataFrame(\n", "        {\n", "            \"item_id\": [0, 0, 0, 1, 1],\n", "            \"timestamp\": [\"2022-01-01\", \"2022-01-02\", \"2022-01-04\", \"2022-01-01\", \"2022-01-04\"],\n", "            \"target\": [1, 2, 3, 4, 5],\n", "        }\n", "    )\n", ")\n", "df_irregular\n" ] }, { "cell_type": "markdown", "id": "42cb2316", "metadata": {}, "source": [ "In such a case, you can specify the desired frequency when creating the predictor using the `freq` argument.\n", "```python\n", "predictor = TimeSeriesPredictor(..., freq=\"D\").fit(df_irregular)\n", "```\n", "Here we choose `freq=\"D\"` to indicate that the filled index must have a daily frequency\n", "(see [other possible choices in pandas documentation](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases)).\n", "\n", "AutoGluon will automatically convert the irregular data into daily frequency and deal with missing values." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "7c3d3b86-1348-4b74-a05d-c80cb403bde6", "metadata": {}, "source": [ "--------\n", "Alternatively, we can manually fill the gaps in the time index using the method [TimeSeriesDataFrame.convert_frequency()](../../api/autogluon.timeseries.TimeSeriesDataFrame.convert_frequency.rst)." ] }, { "cell_type": "code", "execution_count": null, "id": "4e7f0ab1-a056-4b5d-a912-1a1008044544", "metadata": {}, "outputs": [], "source": [ "df_regular = df_irregular.convert_frequency(freq=\"D\")\n", "df_regular\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "67d16c86-b327-43f5-9a34-b4250d353287", "metadata": {}, "source": [ "We can verify that the index is now regular and has a daily frequency" ] }, { "cell_type": "code", "execution_count": null, "id": "50c58d1b-445d-4a92-a16e-55fd0b783953", "metadata": {}, "outputs": [], "source": [ "print(f\"Data has frequency '{df_regular.freq}'\")\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8b3c9bdd-095d-4430-966d-a17e3989a789", "metadata": {}, "source": [ "Now the data contains missing values represented by `NaN`. Most time series models in AutoGluon can natively deal with missing values, so we can just pass data to the `TimeSeriesPredictor`." ] }, { "cell_type": "markdown", "id": "ef2ce5a4", "metadata": {}, "source": [ "Alternatively, we can manually fill the NaNs with an appropriate strategy using [TimeSeriesDataFrame.fill_missing_values()](../../api/autogluon.timeseries.TimeSeriesDataFrame.fill_missing_values.rst).\n", "By default, missing values are filled with a combination of forward + backward filling." 
] }, { "cell_type": "code", "execution_count": null, "id": "1440ea27-ff0e-4cda-90a2-0f5021f78514", "metadata": {}, "outputs": [], "source": [ "df_filled = df_regular.fill_missing_values()\n", "df_filled" ] }, { "cell_type": "markdown", "id": "c1d24fb9", "metadata": {}, "source": [ "In some applications such as demand forecasting, missing values may correspond to zero demand. In this case constant fill is more appropriate." ] }, { "cell_type": "code", "execution_count": null, "id": "ad7791b1", "metadata": {}, "outputs": [], "source": [ "df_filled = df_regular.fill_missing_values(method=\"constant\", value=0.0)\n", "df_filled" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2f1f2bc5", "metadata": {}, "source": [ "## How to evaluate forecast accuracy?\n", "\n", "To measure how accurately `TimeSeriesPredictor` can forecast unseen time series, we need to reserve some test data that won't be used for training.\n", "This can be easily done using the `train_test_split` method of a `TimeSeriesDataFrame`:" ] }, { "cell_type": "code", "execution_count": null, "id": "a989f8e2", "metadata": {}, "outputs": [], "source": [ "prediction_length = 48\n", "data = TimeSeriesDataFrame.from_path(\"https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly_subset/train.csv\")\n", "train_data, test_data = data.train_test_split(prediction_length)\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "20bae385", "metadata": {}, "source": [ "We obtained two `TimeSeriesDataFrame`s from our original data:\n", "- `test_data` contains exactly the same data as the original `data` (i.e., it contains both historical data and the forecast horizon)\n", "- In `train_data`, the last `prediction_length` time steps are removed from the end of each time series (i.e., it contains only historical data)" ] }, { "cell_type": "code", "execution_count": null, "id": "5fb03059", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", 
"item_id = \"H1\"\n", "fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=[10, 4], sharex=True)\n", "train_ts = train_data.loc[item_id]\n", "test_ts = test_data.loc[item_id]\n", "ax1.set_title(\"Train data (past time series values)\")\n", "ax1.plot(train_ts)\n", "ax2.set_title(\"Test data (past + future time series values)\")\n", "ax2.plot(test_ts)\n", "for ax in (ax1, ax2):\n", " ax.fill_between(np.array([train_ts.index[-1], test_ts.index[-1]]), test_ts.min(), test_ts.max(), color=\"C1\", alpha=0.3, label=\"Forecast horizon\")\n", "plt.legend()\n", "plt.show()\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fa280ab3", "metadata": {}, "source": [ "We can now use `train_data` to train the predictor, and `test_data` to obtain an estimate of its performance on unseen data.\n", "```python\n", "predictor = TimeSeriesPredictor(prediction_length=prediction_length, eval_metric=\"MASE\").fit(train_data)\n", "predictor.evaluate(test_data)\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e3cb1a58", "metadata": {}, "source": [ "AutoGluon evaluates the performance of forecasting models by measuring how well their forecasts align with the actually observed time series.\n", "For each time series in `test_data`, the predictor does the following:\n", "\n", "1. Hold out the last `prediction_length` values of the time series.\n", "2. Generate a forecast for the held out part of the time series, i.e., the forecast horizon.\n", "3. 
Quantify how well the forecast matches the actually observed (held out) values of the time series using the `eval_metric`.\n", "\n", "Finally, the scores are averaged over all time series in the dataset.\n", "\n", "The crucial detail here is that, by default, `evaluate` computes the score on the last `prediction_length` time steps of each time series.\n", "The beginning of each time series (except the last `prediction_length` time steps) is only used to initialize the models before forecasting.\n", "\n", "For more details about the evaluation metrics, see [Forecasting Evaluation Metrics](forecasting-metrics.md)." ] }, { "attachments": {}, "cell_type": "markdown", "id": "f7c5cb98", "metadata": {}, "source": [ "### Backtesting using multiple cutoffs\n", "\n", "We can more accurately estimate the performance using **backtesting** (i.e., evaluating performance on multiple forecast horizons generated from the same time series).\n", "\n", "This can be done using the `cutoff` argument to the [`evaluate`](https://auto.gluon.ai/stable/api/autogluon.timeseries.TimeSeriesPredictor.evaluate.html) method.\n", "\n", "```python\n", "num_val_windows = 3\n", "for cutoff in range(-num_val_windows * prediction_length, 0, prediction_length):\n", "    score = predictor.evaluate(test_data, cutoff=cutoff)\n", "    print(f\"Cutoff {cutoff}: score = {score}\")\n", "```\n", "\n", "The `evaluate` method will measure the forecast accuracy using the `prediction_length` time steps after the `cutoff` index as a hold-out set (marked in orange). By default (if no `cutoff` is provided), the cutoff value will be set to `-1 * prediction_length`.\n", "\n", "![By choosing different `cutoff` values we can evaluate the model on different splits. 
For each split, the forecast accuracy is evaluated on the `prediction_length` time steps (orange) after the `cutoff`.](https://autogluon-timeseries-datasets.s3.us-west-2.amazonaws.com/public/figures/forecasting-indepth7.png)\n", "\n", "If you want to evaluate multiple models at once, you can similarly provide different `cutoff` values to the [`leaderboard`](https://auto.gluon.ai/stable/api/autogluon.timeseries.TimeSeriesPredictor.leaderboard.html) method.\n", "\n", "Multi-window backtesting typically results in more accurate estimation of the forecast quality on unseen data.\n", "However, this strategy decreases the amount of training data available for fitting models, so we recommend using single-window backtesting if the training time series are short." ] }, { "attachments": {}, "cell_type": "markdown", "id": "68f7f14d", "metadata": {}, "source": [ "### How does AutoGluon perform validation?\n", "When we fit the predictor with `predictor.fit(train_data=train_data)`, under the hood AutoGluon further splits the original dataset `train_data` into train and validation parts.\n", "\n", "Performance of different models on the validation set is evaluated using the `evaluate` method, just like described above.\n", "The model that achieves the best validation score will be used for prediction in the end.\n", "\n", "By default, the internal validation set contains a single window containing the last `prediction_length` time steps of each time series. 
We can increase the number of validation windows using the `num_val_windows` argument.\n", "\n", "```python\n", "predictor = TimeSeriesPredictor(...)\n", "predictor.fit(train_data, num_val_windows=3)\n", "```\n", "This will reduce the likelihood of overfitting but will increase the training time approximately by a factor of `num_val_windows`.\n", "Note that multiple validation windows can only be used if the time series in `train_data` have length of at least `(num_val_windows + 1) * prediction_length`.\n", "\n", "Alternatively, a user can provide their own validation set to the `fit` method. In this case it's important to remember that the validation score is computed on the last `prediction_length` time steps of each time series.\n", "\n", "```python\n", "predictor.fit(train_data=train_data, tuning_data=my_validation_dataset)\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Which forecasting models are available in AutoGluon?\n", "Forecasting models in AutoGluon can be divided into three broad categories: local, global, and ensemble models.\n", "\n", "**Local models** are simple statistical models that are specifically designed to capture patterns such as trend or seasonality.\n", "Despite their simplicity, these models often produce reasonable forecasts and serve as a strong baseline.\n", "Some examples of available local models:\n", "\n", "- `ETS`\n", "- `AutoARIMA`\n", "- `Theta`\n", "- `SeasonalNaive`\n", "\n", "If the dataset consists of multiple time series, we fit a separate local model to each time series — hence the name \"local\".\n", "This means, if we want to make a forecast for a new time series that wasn't part of the training set, all local models will be fit from scratch for the new time series.\n", "\n", "**Global models** are machine learning algorithms that learn a single model from the entire training set consisting of multiple time series.\n", "Most global models in AutoGluon are provided by the 
[GluonTS](https://ts.gluon.ai/stable/) library.\n", "These are neural-network algorithms implemented in PyTorch, such as:\n", "\n", "- `DeepAR`\n", "- `PatchTST`\n", "- `DLinear`\n", "- `TemporalFusionTransformer`\n", "\n", "This category also includes pre-trained zero-shot forecasting models like [Chronos](forecasting-chronos.ipynb).\n", "\n", "AutoGluon also offers two tabular global models, `RecursiveTabular` and `DirectTabular`.\n", "Under the hood, these models convert the forecasting task into a regression problem and use a [TabularPredictor](../../api/autogluon.tabular.TabularPredictor.rst) to fit regression algorithms like LightGBM.\n", "\n", "Finally, an **ensemble** model works by combining the predictions of all other models.\n", "By default, `TimeSeriesPredictor` always fits a `WeightedEnsemble` on top of other models.\n", "This can be disabled by setting `enable_ensemble=False` when calling the `fit` method.\n", "\n", "For a list of tunable hyperparameters for each model, their default values, and other details, see [Forecasting Model Zoo](forecasting-model-zoo.md)." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "843edca6", "metadata": {}, "source": [ "\n", "## What functionality does `TimeSeriesPredictor` offer?\n", "AutoGluon offers multiple ways to configure the behavior of a `TimeSeriesPredictor` that are suitable for both beginners and expert users.\n", "\n", "### Basic configuration with `presets` and `time_limit`\n", "We can fit `TimeSeriesPredictor` with different pre-defined configurations using the `presets` argument of the `fit` method.\n", "\n", "```python\n", "predictor = TimeSeriesPredictor(...)\n", "predictor.fit(train_data, presets=\"medium_quality\")\n", "```\n", "\n", "Higher quality presets usually result in better forecasts but take longer to train.\n", "The following presets are available:\n", "\n", "| Preset | Description | Use Cases | Fit Time (Ideal) | \n", "| :------------- | :----------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------ | :--------------- | \n", "| `fast_training` | Fit simple statistical and baseline models + fast tree-based models | Fast to train but may not be very accurate | 0.5x |\n", "| `medium_quality` | Same models as in `fast_training` + deep learning model `TemporalFusionTransformer` + Chronos-Bolt (small) | Good forecasts with reasonable training time | 1x |\n", "| `high_quality` | More powerful deep learning, machine learning, statistical and pretrained forecasting models | Much more accurate than ``medium_quality``, but takes longer to train | 3x |\n", "| `best_quality` | Same models as in `high_quality`, more cross-validation windows | Typically more accurate than `high_quality`, especially for datasets with few (<50) time series | 6x |\n", "\n", "You can find more information about the [presets](https://github.com/autogluon/autogluon/blob/stable/timeseries/src/autogluon/timeseries/configs/presets_configs.py) and the 
[models included in each preset](https://github.com/autogluon/autogluon/blob/stable/timeseries/src/autogluon/timeseries/models/presets.py#L109) in the AutoGluon source code.\n", "\n", "Another way to control the training time is to use the `time_limit` argument.\n", "\n", "```python\n", "predictor.fit(\n", " train_data,\n", " time_limit=60 * 60, # total training time in seconds\n", ")\n", "```\n", "\n", "If no `time_limit` is provided, the predictor will train until all models have been fit.\n", "\n", "\n", "### Manually configuring models\n", "Advanced users can override the presets and manually specify what models should be trained by the predictor using the `hyperparameters` argument.\n", "\n", "```python\n", "predictor = TimeSeriesPredictor(...)\n", "\n", "predictor.fit(\n", " ...\n", " hyperparameters={\n", " \"DeepAR\": {},\n", " \"Theta\": [\n", " {\"decomposition_type\": \"additive\"},\n", " {\"seasonal_period\": 1},\n", " ],\n", " }\n", ")\n", "```\n", "\n", "The above example will train three models:\n", "\n", "* ``DeepAR`` with default hyperparameters\n", "* ``Theta`` with additive seasonal decomposition (all other parameters set to their defaults)\n", "* ``Theta`` with seasonality disabled (all other parameters set to their defaults)\n", "\n", "You can also exclude certain models from the presets using the `excluded_model_types` argument.\n", "```python\n", "predictor.fit(\n", " ...\n", " presets=\"high_quality\",\n", " excluded_model_types=[\"AutoETS\", \"AutoARIMA\"],\n", ")\n", "```\n", "\n", "For the full list of available models and the respective hyperparameters, see [Forecasting Model Zoo](forecasting-model-zoo.md).\n", "\n", "### Hyperparameter tuning\n", "\n", "Advanced users can define search spaces for model hyperparameters and let AutoGluon automatically determine the best configuration for the model.\n", "\n", "```python\n", "from autogluon.common import space\n", "\n", "predictor = TimeSeriesPredictor()\n", "\n", "predictor.fit(\n", " 
train_data,\n", " hyperparameters={\n", " \"DeepAR\": {\n", " \"hidden_size\": space.Int(20, 100),\n", " \"dropout_rate\": space.Categorical(0.1, 0.3),\n", " },\n", " },\n", " hyperparameter_tune_kwargs=\"auto\",\n", " enable_ensemble=False,\n", ")\n", "```\n", "\n", "This code will train multiple versions of the `DeepAR` model with 10 different hyperparameter configurations.\n", "AutoGluon will automatically select the model configuration that achieves the best validation score and use it for prediction.\n", "\n", "Currently, HPO is based on Ray Tune for deep learning models from GluonTS, and random search for all other time series models.\n", "\n", "We can change the number of random search trials per model by passing a dictionary as `hyperparameter_tune_kwargs`:\n", "\n", "```python\n", "predictor.fit(\n", " ...\n", " hyperparameter_tune_kwargs={\n", " \"num_trials\": 20,\n", " \"scheduler\": \"local\",\n", " \"searcher\": \"random\",\n", " },\n", " ...\n", ")\n", "```\n", "\n", "The `hyperparameter_tune_kwargs` dict must include the following keys:\n", "\n", "- ``\"num_trials\"``: int, number of configurations to train for each tuned model\n", "- ``\"searcher\"``: currently, the only supported option is ``\"random\"`` (random search)\n", "- ``\"scheduler\"``: currently, the only supported option is ``\"local\"`` (all models trained on the same machine)\n", "\n", "**Note:** HPO significantly increases the training time for most models, but often provides only modest performance gains." ] } ], "metadata": { "kernelspec": { "display_name": "ag", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.9" } }, "nbformat": 4, "nbformat_minor": 5 }