autogluon.timeseries.TimeSeriesDataFrame

class autogluon.timeseries.TimeSeriesDataFrame(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)[source]

A collection of univariate time series, where each row is identified by an (item_id, timestamp) pair.

For example, a time series dataframe could represent the daily sales of a collection of products, where each item_id corresponds to a product and timestamp corresponds to the day of the record.

Parameters:
  • data (pd.DataFrame, str, pathlib.Path or Iterable) –

    Time series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

    1. Time series data in a pandas DataFrame format without multi-index. For example:

         item_id  timestamp  target
      0        0 2019-01-01       0
      1        0 2019-01-02       1
      2        0 2019-01-03       2
      3        1 2019-01-01       3
      4        1 2019-01-02       4
      5        1 2019-01-03       5
      6        2 2019-01-01       6
      7        2 2019-01-02       7
      8        2 2019-01-03       8
      

    You can also use from_data_frame() for loading data in such format.

    2. Path to a data file in CSV or Parquet format. The file must contain columns item_id and timestamp, as well as columns with time series values. This is similar to format 1 above (pandas DataFrame format without multi-index). Both remote (e.g., S3) and local paths are accepted. You can also use from_path() for loading data in such format.

    3. Time series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                          target
      item_id timestamp
      0       2019-01-01       0
              2019-01-02       1
              2019-01-03       2
      1       2019-01-01       3
              2019-01-02       4
              2019-01-03       5
      2       2019-01-01       6
              2019-01-02       7
              2019-01-03       8

    4. Time series data in Iterable format. For example:

      iterable_dataset = [
          {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq='D')},
          {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq='D')},
          {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq='D')}
      ]
      

    You can also use from_iterable_dataset() for loading data in such format.
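The in-memory formats above can be sketched as follows. This is a minimal illustration, not an authoritative recipe: the variable names (long_df, multi_index_df, ts_long and so on) are made up for the example, and the AutoGluon import is guarded so that the pandas part still runs where AutoGluon is not installed.

```python
import pandas as pd

# Format 1: long-format DataFrame with item_id, timestamp and target columns.
long_df = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "timestamp": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 3),
        "target": list(range(9)),
    }
)

# Format 3: the same data with a multi-index on (item_id, timestamp).
multi_index_df = long_df.set_index(["item_id", "timestamp"])

# Format 4: one dictionary per time series, with its values and start period.
iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq="D")},
]

# Guarded import (an assumption of this sketch, not a library requirement).
try:
    from autogluon.timeseries import TimeSeriesDataFrame
except ImportError:
    TimeSeriesDataFrame = None

if TimeSeriesDataFrame is not None:
    ts_long = TimeSeriesDataFrame(long_df)
    ts_multi = TimeSeriesDataFrame(multi_index_df)
    ts_iter = TimeSeriesDataFrame(iterable_dataset)
    print(ts_long.num_items, ts_multi.num_items, ts_iter.num_items)
```

Format 2 works the same way, with a path to a CSV or Parquet file passed in place of the DataFrame.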

  • static_features (pd.DataFrame, str or pathlib.Path, optional) –

    An optional dataframe describing the metadata of each individual time series that does not change with time. Can take real-valued or categorical values. For example, if TimeSeriesDataFrame contains sales of various products, static features may refer to time-independent features like color or brand.

    The index of static_features must contain a single entry for each item present in the respective TimeSeriesDataFrame. For example, the following TimeSeriesDataFrame:

                        target
    item_id timestamp
    A       2019-01-01       0
            2019-01-02       1
            2019-01-03       2
    B       2019-01-01       3
            2019-01-02       4
            2019-01-03       5
    

    is compatible with the following static_features:

             feat_1 feat_2
    item_id
    A           2.0    bar
    B           5.0    foo
    

    TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy and slice operations.

    If static_features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available during prediction time.
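As a rough sketch of the requirement above, the snippet below builds the example static_features table and checks in plain pandas that it has exactly one row per item. The helper variable has_one_row_per_item is hypothetical and only illustrates the rule; it is not part of the library. The AutoGluon import is guarded so that the check runs with pandas alone.

```python
import pandas as pd

# Time series for items "A" and "B" in long format.
ts_df = pd.DataFrame(
    {
        "item_id": ["A", "A", "A", "B", "B", "B"],
        "timestamp": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 2),
        "target": [0, 1, 2, 3, 4, 5],
    }
)

# Static features indexed by item_id: one row per item.
static_features = pd.DataFrame(
    {"feat_1": [2.0, 5.0], "feat_2": ["bar", "foo"]},
    index=pd.Index(["A", "B"], name="item_id"),
)

# Hypothetical sanity check: unique index that covers every item in the data.
items = set(ts_df["item_id"])
has_one_row_per_item = static_features.index.is_unique and items <= set(static_features.index)

# Guarded import (an assumption of this sketch).
try:
    from autogluon.timeseries import TimeSeriesDataFrame
except ImportError:
    TimeSeriesDataFrame = None

if TimeSeriesDataFrame is not None and has_one_row_per_item:
    ts = TimeSeriesDataFrame(ts_df, static_features=static_features)
    print(ts.static_features)
```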

  • id_column (str, optional) – Name of the item_id column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

  • timestamp_column (str, optional) – Name of the timestamp column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

  • num_cpus (int, default = -1) – Number of CPU cores used to process the iterable dataset in parallel. Set to -1 to use all cores. This argument is only used when constructing a TimeSeriesDataFrame using format 4 (iterable dataset).
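A short sketch of id_column and timestamp_column, assuming a DataFrame whose columns are named product, date and sales (all three names are invented for this example). The manual rename is shown only as a plain-pandas alternative that produces the default column names; the AutoGluon import is guarded so the pandas part runs on its own.

```python
import pandas as pd

# Long-format data with non-default column names (hypothetical names).
raw_df = pd.DataFrame(
    {
        "product": ["A", "A", "B", "B"],
        "date": pd.to_datetime(["2019-01-01", "2019-01-02"] * 2),
        "sales": [10, 12, 7, 9],
    }
)

# Manual alternative: rename the columns to the defaults before constructing.
renamed_df = raw_df.rename(columns={"product": "item_id", "date": "timestamp"})

# Guarded import (an assumption of this sketch).
try:
    from autogluon.timeseries import TimeSeriesDataFrame
except ImportError:
    TimeSeriesDataFrame = None

if TimeSeriesDataFrame is not None:
    # Tell the constructor which columns hold the item ID and the timestamp.
    ts = TimeSeriesDataFrame(raw_df, id_column="product", timestamp_column="date")
    print(ts.item_ids)
```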

__init__(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)[source]

Methods

convert_frequency

Convert each time series in the dataframe to the given frequency.

copy

Make a copy of the TimeSeriesDataFrame.

dropna

Drop rows containing NaNs.

fill_missing_values

Fill missing values represented by NaN.

from_data_frame

Construct a TimeSeriesDataFrame from a pandas DataFrame.

from_iterable_dataset

Construct a TimeSeriesDataFrame from an Iterable of dictionaries, each of which represents a single time series.

from_path

Construct a TimeSeriesDataFrame from a CSV or Parquet file.

from_pickle

Convenience method to read pickled time series dataframes.

get_model_inputs_for_scoring

Prepare model inputs necessary to predict the last prediction_length time steps of each time series in the dataset.

infer_frequency

Infer the time series frequency based on the timestamps of the observations.

num_timesteps_per_item

Number of observations in each time series in the dataframe.

slice_by_time

Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.

slice_by_timestep

Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.

split_by_time

Split the dataframe into two TimeSeriesDataFrames, one before and one after a certain cutoff_time.

to_data_frame

Convert TimeSeriesDataFrame to a pandas.DataFrame.

train_test_split

Generate a train/test split from the given dataset.
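To make the "start (inclusive), end (exclusive)" convention of the slicing methods concrete, here is a minimal sketch. The first part reproduces per-series positional slicing in plain pandas and is only an illustration of the semantics, not the library's implementation. The second part calls a few of the methods listed above and runs only if AutoGluon is installed; the prediction_length value of 1 is an arbitrary choice.

```python
import pandas as pd

# Two series ("A" and "B") with four daily observations each.
long_df = pd.DataFrame(
    {
        "item_id": ["A"] * 4 + ["B"] * 4,
        "timestamp": list(pd.date_range("2019-01-01", periods=4, freq="D")) * 2,
        "target": list(range(8)),
    }
)

# Plain-pandas illustration: keep positions 1 and 2 of every series,
# i.e. start=1 (inclusive) and end=3 (exclusive).
position = long_df.groupby("item_id").cumcount()
middle = long_df[(position >= 1) & (position < 3)]

# Guarded import (an assumption of this sketch).
try:
    from autogluon.timeseries import TimeSeriesDataFrame
except ImportError:
    TimeSeriesDataFrame = None

if TimeSeriesDataFrame is not None:
    ts = TimeSeriesDataFrame(long_df)
    print(ts.num_timesteps_per_item())
    print(ts.slice_by_timestep(1, 3))
    train_data, test_data = ts.train_test_split(prediction_length=1)
    print(train_data.num_timesteps_per_item())
```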

Attributes

freq

Inferred pandas-compatible frequency of the timestamps in the dataframe.

item_ids

List of unique time series IDs contained in the data set.

num_items

Number of items (time series) in the data set.

static_features