autogluon.timeseries.TimeSeriesDataFrame

class autogluon.timeseries.TimeSeriesDataFrame(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)

A collection of univariate time series, where each row is identified by an (item_id, timestamp) pair.

For example, a time series data frame could represent the daily sales of a collection of products, where each item_id corresponds to a product and timestamp corresponds to the day of the record.

Parameters:
  • data (pd.DataFrame, str, pathlib.Path or Iterable) –

    Time series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

    1. Time series data in a pandas DataFrame format without multi-index. For example:

         item_id  timestamp  target
      0        0 2019-01-01       0
      1        0 2019-01-02       1
      2        0 2019-01-03       2
      3        1 2019-01-01       3
      4        1 2019-01-02       4
      5        1 2019-01-03       5
      6        2 2019-01-01       6
      7        2 2019-01-02       7
      8        2 2019-01-03       8
      

    You can also use from_data_frame() to load data in this format.

    2. Path to a data file in CSV or Parquet format. The file must contain columns item_id and timestamp, as well as columns with time series values. The layout is the same as format 1 above (pandas DataFrame without multi-index). Both remote (e.g., S3) and local paths are accepted. You can also use from_path() to load data in this format.

    3. Time series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                          target
      item_id timestamp
      0       2019-01-01       0
              2019-01-02       1
              2019-01-03       2
      1       2019-01-01       3
              2019-01-02       4
              2019-01-03       5
      2       2019-01-01       6
              2019-01-02       7
              2019-01-03       8
      
    4. Time series data in Iterable format. For example:

      iterable_dataset = [
          {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq='D')},
          {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq='D')},
          {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq='D')}
      ]
      

    You can also use from_iterable_dataset() to load data in this format.

  • static_features (pd.DataFrame, str or pathlib.Path, optional) –

    An optional data frame describing the metadata of each individual time series that does not change with time. Can take real-valued or categorical values. For example, if TimeSeriesDataFrame contains sales of various products, static features may refer to time-independent features like color or brand.

    The index of static_features must contain a single entry for each item present in the respective TimeSeriesDataFrame. For example, the following TimeSeriesDataFrame:

                        target
    item_id timestamp
    A       2019-01-01       0
            2019-01-02       1
            2019-01-03       2
    B       2019-01-01       3
            2019-01-02       4
            2019-01-03       5
    

    is compatible with the following static_features:

             feat_1 feat_2
    item_id
    A           2.0    bar
    B           5.0    foo
    

    TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy and slice operations.

    If static_features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available during prediction time.

  • id_column (str, optional) – Name of the item_id column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

  • timestamp_column (str, optional) – Name of the timestamp column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

  • num_cpus (int, default = -1) – Number of CPU cores used to process the iterable dataset in parallel. Set to -1 to use all cores. This argument is only used when constructing a TimeSeriesDataFrame using format 4 (iterable dataset).
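
The parameters above can be combined as in the following minimal sketch. It assumes long-format data (format 1) whose ID and timestamp columns carry the hypothetical names "product" and "date", plus a static_features frame indexed by item ID. The AutoGluon import is guarded so that the pandas part still runs when the library is not installed.

```python
import pandas as pd

# Long-format data (format 1) with non-default column names.
# "product" and "date" are hypothetical names chosen for this sketch.
raw = pd.DataFrame(
    {
        "product": ["A", "A", "A", "B", "B", "B"],
        "date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 2),
        "target": [0, 1, 2, 3, 4, 5],
    }
)

# Static features: one row per item, indexed by item ID.
static = pd.DataFrame(
    {"feat_1": [2.0, 5.0], "feat_2": ["bar", "foo"]},
    index=pd.Index(["A", "B"], name="item_id"),
)

# Sanity check mirroring the documented requirement: every item in the
# data appears exactly once in the static features index.
assert static.index.is_unique
assert set(raw["product"]) == set(static.index)

try:
    from autogluon.timeseries import TimeSeriesDataFrame
except ImportError:  # AutoGluon not installed; the pandas part above still runs
    TimeSeriesDataFrame = None

if TimeSeriesDataFrame is not None:
    ts_df = TimeSeriesDataFrame(
        raw,
        static_features=static,
        id_column="product",      # name of the ID column in `raw`
        timestamp_column="date",  # name of the timestamp column in `raw`
    )
    print(ts_df.num_items, ts_df.freq)
```

Checking the static features index before construction is optional; it only surfaces a mismatch earlier and with a simpler error.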

freq

A pandas-compatible string describing the frequency of the time series, for example "D" for daily data or "h" for hourly data. This attribute is determined automatically based on the timestamps. For the full list of possible values, see the pandas documentation.

Type:

str

num_items

Number of items (time series) in the data set.

Type:

int

item_ids

Unique time series IDs contained in the data set.

Type:

pd.Index
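
To make the three attributes concrete, the sketch below builds the multi-index layout (format 3) from long-format data and computes pandas-only analogues of item_ids, num_items and freq. These analogues are an illustration of what the attributes describe, not the library's implementation. The guarded section at the end reads the real attributes when AutoGluon is installed.

```python
import pandas as pd

# Format 1 (long) data with the same values as the example table above.
long_df = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "timestamp": list(pd.date_range("2019-01-01", periods=3, freq="D")) * 3,
        "target": list(range(9)),
    }
)

# Format 3: the same data with a multi-index on (item_id, timestamp).
multi_df = long_df.set_index(["item_id", "timestamp"])

# pandas-only analogues of the attributes (illustration only).
item_ids = multi_df.index.unique(level="item_id")
num_items = len(item_ids)
# Infer the frequency from the timestamps of the first series.
freq = pd.infer_freq(multi_df.loc[item_ids[0]].index)

try:
    from autogluon.timeseries import TimeSeriesDataFrame
except ImportError:  # AutoGluon not installed; the pandas part above still runs
    TimeSeriesDataFrame = None

if TimeSeriesDataFrame is not None:
    ts_df = TimeSeriesDataFrame(multi_df)
    print(ts_df.item_ids, ts_df.num_items, ts_df.freq)
```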

__init__(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)

Methods

convert_frequency

Convert each time series in the data frame to the given frequency.

copy

Make a copy of the TimeSeriesDataFrame.

dropna

Drop rows containing NaNs.

fill_missing_values

Fill missing values represented by NaN.

from_data_frame

Construct a TimeSeriesDataFrame from a pandas DataFrame.

from_iterable_dataset

Construct a TimeSeriesDataFrame from an Iterable of dictionaries, each of which represents a single time series.

from_path

Construct a TimeSeriesDataFrame from a CSV or Parquet file.

from_pickle

Convenience method to read pickled time series data frames.

get_model_inputs_for_scoring

Prepare model inputs necessary to predict the last prediction_length time steps of each time series in the dataset.

infer_frequency

Infer the time series frequency based on the timestamps of the observations.

num_timesteps_per_item

Length of each time series in the dataframe.

slice_by_time

Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.

slice_by_timestep

Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.

split_by_time

Split the data frame into two TimeSeriesDataFrames, before and after a given cutoff_time.

to_data_frame

Convert TimeSeriesDataFrame to a pandas.DataFrame.

train_test_split

Generate a train/test split from the given dataset.
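
A few of the methods listed above might be used together as in the following sketch. The data values are made up for illustration, and the argument names and order are assumed from the method summaries; check the individual method pages for the exact signatures. The AutoGluon import is guarded so the data preparation still runs without the library.

```python
import pandas as pd

# Two series of five daily observations each (illustrative values).
df = pd.DataFrame(
    {
        "item_id": [0] * 5 + [1] * 5,
        "timestamp": list(pd.date_range("2019-01-01", periods=5, freq="D")) * 2,
        "target": list(range(10)),
    }
)

try:
    from autogluon.timeseries import TimeSeriesDataFrame
except ImportError:  # AutoGluon not installed; the pandas part above still runs
    TimeSeriesDataFrame = None

if TimeSeriesDataFrame is not None:
    ts_df = TimeSeriesDataFrame(df)

    # Hold out the last 2 time steps of each series for evaluation.
    train, test = ts_df.train_test_split(prediction_length=2)

    # Positional slice within each series: start inclusive, end exclusive.
    first_three = ts_df.slice_by_timestep(0, 3)

    # Timestamp slice: keeps observations in [2019-01-02, 2019-01-04).
    window = ts_df.slice_by_time(
        pd.Timestamp("2019-01-02"), pd.Timestamp("2019-01-04")
    )

    # Convert back to a plain pandas DataFrame.
    plain = ts_df.to_data_frame()
    print(len(train), len(test), len(first_three), len(window))
```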

Attributes

freq

item_ids

num_items

static_features