autogluon.timeseries.TimeSeriesDataFrame

A collection of univariate time series, where each row is identified by an (item_id, timestamp) pair.

For example, a time series dataframe could represent the daily sales of a collection of products, where each item_id corresponds to a product and timestamp corresponds to the day of the record.

Parameters:

data (pd.DataFrame, str, pathlib.Path or Iterable) –

Time series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

1. Time series data in a pandas DataFrame format without multi-index. For example:

   item_id  timestamp  target
      0 2019-01-01       0
      0 2019-01-02       1
      0 2019-01-03       2
      1 2019-01-01       3
      1 2019-01-02       4
      1 2019-01-03       5
      2 2019-01-01       6
      2 2019-01-02       7
      2 2019-01-03       8

You can also use from_data_frame() for loading data in such format.

2. Path to a data file in CSV or Parquet format. The file must contain columns item_id and timestamp, as well as columns with time series values. This is similar to Option 1 above (pandas DataFrame format without multi-index). Both remote (e.g., S3) and local paths are accepted. You can also use from_path() for loading data in such format.

3. Time series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                    target
item_id timestamp
0       2019-01-01       0
        2019-01-02       1
        2019-01-03       2
1       2019-01-01       3
        2019-01-02       4
        2019-01-03       5
2       2019-01-01       6
        2019-01-02       7
        2019-01-03       8

4. Time series data in Iterable format. For example:

iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq='D')},
    {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq='D')},
    {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq='D')}
]

You can also use from_iterable_dataset() for loading data in such format.
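
The four input formats above can be prepared side by side from the same toy data. The following is a minimal sketch, not a definitive recipe: the helper name `build_all` and the temporary file name `series.csv` are made up for illustration, and the AutoGluon import is kept inside the helper so that the pandas-only preparation runs even where `autogluon.timeseries` is not installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Format 1: long DataFrame with plain item_id / timestamp columns.
long_df = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "timestamp": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 3),
        "target": list(range(9)),
    }
)

# Format 2: the same table written to a CSV file (throwaway temp location).
csv_path = Path(tempfile.mkdtemp()) / "series.csv"
long_df.to_csv(csv_path, index=False)

# Format 3: the same table with a multi-index on (item_id, timestamp).
indexed_df = long_df.set_index(["item_id", "timestamp"])

# Format 4: one dict per series, holding the values and the start period.
iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq="D")},
]


def build_all():
    """Hypothetical helper: construct a TimeSeriesDataFrame from each format."""
    # Lazy import: requires autogluon.timeseries to be installed.
    from autogluon.timeseries import TimeSeriesDataFrame

    return {
        "long": TimeSeriesDataFrame.from_data_frame(long_df),
        "path": TimeSeriesDataFrame.from_path(csv_path),
        "indexed": TimeSeriesDataFrame(indexed_df),
        "iterable": TimeSeriesDataFrame.from_iterable_dataset(iterable_dataset),
    }
```

With AutoGluon installed, calling `build_all()` should return four dataframes describing the same three series. The iterable entries carry no item_id key, so the sketch makes no assumption about which ids that format assigns.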

static_features (pd.DataFrame, str or pathlib.Path, optional) –
An optional dataframe describing the metadata of each individual time series that does not change with time. Can take real-valued or categorical values. For example, if TimeSeriesDataFrame contains sales of various products, static features may refer to time-independent features like color or brand.

The index of static_features must contain exactly one entry for each item present in the respective TimeSeriesDataFrame. For example, the following TimeSeriesDataFrame:
```
                    target
item_id timestamp
A       2019-01-01       0
        2019-01-02       1
        2019-01-03       2
B       2019-01-01       3
        2019-01-02       4
        2019-01-03       5
```
is compatible with the following static_features:
```
         feat_1 feat_2
item_id
A           2.0    bar
B           5.0    foo
```
TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy and slice operations.

If static_features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available during prediction time.
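
The index requirement for static_features can be checked with plain pandas before constructing the object. This is a sketch under stated assumptions: the variable names and the helper `with_static_features` are illustrative only, and the check mirrors the rule described above rather than reproducing the library's own validation.

```python
import pandas as pd

# Time series values indexed by (item_id, timestamp), as in the example above.
ts_frame = pd.DataFrame(
    {"target": [0, 1, 2, 3, 4, 5]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
        names=["item_id", "timestamp"],
    ),
)

# One row of time-independent metadata per item, indexed by item_id.
static_features = pd.DataFrame(
    {"feat_1": [2.0, 5.0], "feat_2": ["bar", "foo"]},
    index=pd.Index(["A", "B"], name="item_id"),
)

# Every item needs exactly one static row: nothing missing, nothing duplicated.
item_ids = ts_frame.index.get_level_values("item_id").unique()
missing_items = item_ids.difference(static_features.index)
has_duplicates = static_features.index.has_duplicates


def with_static_features():
    """Hypothetical helper: attach the metadata at construction time."""
    # Lazy import: requires autogluon.timeseries to be installed.
    from autogluon.timeseries import TimeSeriesDataFrame

    return TimeSeriesDataFrame(ts_frame, static_features=static_features)
```

If `missing_items` is non-empty or `has_duplicates` is true, the static features do not satisfy the one-entry-per-item rule and should be fixed before use.
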
id_column (str, optional) – Name of the item_id column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).
timestamp_column (str, optional) – Name of the timestamp column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).
num_cpus (int, default = -1) – Number of CPU cores used to process the iterable dataset in parallel. Set to -1 to use all cores. This argument is only used when constructing a TimeSeriesDataFrame using format 4 (iterable dataset).
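
When the raw data uses other column names, id_column and timestamp_column point the loader at them. A minimal sketch follows, assuming a made-up table with columns `product`, `date` and `units_sold`; the helper name `load_with_custom_columns` is also hypothetical.

```python
import pandas as pd

# Raw data whose identifier and time columns do not use the default names.
raw = pd.DataFrame(
    {
        "product": ["A", "A", "B", "B"],
        "date": pd.to_datetime(["2019-01-01", "2019-01-02"] * 2),
        "units_sold": [10, 12, 7, 9],
    }
)


def load_with_custom_columns(frame: pd.DataFrame):
    """Hypothetical helper: tell the loader which columns hold ids and timestamps."""
    # Lazy import: requires autogluon.timeseries to be installed.
    from autogluon.timeseries import TimeSeriesDataFrame

    return TimeSeriesDataFrame.from_data_frame(
        frame, id_column="product", timestamp_column="date"
    )


# Alternative preparation: rename to the default column names by hand.
renamed = raw.rename(columns={"product": "item_id", "date": "timestamp"})
```

Either route is expected to lead to the same (item_id, timestamp) layout; passing the column names avoids mutating or copying the raw frame first.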

__init__(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)

Methods

`convert_frequency`	Convert each time series in the dataframe to the given frequency.
`copy`	Make a copy of the TimeSeriesDataFrame.
`dropna`	Drop rows containing NaNs.
`fill_missing_values`	Fill missing values represented by NaN.
`from_data_frame`	Construct a `TimeSeriesDataFrame` from a pandas DataFrame.
`from_iterable_dataset`	Construct a `TimeSeriesDataFrame` from an Iterable of dictionaries, each of which represents a single time series.
`from_path`	Construct a `TimeSeriesDataFrame` from a CSV or Parquet file.
`from_pickle`	Convenience method to read pickled time series dataframes.
`get_model_inputs_for_scoring`	Prepare model inputs necessary to predict the last `prediction_length` time steps of each time series in the dataset.
`infer_frequency`	Infer the time series frequency based on the timestamps of the observations.
`num_timesteps_per_item`	Number of observations in each time series in the dataframe.
`slice_by_time`	Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.
`slice_by_timestep`	Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.
`split_by_time`	Split the dataframe into two `TimeSeriesDataFrame`s, before and after a certain `cutoff_time`.
`to_data_frame`	Convert `TimeSeriesDataFrame` to a `pandas.DataFrame`.
`train_test_split`	Generate a train/test split from the given dataset.
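
To make the per-series semantics of `num_timesteps_per_item` and `slice_by_timestep` concrete, the sketch below reproduces them with plain pandas on a multi-indexed frame and then shows the corresponding method calls. The pandas part is an approximation written for illustration, not the library's implementation, and the helper name `same_with_autogluon` is hypothetical.

```python
import pandas as pd

frame = pd.DataFrame(
    {"target": [0, 1, 2, 3, 4, 5]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
        names=["item_id", "timestamp"],
    ),
)

# Number of observations in each series.
lengths = frame.groupby(level="item_id").size()

# Position of every row inside its own series: 0, 1, 2, 0, 1, 2.
position = frame.groupby(level="item_id").cumcount()

# Rows with start <= position < end, here start=0 (inclusive), end=2 (exclusive).
first_two = frame[(position >= 0) & (position < 2)]


def same_with_autogluon():
    """Hypothetical helper: the same two operations via the documented methods."""
    # Lazy import: requires autogluon.timeseries to be installed.
    from autogluon.timeseries import TimeSeriesDataFrame

    ts_df = TimeSeriesDataFrame(frame)
    return ts_df.num_timesteps_per_item(), ts_df.slice_by_timestep(0, 2)
```

The slice is applied to each time series separately, so the result keeps the first two observations of item A and the first two of item B.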

Attributes

`freq`	Inferred pandas-compatible frequency of the timestamps in the dataframe.
`item_ids`	List of unique time series IDs contained in the data set.
`num_items`	Number of items (time series) in the data set.
`static_features`
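
The attributes can be read directly from a constructed object. As a rough illustration of what they describe, the sketch below derives comparable values with plain pandas; the pandas expressions and the helper name `read_attributes` are assumptions made for this example and may differ from how the library computes them.

```python
import pandas as pd

frame = pd.DataFrame(
    {"target": [0, 1, 2, 3, 4, 5]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
        names=["item_id", "timestamp"],
    ),
)

# Unique series identifiers and their count.
item_ids = frame.index.get_level_values("item_id").unique()
num_items = len(item_ids)

# Frequency inferred from the timestamps of the first series.
freq = pd.infer_freq(frame.loc[item_ids[0]].index)


def read_attributes():
    """Hypothetical helper: read the documented attributes from the object."""
    # Lazy import: requires autogluon.timeseries to be installed.
    from autogluon.timeseries import TimeSeriesDataFrame

    ts_df = TimeSeriesDataFrame(frame)
    return ts_df.item_ids, ts_df.num_items, ts_df.freq
```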