autogluon.timeseries.TimeSeriesDataFrame

A collection of univariate time series, where each row is identified by an (item_id, timestamp) pair.

For example, a time series data frame could represent the daily sales of a collection of products, where each item_id corresponds to a product and timestamp corresponds to the day of the record.

Parameters:

data (pd.DataFrame, str, pathlib.Path or Iterable) –

Time series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

1. Time series data in pandas DataFrame format without multi-index. For example:

   item_id  timestamp  target
      0 2019-01-01       0
      0 2019-01-02       1
      0 2019-01-03       2
      1 2019-01-01       3
      1 2019-01-02       4
      1 2019-01-03       5
      2 2019-01-01       6
      2 2019-01-02       7
      2 2019-01-03       8

You can also use from_data_frame() to load data in this format.

2. Path to a data file in CSV or Parquet format. The file must contain the columns item_id and timestamp, as well as columns with time series values. The layout is the same as in format 1 above (pandas DataFrame format without multi-index). Both remote (e.g., S3) and local paths are accepted. You can also use from_path() to load data in this format.

3. Time series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                    target
item_id timestamp
0       2019-01-01       0
        2019-01-02       1
        2019-01-03       2
1       2019-01-01       3
        2019-01-02       4
        2019-01-03       5
2       2019-01-01       6
        2019-01-02       7
        2019-01-03       8

4. Time series data in Iterable format. For example:

iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq='D')},
    {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq='D')},
    {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq='D')}
]

You can also use from_iterable_dataset() to load data in this format.
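
The in-memory formats above all describe the same nine observations. As a minimal pandas-only sketch (it does not call AutoGluon, and the variable names are illustrative), the snippet below builds the long table without multi-index, derives the multi-index layout from it with `set_index`, and flattens the iterable dataset into the same long layout. Using the list position as item_id for the iterable entries is an assumption made for this illustration.

```python
import pandas as pd

# Long table with item_id and timestamp as ordinary columns.
dates = pd.date_range("2019-01-01", periods=3, freq="D")
long_df = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "timestamp": list(dates) * 3,
        "target": list(range(9)),
    }
)

# The same rows, indexed by (item_id, timestamp).
multi_df = long_df.set_index(["item_id", "timestamp"])

# Iterable dataset: one dict per series with its values and start period.
iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq="D")},
]

# Flatten the iterable into the long layout. Taking the position in the
# list as item_id is an assumption made for this sketch.
rows = []
for item_id, entry in enumerate(iterable_dataset):
    stamps = pd.period_range(
        start=entry["start"], periods=len(entry["target"])
    ).to_timestamp()
    for stamp, value in zip(stamps, entry["target"]):
        rows.append({"item_id": item_id, "timestamp": stamp, "target": value})
from_iterable = pd.DataFrame(rows)
```

Either `long_df` or `multi_df` has the shape that the constructor accepts as a pandas DataFrame input.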

static_features (pd.DataFrame, str or pathlib.Path, optional) –
An optional data frame describing the metadata of each individual time series that does not change with time. Can take real-valued or categorical values. For example, if TimeSeriesDataFrame contains sales of various products, static features may refer to time-independent features like color or brand.

The index of static_features must contain exactly one entry for each item present in the corresponding TimeSeriesDataFrame. For example, the following TimeSeriesDataFrame:
```
                    target
item_id timestamp
A       2019-01-01       0
        2019-01-02       1
        2019-01-03       2
B       2019-01-01       3
        2019-01-02       4
        2019-01-03       5
```
is compatible with the following static_features:
```
         feat_1 feat_2
item_id
A           2.0    bar
B           5.0    foo
```
TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy and slice operations.

If static_features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available during prediction time.
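
The compatibility requirement can be checked before constructing the frame. The following is a minimal pandas-only sketch under the assumption that the time series data is already indexed by (item_id, timestamp); the helper `has_one_row_per_item` is hypothetical and is not part of AutoGluon.

```python
import pandas as pd

# Time series data indexed by (item_id, timestamp), as in the example above.
index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
    names=["item_id", "timestamp"],
)
ts_df = pd.DataFrame({"target": list(range(6))}, index=index)

# Static features: one row per item, indexed by item_id.
static_features = pd.DataFrame(
    {"feat_1": [2.0, 5.0], "feat_2": ["bar", "foo"]},
    index=pd.Index(["A", "B"], name="item_id"),
)


def has_one_row_per_item(ts_df, static_features):
    """Return True if every item in ts_df has exactly one static row."""
    item_ids = ts_df.index.unique(level="item_id")
    return bool(
        static_features.index.is_unique
        and item_ids.isin(static_features.index).all()
    )


compatible = has_one_row_per_item(ts_df, static_features)
# Dropping item "B" leaves one series without static features.
incomplete = has_one_row_per_item(ts_df, static_features.drop(index="B"))
```

Here `compatible` is `True` and `incomplete` is `False`, because the second call is missing the row for item B.
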

id_column (str, optional) – Name of the item_id column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

timestamp_column (str, optional) – Name of the timestamp column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

num_cpus (int, default = -1) – Number of CPU cores used to process the iterable dataset in parallel. Set to -1 to use all cores. This argument is only used when constructing a TimeSeriesDataFrame using format 4 (iterable dataset).
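
To picture what id_column and timestamp_column are for, consider a table whose columns are called `product` and `date` (hypothetical names chosen for this sketch). One alternative to passing the two arguments is to rename the columns to the defaults up front with pandas, as shown below. This only illustrates the intent of the arguments and is not a description of how the constructor handles them internally.

```python
import pandas as pd

# A long table whose columns do not use the default names.
raw = pd.DataFrame(
    {
        "product": ["A", "A", "B", "B"],
        "date": pd.to_datetime(
            ["2019-01-01", "2019-01-02", "2019-01-01", "2019-01-02"]
        ),
        "target": [0, 1, 2, 3],
    }
)

# Hypothetical values for this sketch: the names one would otherwise pass
# as id_column and timestamp_column.
id_column = "product"
timestamp_column = "date"

# Rename to the default column names; the original table is left unchanged.
renamed = raw.rename(
    columns={id_column: "item_id", timestamp_column: "timestamp"}
)
```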

freq

A pandas-compatible string describing the frequency of the time series, for example "D" for daily data or "h" for hourly data. This attribute is determined automatically based on the timestamps. For the full list of possible values, see the pandas documentation.

Type: str
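
As a rough illustration of inferring a frequency from timestamps, the sketch below uses `pandas.infer_freq` as a stand-in. It is an assumption that this resembles what the attribute reports; the logic AutoGluon applies across multiple series may differ.

```python
import pandas as pd

# Three consecutive daily timestamps for a single series.
timestamps = pd.DatetimeIndex(["2019-01-01", "2019-01-02", "2019-01-03"])

# pandas infers the frequency alias from the spacing of the timestamps.
inferred = pd.infer_freq(timestamps)

# With a gap in the data there is no single regular frequency to infer,
# so pandas returns None.
irregular = pd.DatetimeIndex(["2019-01-01", "2019-01-02", "2019-01-05"])
inferred_irregular = pd.infer_freq(irregular)
```

For the regular series `inferred` is `"D"`, while the irregular one yields `None`.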

num_items

Number of items (time series) in the data set.

Type: int

item_ids

List of unique time series IDs contained in the data set.

Type: pd.Index
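
For a plain pandas DataFrame with a (item_id, timestamp) multi-index, the same two quantities can be computed as sketched below. This is a pandas-only analogue for illustration and does not use the attributes themselves.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
    names=["item_id", "timestamp"],
)
ts_df = pd.DataFrame({"target": list(range(6))}, index=index)

# Unique item IDs, returned as a pd.Index.
item_ids = ts_df.index.unique(level="item_id")

# Number of distinct time series.
num_items = len(item_ids)
```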

__init__(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)

Methods

`convert_frequency`	Convert each time series in the data frame to the given frequency.
`copy`	Make a copy of the TimeSeriesDataFrame.
`dropna`	Drop rows containing NaNs.
`fill_missing_values`	Fill missing values represented by NaN.
`from_data_frame`	Construct a `TimeSeriesDataFrame` from a pandas DataFrame.
`from_iterable_dataset`	Construct a `TimeSeriesDataFrame` from an Iterable of dictionaries, each of which represents a single time series.
`from_path`	Construct a `TimeSeriesDataFrame` from a CSV or Parquet file.
`from_pickle`	Convenience method to read pickled time series data frames.
`get_model_inputs_for_scoring`	Prepare model inputs necessary to predict the last `prediction_length` time steps of each time series in the dataset.
`infer_frequency`	Infer the time series frequency based on the timestamps of the observations.
`num_timesteps_per_item`	Length of each time series in the dataframe.
`slice_by_time`	Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.
`slice_by_timestep`	Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.
`split_by_time`	Split the data frame into two `TimeSeriesDataFrame`s, before and after a certain `cutoff_time`.
`to_data_frame`	Convert TimeSeriesDataFrame to a pandas.DataFrame.
`train_test_split`	Generate a train/test split from the given dataset.
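
The per-series semantics of a few of these methods can be pictured with plain pandas. The sketch below reproduces the described behavior of `num_timesteps_per_item`, `slice_by_timestep` and `slice_by_time` on an ordinary multi-indexed DataFrame. It is an analogue written for illustration, not the library's implementation, and the timestep slice shown here handles non-negative positions only.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2019-01-01", periods=4, freq="D")],
    names=["item_id", "timestamp"],
)
ts_df = pd.DataFrame({"target": list(range(8))}, index=index)

# Analogue of num_timesteps_per_item: number of rows for each item.
lengths = ts_df.groupby(level="item_id", sort=False).size()

# Analogue of slicing by timestep with start=1, end=3: keep positions 1 and 2
# of each series (start inclusive, end exclusive).
position = ts_df.groupby(level="item_id", sort=False).cumcount()
by_step = ts_df[(position >= 1) & (position < 3)]

# Analogue of slicing by time: keep timestamps in [start, end).
start, end = pd.Timestamp("2019-01-02"), pd.Timestamp("2019-01-04")
stamps = ts_df.index.get_level_values("timestamp")
by_time = ts_df[(stamps >= start) & (stamps < end)]
```

Because every series in this example starts on the same day, both slices select the same rows: the second and third observation of each item.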

Attributes

`freq`
`item_ids`
`num_items`
`static_features`