autogluon.timeseries.TimeSeriesDataFrame

class autogluon.timeseries.TimeSeriesDataFrame(data: Any, static_features: Optional[DataFrame] = None, *args, **kwargs)

A TimeSeriesDataFrame represents a collection of time series, where each row holds the values for one (item_id, timestamp) pair.

For example, a time series data frame could represent the daily sales of a collection of products, where each item_id identifies a product and each timestamp corresponds to a day.

Parameters

data (Any) –

Time-series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

Time-series data in pandas DataFrame format without a multi-index. For example:

 item_id   timestamp  target
       0  2019-01-01       0
       0  2019-01-02       1
       0  2019-01-03       2
       1  2019-01-01       3
       1  2019-01-02       4
       1  2019-01-03       5
       2  2019-01-01       6
       2  2019-01-02       7
       2  2019-01-03       8

Time-series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                        target
item_id timestamp
0       2019-01-01       0
        2019-01-02       1
        2019-01-03       2
1       2019-01-01       3
        2019-01-02       4
        2019-01-03       5
2       2019-01-01       6
        2019-01-02       7
        2019-01-03       8

Path to a data file in CSV or Parquet format. The file must contain the columns item_id and timestamp, as well as one or more columns with time series values. The expected layout is the same as the first format above (a pandas DataFrame without a multi-index). Both local and remote (e.g., S3) paths are accepted.

Time-series data in Iterable format. For example:

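import pandas as pd

# The freq argument of pd.Timestamp was removed in pandas 2.0; with newer pandas,
# a pd.Period("01-01-2019", freq='D') start carries the frequency instead.
iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Timestamp("01-01-2019", freq='D')},
    {"target": [3, 4, 5], "start": pd.Timestamp("01-01-2019", freq='D')},
    {"target": [6, 7, 8], "start": pd.Timestamp("01-01-2019", freq='D')}
]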
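The relationship between these input formats can be sketched with plain pandas. This is a minimal illustration, not AutoGluon's own conversion code: the helper `iterable_to_long` is hypothetical, item ids are assumed to be the position of each entry in the iterable, and each "start" is assumed to be a `pd.Period` so that the frequency travels with the start value.

```python
import pandas as pd

# Long format: item_id and timestamp are ordinary columns.
long_df = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "timestamp": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 3),
        "target": list(range(9)),
    }
)

# Multi-index format: the same rows, indexed by (item_id, timestamp).
multi_df = long_df.set_index(["item_id", "timestamp"])

# Iterable format: one dict per time series (pd.Period start is an assumption).
iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("2019-01-01", freq="D")},
    {"target": [3, 4, 5], "start": pd.Period("2019-01-01", freq="D")},
    {"target": [6, 7, 8], "start": pd.Period("2019-01-01", freq="D")},
]


def iterable_to_long(dataset):
    """Hypothetical helper: expand each entry into (item_id, timestamp, target) rows."""
    frames = []
    for item_id, entry in enumerate(dataset):
        # The frequency is taken from the Period stored under "start".
        periods = pd.period_range(start=entry["start"], periods=len(entry["target"]))
        frames.append(
            pd.DataFrame(
                {
                    "item_id": item_id,
                    "timestamp": periods.to_timestamp(),
                    "target": entry["target"],
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


expanded = iterable_to_long(iterable_dataset)
```

Here `long_df` and `multi_df` have the shapes described by the two pandas DataFrame formats, and `expanded` holds the same rows recovered from the iterable format.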

static_features (Optional[pd.DataFrame]) –
An optional data frame describing the metadata attributes of individual items in the item index. These may be categorical or real-valued attributes for each item. For example, if the item index refers to time series data of individual households, static features may be time-independent demographic features. If static features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available at prediction time. The index of static_features must match the item index of the TimeSeriesDataFrame.

TimeSeriesDataFrame keeps static features consistent across serialization/deserialization, copy, and slice operations, although this support should be considered experimental.
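The index-matching requirement for static_features can be checked ahead of time with plain pandas. The sketch below is illustrative only: the column names `region` and `household_size` and the variable names are hypothetical, and the check is not the validation AutoGluon itself performs.

```python
import pandas as pd

# Time series data in long format, three items with three daily values each.
data = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "timestamp": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 3),
        "target": list(range(9)),
    }
)

# One row of time-independent attributes per item, indexed by item_id.
# The feature columns are hypothetical examples of a categorical and a
# real-valued attribute.
static_features = pd.DataFrame(
    {
        "item_id": [0, 1, 2],
        "region": ["north", "south", "north"],
        "household_size": [2.0, 4.0, 1.0],
    }
).set_index("item_id")

# Compare the item ids in the time series data with the static feature index.
item_index = pd.Index(data["item_id"].unique(), name="item_id")
missing_items = item_index.difference(static_features.index)
extra_items = static_features.index.difference(item_index)
ids_match = len(missing_items) == 0 and len(extra_items) == 0
```

If `ids_match` is false, `missing_items` and `extra_items` show which ids appear on only one side.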

freq

A pandas- and gluon-ts-compatible string describing the frequency of the time series. For example, "D" denotes daily data. See also https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

Type: str

num_items

Number of items (time series) in the data set.

Type: int

item_ids

List of unique time series IDs contained in the data set.

Type: pd.Index
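To make the three attributes above concrete, the following sketch computes comparable values with plain pandas on a frame that has a multi-index on item_id and timestamp. It only shows what each attribute describes. It is not how TimeSeriesDataFrame derives them, and the variable names are chosen for illustration.

```python
import pandas as pd

# Three daily time series in multi-index form.
timestamps = pd.date_range("2019-01-01", periods=3, freq="D")
index = pd.MultiIndex.from_product(
    [[0, 1, 2], timestamps], names=["item_id", "timestamp"]
)
df = pd.DataFrame({"target": range(9)}, index=index)

# Unique time series ids, comparable to item_ids (a pd.Index).
item_ids = df.index.unique(level="item_id")

# Number of time series, comparable to num_items.
num_items = len(item_ids)

# Offset alias inferred from the timestamps of one series, comparable to freq.
first_item_times = df.xs(item_ids[0], level="item_id").index
freq = pd.infer_freq(first_item_times)
```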

__init__(data: Any, static_features: Optional[DataFrame] = None, *args, **kwargs)

Methods

`copy`	Make a copy of this object's indices and data.
`dropna`	Drop rows containing NaNs.
`fill_missing_values`	Fill missing values represented by NaN.
`from_data_frame`	Construct a `TimeSeriesDataFrame` from a pandas DataFrame.
`from_iterable_dataset`	Construct a `TimeSeriesDataFrame` from an Iterable of dictionaries, each of which represents a single time series.
`from_path`	Construct a `TimeSeriesDataFrame` from a CSV or Parquet file.
`from_pickle`	Convenience method to read pickled time series data frames.
`get_reindexed_view`	Return a new `TimeSeriesDataFrame` with the same underlying data and static features as the current data frame, except that the time index is replaced by a new "dummy" time series index with the given frequency.
`num_timesteps_per_item`	Length of each time series in the dataframe.
`slice_by_time`	Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.
`slice_by_timestep`	Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.
`split_by_time`	Split the dataframe into two `TimeSeriesDataFrame` objects, one before and one after a certain `cutoff_time`.
`to_regular_index`	Fill the gaps in an irregularly-sampled time series with NaNs.
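The per-series semantics of a few of these methods can be sketched with plain pandas on a multi-index frame. The functions below are hypothetical stand-ins written for illustration, not the library's implementations. They assume non-negative start and end positions and only mirror the inclusive-start, exclusive-end behavior described in the table.

```python
import pandas as pd

timestamps = pd.date_range("2019-01-01", periods=3, freq="D")
index = pd.MultiIndex.from_product(
    [[0, 1, 2], timestamps], names=["item_id", "timestamp"]
)
df = pd.DataFrame({"target": range(9)}, index=index)

# Length of each time series, in the spirit of num_timesteps_per_item.
lengths = df.groupby(level="item_id", sort=False).size()


def slice_by_timestep_sketch(frame, start, end):
    """Keep positions start (inclusive) to end (exclusive) within each series."""
    position = frame.groupby(level="item_id", sort=False).cumcount()
    return frame[(position >= start) & (position < end)]


def slice_by_time_sketch(frame, start_time, end_time):
    """Keep rows with start_time <= timestamp < end_time in every series."""
    ts = frame.index.get_level_values("timestamp")
    return frame[(ts >= start_time) & (ts < end_time)]


first_two = slice_by_timestep_sketch(df, 0, 2)
second_day = slice_by_time_sketch(
    df, pd.Timestamp("2019-01-02"), pd.Timestamp("2019-01-03")
)

# Filling gaps in an irregularly sampled series with NaNs, in the spirit of
# to_regular_index, shown here for a single series with 2019-01-02 missing.
irregular = pd.Series([1.0, 3.0], index=pd.to_datetime(["2019-01-01", "2019-01-03"]))
full_range = pd.date_range(irregular.index.min(), irregular.index.max(), freq="D")
regular = irregular.reindex(full_range)
```

In this sketch `first_two` keeps the first two observations of every item, `second_day` keeps only the 2019-01-02 rows because the end timestamp is excluded, and `regular` gains a NaN entry for the missing day.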

Attributes

`DUMMY_INDEX_START_TIME`
`freq`
`item_ids`
`num_items`
`static_features`