autogluon.timeseries.TimeSeriesDataFrame

class autogluon.timeseries.TimeSeriesDataFrame(data: Any, static_features: Optional[DataFrame] = None, *args, **kwargs)

A TimeSeriesDataFrame represents a collection of time series, where each row holds the values for one (item_id, timestamp) pair.

For example, a time series data frame could represent the daily sales of a collection of products, where each item_id identifies a product and each timestamp corresponds to a day.

Parameters
  • data (Any) –

    Time-series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

    1. Time-series data in pandas DataFrame format without multi-index. For example:

         item_id  timestamp  target
      0        0 2019-01-01       0
      1        0 2019-01-02       1
      2        0 2019-01-03       2
      3        1 2019-01-01       3
      4        1 2019-01-02       4
      5        1 2019-01-03       5
      6        2 2019-01-01       6
      7        2 2019-01-02       7
      8        2 2019-01-03       8
      
    2. Time-series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                              target
      item_id timestamp
      0       2019-01-01       0
              2019-01-02       1
              2019-01-03       2
      1       2019-01-01       3
              2019-01-02       4
              2019-01-03       5
      2       2019-01-01       6
              2019-01-02       7
              2019-01-03       8
      
    3. Path to a data file in CSV or Parquet format. The file must contain columns item_id and timestamp, as well as columns with time series values. This is similar to Option 1 above (pandas DataFrame format without multi-index). Both remote (e.g., S3) and local paths are accepted.

    4. Time-series data in Iterable format. For example:

      iterable_dataset = [
          {"target": [0, 1, 2], "start": pd.Timestamp("01-01-2019", freq='D')},
          {"target": [3, 4, 5], "start": pd.Timestamp("01-01-2019", freq='D')},
          {"target": [6, 7, 8], "start": pd.Timestamp("01-01-2019", freq='D')}
      ]
      

  • static_features (Optional[pd.DataFrame]) –

    An optional data frame describing the metadata attributes of individual items in the item index. These may be categorical or real-valued attributes for each item. For example, if the item index refers to time series data of individual households, static features may refer to time-independent demographic features. When provided during fit, the TimeSeriesPredictor expects the same metadata to be available at prediction time. When provided, the index of static_features must match the item index of the TimeSeriesDataFrame.

    TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy, and slice operations, although these features should be considered experimental.
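
To make the relationship between the input formats concrete, here is a minimal sketch in plain Python (standard library only, not AutoGluon's implementation). It expands the iterable format from option 4 into the long-format rows of option 1, then checks that a static-features table is keyed by the same item IDs. The helper names `expand_iterable` and `static_index_matches`, the assignment of item IDs by list position, the fixed daily step, and the `region` feature are all assumptions made for illustration.

```python
from datetime import date, timedelta


def expand_iterable(iterable_dataset, step=timedelta(days=1)):
    """Expand {"target": [...], "start": date} entries into long-format rows.

    Assumption: item IDs are assigned by position in the list, and every
    series advances by a fixed `step` (daily here).
    """
    rows = []
    for item_id, series in enumerate(iterable_dataset):
        for offset, value in enumerate(series["target"]):
            rows.append(
                {
                    "item_id": item_id,
                    "timestamp": series["start"] + offset * step,
                    "target": value,
                }
            )
    return rows


def static_index_matches(rows, static_features):
    """Return True if the static-features keys equal the set of item IDs."""
    return set(static_features) == {row["item_id"] for row in rows}


iterable_dataset = [
    {"target": [0, 1, 2], "start": date(2019, 1, 1)},
    {"target": [3, 4, 5], "start": date(2019, 1, 1)},
    {"target": [6, 7, 8], "start": date(2019, 1, 1)},
]
rows = expand_iterable(iterable_dataset)

# Hypothetical static features, keyed by item ID.
static_features = {
    0: {"region": "north"},
    1: {"region": "south"},
    2: {"region": "east"},
}
```

With this data, `rows[4]` holds item 1 on 2019-01-02 with target 4, which is row 4 of the option 1 table, and `static_index_matches(rows, static_features)` is true because both sides cover items 0, 1, and 2.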

freq

A pandas and gluon-ts compatible string describing the frequency of the time series. For example, "D" denotes daily data. See also https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

Type

str

num_items

Number of items (time series) in the data set.

Type

int

item_ids

List of unique time series IDs contained in the data set.

Type

pd.Index

__init__(data: Any, static_features: Optional[DataFrame] = None, *args, **kwargs)

Methods

copy

Make a copy of this object's indices and data.

dropna

Drop rows containing NaNs.

fill_missing_values

Fill missing values represented by NaN.

from_data_frame

Construct a TimeSeriesDataFrame from a pandas DataFrame.

from_iterable_dataset

Construct a TimeSeriesDataFrame from an Iterable of dictionaries, each of which represents a single time series.

from_path

Construct a TimeSeriesDataFrame from a CSV or Parquet file.

from_pickle

Convenience method to read pickled time series data frames.

get_reindexed_view

Returns a new TimeSeriesDataFrame object with the same underlying data and static features as the current data frame, except the time index is replaced by a new "dummy" time series index with the given frequency.

num_timesteps_per_item

Length of each time series in the dataframe.

slice_by_time

Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.

slice_by_timestep

Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.

split_by_time

Split the data frame into two TimeSeriesDataFrames, before and after a given cutoff_time.

to_regular_index

Fill the gaps in an irregularly-sampled time series with NaNs.
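
The slicing and gap-filling semantics summarized above can be illustrated without the library. The sketch below uses plain Python dictionaries (item ID mapped to a list of (timestamp, value) pairs) rather than a real TimeSeriesDataFrame. The function names mirror the methods for readability, but the signatures, the toy data, and the daily step are assumptions, not AutoGluon's API.

```python
import math
from datetime import date, timedelta

# Toy data: item_id -> chronologically sorted (timestamp, value) pairs.
# Item "A" has a gap on 2019-01-03.
data = {
    "A": [(date(2019, 1, 1), 0.0), (date(2019, 1, 2), 1.0), (date(2019, 1, 4), 3.0)],
    "B": [(date(2019, 1, 1), 5.0), (date(2019, 1, 2), 6.0), (date(2019, 1, 3), 7.0)],
}


def slice_by_timestep(data, start, end):
    """Keep positions start (inclusive) to end (exclusive) of each series."""
    return {item: series[start:end] for item, series in data.items()}


def slice_by_time(data, start_time, end_time):
    """Keep observations with start_time <= timestamp < end_time."""
    return {
        item: [(t, v) for t, v in series if start_time <= t < end_time]
        for item, series in data.items()
    }


def to_regular_index(series, step=timedelta(days=1)):
    """Insert NaN for every missing timestamp between the first and last one."""
    observed = dict(series)
    filled = []
    current = series[0][0]
    while current <= series[-1][0]:
        filled.append((current, observed.get(current, math.nan)))
        current += step
    return filled


last_two = slice_by_timestep(data, -2, None)
first_two_days = slice_by_time(data, date(2019, 1, 1), date(2019, 1, 3))
regular_a = to_regular_index(data["A"])
```

Here `last_two` keeps the final two observations of each item, `first_two_days` drops everything from 2019-01-03 onward because the end bound is exclusive, and `regular_a` gains a NaN entry for the missing 2019-01-03.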

Attributes

DUMMY_INDEX_START_TIME

freq

item_ids

num_items

static_features