autogluon.timeseries.TimeSeriesDataFrame
- class autogluon.timeseries.TimeSeriesDataFrame(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
A collection of univariate time series, where each row is identified by an (item_id, timestamp) pair.

For example, a time series data frame could represent the daily sales of a collection of products, where each item_id corresponds to a product and timestamp corresponds to the day of the record.

- Parameters:
data (pd.DataFrame, str, pathlib.Path or Iterable) –
Time series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

1. Time series data in a pandas DataFrame format without multi-index. For example:

       item_id  timestamp  target
    0        0 2019-01-01       0
    1        0 2019-01-02       1
    2        0 2019-01-03       2
    3        1 2019-01-01       3
    4        1 2019-01-02       4
    5        1 2019-01-03       5
    6        2 2019-01-01       6
    7        2 2019-01-02       7
    8        2 2019-01-03       8

   You can also use from_data_frame() for loading data in such format.

2. Path to a data file in CSV or Parquet format. The file must contain columns item_id and timestamp, as well as columns with time series values. This is similar to Option 1 above (pandas DataFrame format without multi-index). Both remote (e.g., S3) and local paths are accepted. You can also use from_path() for loading data in such format.

3. Time series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                        target
    item_id timestamp
    0       2019-01-01       0
            2019-01-02       1
            2019-01-03       2
    1       2019-01-01       3
            2019-01-02       4
            2019-01-03       5
    2       2019-01-01       6
            2019-01-02       7
            2019-01-03       8

4. Time series data in Iterable format. For example:

    iterable_dataset = [
        {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq='D')},
        {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq='D')},
        {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq='D')}
    ]

   You can also use from_iterable_dataset() for loading data in such format.

static_features (pd.DataFrame, str or pathlib.Path, optional) –
An optional data frame describing the metadata of each individual time series that does not change with time. Can take real-valued or categorical values. For example, if TimeSeriesDataFrame contains sales of various products, static features may refer to time-independent features like color or brand.

The index of static_features must contain a single entry for each item present in the respective TimeSeriesDataFrame. For example, the following TimeSeriesDataFrame:

                        target
    item_id timestamp
    A       2019-01-01       0
            2019-01-02       1
            2019-01-03       2
    B       2019-01-01       3
            2019-01-02       4
            2019-01-03       5

is compatible with the following static_features:

             feat_1 feat_2
    item_id
    A           2.0    bar
    B           5.0    foo

TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy and slice operations.

If static_features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available at prediction time.

id_column (str, optional) – Name of the item_id column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

timestamp_column (str, optional) – Name of the timestamp column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

num_cpus (int, default = -1) – Number of CPU cores used to process the iterable dataset in parallel. Set to -1 to use all cores. This argument is only used when constructing a TimeSeriesDataFrame using format 4 (iterable dataset).
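The four input formats describe the same underlying table. As a pandas-only sketch (not AutoGluon's own loading code), the iterable dataset from format 4 can be expanded into the long layout of format 1 and then re-indexed into the multi-index layout of format 3. The helper name iterable_to_long and the choice to number items by their position in the iterable are assumptions made for illustration.

```python
import pandas as pd

# The iterable dataset from format 4: one dict per time series.
iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq="D")},
]


def iterable_to_long(dataset):
    """Hypothetical helper: expand an iterable dataset into long format.

    Assumption: each series gets its position in the iterable as item_id.
    """
    frames = []
    for item_id, entry in enumerate(dataset):
        # Build one timestamp per target value, starting at entry["start"];
        # the frequency is taken from the Period object itself.
        periods = pd.period_range(start=entry["start"], periods=len(entry["target"]))
        frames.append(
            pd.DataFrame(
                {
                    "item_id": item_id,
                    "timestamp": periods.to_timestamp(),
                    "target": entry["target"],
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


# Format 1: plain columns item_id, timestamp, target.
long_df = iterable_to_long(iterable_dataset)

# Format 3: the same rows with a multi-index on (item_id, timestamp).
multi_df = long_df.set_index(["item_id", "timestamp"])
```

Either long_df or multi_df has the layout that the data parameter describes for formats 1 and 3, respectively.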
- freq
A pandas-compatible string describing the frequency of the time series. For example, "D" for daily data, "h" for hourly data, etc. This attribute is determined automatically based on the timestamps. For the full list of possible values, see pandas documentation.
- Type:
str
- num_items
Number of items (time series) in the data set.
- Type:
int
- item_ids
List of unique time series IDs contained in the data set.
- Type:
pd.Index
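To make these attributes concrete, the following pandas-only sketch computes analogous values from a plain multi-indexed DataFrame and checks a static_features table against the item IDs. It is an illustration under stated assumptions, not the library's implementation: the function name static_features_match is hypothetical, the frequency is inferred from the first item only, and the check simply requires every item to appear exactly once in the static_features index.

```python
import pandas as pd

# Multi-indexed data matching the static_features example: items A and B.
timestamps = pd.date_range("2019-01-01", periods=3, freq="D")
df = pd.DataFrame(
    {"target": [0, 1, 2, 3, 4, 5]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], timestamps], names=["item_id", "timestamp"]
    ),
)
static_features = pd.DataFrame(
    {"feat_1": [2.0, 5.0], "feat_2": ["bar", "foo"]},
    index=pd.Index(["A", "B"], name="item_id"),
)

# Analogue of item_ids: unique IDs on the first index level.
item_ids = df.index.unique(level="item_id")

# Analogue of num_items: number of distinct time series.
num_items = len(item_ids)

# Analogue of freq: infer a pandas frequency string from one item's timestamps.
freq = pd.infer_freq(df.loc[item_ids[0]].index)


def static_features_match(item_ids, static_features):
    """Hypothetical check: every item appears exactly once in the index."""
    index = static_features.index
    return index.is_unique and bool(item_ids.isin(index).all())


compatible = static_features_match(item_ids, static_features)
```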
- __init__(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
Methods
Convert each time series in the data frame to the given frequency.
Make a copy of the TimeSeriesDataFrame.
Drop rows containing NaNs.
Fill missing values represented by NaN.
Construct a TimeSeriesDataFrame from a pandas DataFrame.
Construct a TimeSeriesDataFrame from an Iterable of dictionaries, each of which represents a single time series.
Construct a TimeSeriesDataFrame from a CSV or Parquet file.
Convenience method to read pickled time series data frames.
Prepare model inputs necessary to predict the last prediction_length time steps of each time series in the dataset.
Infer the time series frequency based on the timestamps of the observations.
Length of each time series in the dataframe.
Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.
Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.
Split the dataframe into two TimeSeriesDataFrames, before and after a certain cutoff_time.
Convert TimeSeriesDataFrame to a pandas.DataFrame.
Generate a train/test split from the given dataset.
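The per-series semantics of the length and slicing operations above (start inclusive, end exclusive, applied to each time series separately) can be sketched with plain pandas. The function names below are hypothetical and the code only mirrors the behavior described in the summaries; it is not the library's implementation.

```python
import pandas as pd

timestamps = pd.date_range("2019-01-01", periods=3, freq="D")
df = pd.DataFrame(
    {"target": [0, 1, 2, 3, 4, 5]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], timestamps], names=["item_id", "timestamp"]
    ),
)

# Length of each time series: count rows per item_id.
lengths = df.groupby(level="item_id").size()


def slice_by_position(frame, start=None, end=None):
    """Hypothetical helper: rows start..end (end exclusive) of every series."""
    pieces = [
        group.iloc[start:end]
        for _, group in frame.groupby(level="item_id", sort=False)
    ]
    return pd.concat(pieces)


def slice_by_timestamp(frame, start, end):
    """Hypothetical helper: rows with start <= timestamp < end."""
    ts = frame.index.get_level_values("timestamp")
    return frame[(ts >= start) & (ts < end)]


prediction_length = 1

# Everything except the last prediction_length steps of each series.
past = slice_by_position(df, None, -prediction_length)

# Only the last prediction_length steps of each series.
last = slice_by_position(df, -prediction_length, None)

# Rows on 2019-01-02 only: the end timestamp is excluded.
middle = slice_by_timestamp(
    df, pd.Timestamp("2019-01-02"), pd.Timestamp("2019-01-03")
)
```

Negative positions count from the end of each series, which is what makes holding out the final prediction_length steps a one-line slice.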
Attributes