autogluon.timeseries.TimeSeriesDataFrame
- class autogluon.timeseries.TimeSeriesDataFrame(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
A collection of univariate time series, where each row is identified by an (item_id, timestamp) pair.

For example, a time series data frame could represent the daily sales of a collection of products, where each item_id corresponds to a product and timestamp corresponds to the day of the record.

- Parameters:
data (pd.DataFrame, str, pathlib.Path or Iterable) –
Time series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

1. Time series data in a pandas DataFrame format without multi-index. For example:

       item_id  timestamp  target
    0        0 2019-01-01       0
    1        0 2019-01-02       1
    2        0 2019-01-03       2
    3        1 2019-01-01       3
    4        1 2019-01-02       4
    5        1 2019-01-03       5
    6        2 2019-01-01       6
    7        2 2019-01-02       7
    8        2 2019-01-03       8

You can also use from_data_frame() for loading data in such format.

2. Path to a data file in CSV or Parquet format. The file must contain columns item_id and timestamp, as well as columns with time series values. This is similar to Option 1 above (pandas DataFrame format without multi-index). Both remote (e.g., S3) and local paths are accepted. You can also use from_path() for loading data in such format.

3. Time series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                        target
    item_id timestamp
    0       2019-01-01       0
            2019-01-02       1
            2019-01-03       2
    1       2019-01-01       3
            2019-01-02       4
            2019-01-03       5
    2       2019-01-01       6
            2019-01-02       7
            2019-01-03       8

4. Time series data in Iterable format. For example:

    iterable_dataset = [
        {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq='D')},
        {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq='D')},
        {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq='D')}
    ]

You can also use from_iterable_dataset() for loading data in such format.

static_features (pd.DataFrame, str or pathlib.Path, optional) –
An optional data frame describing the metadata of each individual time series that does not change with time. Can take real-valued or categorical values. For example, if TimeSeriesDataFrame contains sales of various products, static features may refer to time-independent features like color or brand.

The index of static_features must contain a single entry for each item present in the respective TimeSeriesDataFrame. For example, the following TimeSeriesDataFrame:

                        target
    item_id timestamp
    A       2019-01-01       0
            2019-01-02       1
            2019-01-03       2
    B       2019-01-01       3
            2019-01-02       4
            2019-01-03       5

is compatible with the following static_features:

             feat_1 feat_2
    item_id
    A           2.0    bar
    B           5.0    foo
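The compatibility rule above can be illustrated with plain pandas. This is a minimal sketch, not the library's own validation: the helper name has_entry_for_each_item is hypothetical, and the check only covers the "single entry for each item" condition stated above.

```python
import pandas as pd

# Time series data with a multi-index on (item_id, timestamp).
index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
    names=["item_id", "timestamp"],
)
ts_df = pd.DataFrame({"target": range(6)}, index=index)

# Static features indexed by item_id, one row per item.
static_features = pd.DataFrame(
    {"feat_1": [2.0, 5.0], "feat_2": ["bar", "foo"]},
    index=pd.Index(["A", "B"], name="item_id"),
)


def has_entry_for_each_item(ts_df, static_features):
    """Hypothetical check: a single static row exists for every item_id in ts_df."""
    item_ids = ts_df.index.unique(level="item_id")
    every_item_covered = item_ids.isin(static_features.index).all()
    return bool(static_features.index.is_unique and every_item_covered)


compatible = has_entry_for_each_item(ts_df, static_features)
# Dropping the row for item "B" leaves that item without static features.
incomplete = has_entry_for_each_item(ts_df, static_features.drop(index="B"))
```

Running a check like this before constructing the data frame can surface a missing item early; how the library itself reports such a mismatch is not covered here.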
TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy and slice operations.

If static_features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available during prediction time.

id_column (str, optional) – Name of the item_id column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

timestamp_column (str, optional) – Name of the timestamp column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

num_cpus (int, default = -1) – Number of CPU cores used to process the iterable dataset in parallel. Set to -1 to use all cores. This argument is only used when constructing a TimeSeriesDataFrame using format 4 (iterable dataset).
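The relationship between input formats 1, 3 and 4 can be sketched with pandas alone. The snippet below does not call AutoGluon; it only builds the three layouts and shows that they describe the same data. The helper iterable_to_long is hypothetical, and numbering items 0, 1, 2 by their position in the iterable is an assumption made for this illustration.

```python
import pandas as pd

# Format 1: long-format DataFrame with item_id, timestamp and target columns.
long_df = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "timestamp": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 3),
        "target": range(9),
    }
)

# Format 3: the same data with a multi-index on (item_id, timestamp).
multi_index_df = long_df.set_index(["item_id", "timestamp"])

# Format 4: an iterable of dicts, each holding one series and its start period.
iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq="D")},
]


def iterable_to_long(dataset):
    """Hypothetical helper: expand each entry into long-format rows."""
    frames = []
    for item_id, entry in enumerate(dataset):
        # The frequency is taken from the entry's start period.
        timestamps = pd.period_range(
            start=entry["start"], periods=len(entry["target"])
        ).to_timestamp()
        frames.append(
            pd.DataFrame(
                {"item_id": item_id, "timestamp": timestamps, "target": entry["target"]}
            )
        )
    return pd.concat(frames, ignore_index=True)


from_iterable = iterable_to_long(iterable_dataset)
```

Any of these objects corresponds to one of the accepted shapes for the data argument; format 2 is the same long layout stored in a CSV or Parquet file.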
- __init__(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
Methods
Assign new columns to the time series dataframe.
Convert each time series in the data frame to the given frequency.
Make a copy of the TimeSeriesDataFrame.
Drop rows containing NaNs.
Fill missing values represented by NaN.
Construct a TimeSeriesDataFrame from a pandas DataFrame.
Construct a TimeSeriesDataFrame from an Iterable of dictionaries, each of which represents a single time series.
Construct a TimeSeriesDataFrame from a CSV or Parquet file.
Convenience method to read pickled time series data frames.
Prepare model inputs necessary to predict the last prediction_length time steps of each time series in the dataset.
Infer the time series frequency based on the timestamps of the observations.
Length of each time series in the dataframe.
Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.
Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.
Sort object by labels (along an axis).
Split the dataframe into two TimeSeriesDataFrames, before and after a certain cutoff_time.
Convert TimeSeriesDataFrame to a pandas.DataFrame.
Generate a train/test split from the given dataset.
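The "start (inclusive) and end (exclusive)" semantics of the two slicing methods listed above can be sketched in plain pandas. Both helpers below are hypothetical stand-ins written for this illustration, not the library's implementation; they assume non-negative positions and rows already sorted by timestamp within each item.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2019-01-01", periods=4, freq="D")],
    names=["item_id", "timestamp"],
)
ts_df = pd.DataFrame({"target": range(8)}, index=index)


def slice_by_position_sketch(df, start, end):
    """Keep rows at positions start <= i < end within each time series."""
    position = df.groupby(level="item_id", sort=False).cumcount()
    return df[(position >= start) & (position < end)]


def slice_by_time_sketch(df, start_time, end_time):
    """Keep rows with start_time <= timestamp < end_time in each time series."""
    timestamps = df.index.get_level_values("timestamp")
    return df[(timestamps >= start_time) & (timestamps < end_time)]


by_position = slice_by_position_sketch(ts_df, 1, 3)
by_time = slice_by_time_sketch(
    ts_df, pd.Timestamp("2019-01-01"), pd.Timestamp("2019-01-03")
)
```

In both cases the end bound is excluded, so the selection by position keeps two rows per item (positions 1 and 2) and the selection by time keeps the first two days of each series.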
Attributes
freq
Inferred pandas-compatible frequency of the timestamps in the data frame.
item_ids
List of unique time series IDs contained in the data set.
num_items
Number of items (time series) in the data set.
static_features
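As a rough illustration of what the freq, item_ids and num_items attributes describe, the same quantities can be derived from a multi-indexed pandas DataFrame. This is a sketch under assumptions: it infers the frequency from the first item only, which may differ from how the library infers it.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
    names=["item_id", "timestamp"],
)
ts_df = pd.DataFrame({"target": range(6)}, index=index)

# Unique time series IDs, in order of first appearance.
item_ids = ts_df.index.unique(level="item_id")

# Number of items (time series) in the data set.
num_items = len(item_ids)

# Frequency inferred from the timestamps of the first item (assumption:
# all items share the same regular frequency).
first_item_timestamps = ts_df.loc[item_ids[0]].index
freq = pd.infer_freq(first_item_timestamps)
```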