TimeSeriesDataFrame.convert_frequency

TimeSeriesDataFrame.convert_frequency(freq: str | DateOffset, agg_numeric: str = 'mean', agg_categorical: str = 'first', num_cpus: int = -1, chunk_size: int = 100, **kwargs) → TimeSeriesDataFrame

Convert each time series in the dataframe to the given frequency.

This method is useful for two purposes:

  1. Converting an irregularly-sampled time series to a regular time index.

  2. Aggregating time series data by downsampling (e.g., converting daily sales into weekly sales).

A standard df.groupby(...).resample(...) call can be extremely slow for large datasets, so this method parallelizes the operation across multiple CPU cores.
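The idea of resampling chunks of time series concurrently can be sketched in plain pandas. This is an illustration only, not the library's implementation: the helpers resample_chunk and resample_in_chunks are hypothetical names, a thread pool is used purely to keep the sketch self-contained, and only the "mean" aggregation is shown.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def resample_chunk(chunk: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Resample each series in one chunk independently (hypothetical helper)."""
    pieces = {
        item_id: group.droplevel("item_id").resample(freq).mean()
        for item_id, group in chunk.groupby(level="item_id", sort=False)
    }
    return pd.concat(pieces, names=["item_id", "timestamp"])


def resample_in_chunks(df, freq, chunk_size=100, num_workers=2):
    """Split the item ids into chunks and resample the chunks concurrently."""
    item_ids = df.index.unique(level="item_id")
    item_level = df.index.get_level_values("item_id")
    chunks = [
        df[item_level.isin(item_ids[start : start + chunk_size])]
        for start in range(0, len(item_ids), chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda chunk: resample_chunk(chunk, freq), chunks))
    return pd.concat(results)


# Two irregularly sampled series indexed by (item_id, timestamp).
index = pd.MultiIndex.from_arrays(
    [
        [0, 0, 0, 0, 1, 1],
        pd.to_datetime(
            ["2019-01-01", "2019-01-03", "2019-01-06", "2019-01-07", "2019-02-04", "2019-02-07"]
        ),
    ],
    names=["item_id", "timestamp"],
)
nan = float("nan")
df = pd.DataFrame({"target": [nan, 1.0, 2.0, nan, 3.0, 4.0]}, index=index)

# chunk_size=1 puts each series in its own chunk.
regular = resample_in_chunks(df, freq="D", chunk_size=1)
```

Because each series is resampled on its own, the chunks share no state, which is what makes the work easy to distribute across workers.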

Parameters:
  • freq (Union[str, pd.DateOffset]) – Frequency to which the data should be converted. See pandas frequency aliases for supported values.

  • agg_numeric ({"max", "min", "sum", "mean", "median", "first", "last"}, default = "mean") – Aggregation method applied to numeric columns.

  • agg_categorical ({"first", "last"}, default = "first") – Aggregation method applied to categorical columns.

  • num_cpus (int, default = -1) – Number of CPU cores used when resampling in parallel. Set to -1 to use all cores.

  • chunk_size (int, default = 100) – Number of time series in a chunk assigned to each parallel worker.

  • **kwargs – Additional keyword arguments that will be passed to pandas.DataFrameGroupBy.resample.
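The split between agg_numeric and agg_categorical can be pictured as a per-column aggregation map. The sketch below uses plain pandas on a single series and is only an analogy for how the two parameters divide the columns; the column names and the agg_by_column variable are made up for illustration.

```python
import pandas as pd

# One series with two observations on the first day and one on the second.
timestamps = pd.to_datetime(["2019-01-01 08:00", "2019-01-01 20:00", "2019-01-02 09:00"])
series = pd.DataFrame(
    {"target": [1.0, 3.0, 5.0], "category": ["a", "b", "c"]},
    index=timestamps,
)

# Numeric columns get "mean" (like agg_numeric), all others get "first"
# (like agg_categorical).
numeric_cols = series.select_dtypes(include="number").columns
agg_by_column = {
    col: ("mean" if col in numeric_cols else "first") for col in series.columns
}
daily = series.resample("D").agg(agg_by_column)
```

Here the two readings on 2019-01-01 collapse to a target of 2.0 and a category of "a".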

Returns:

ts_df – A new time series dataframe with time series resampled at the new frequency. Output may contain missing values represented by NaN if the original data does not have information for the given period.
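If the NaN values introduced by the conversion need to be filled, one option is a per-series forward fill. The sketch below uses plain pandas on a dataframe shaped like the converted output; it is one possible follow-up step rather than something convert_frequency does itself, and the variable names are illustrative.

```python
import pandas as pd

nan = float("nan")
index = pd.MultiIndex.from_arrays(
    [
        [0, 0, 0, 0, 1, 1, 1],
        pd.to_datetime(
            [
                "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
                "2019-02-04", "2019-02-05", "2019-02-06",
            ]
        ),
    ],
    names=["item_id", "timestamp"],
)
resampled = pd.DataFrame({"target": [nan, 1.0, nan, 2.0, 3.0, nan, 4.0]}, index=index)

# Forward-fill within each item so values never leak across series.
filled = resampled.groupby(level="item_id", sort=False).ffill()
```

A leading NaN, such as the first row of item 0, stays missing because there is no earlier observation to carry forward.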

Return type:

TimeSeriesDataFrame

Examples

Convert irregularly-sampled time series data to a regular index

>>> ts_df
                    target
item_id timestamp
0       2019-01-01     NaN
        2019-01-03     1.0
        2019-01-06     2.0
        2019-01-07     NaN
1       2019-02-04     3.0
        2019-02-07     4.0
>>> ts_df.convert_frequency(freq="D")
                    target
item_id timestamp
0       2019-01-01     NaN
        2019-01-02     NaN
        2019-01-03     1.0
        2019-01-04     NaN
        2019-01-05     NaN
        2019-01-06     2.0
        2019-01-07     NaN
1       2019-02-04     3.0
        2019-02-05     NaN
        2019-02-06     NaN
        2019-02-07     4.0

Downsample quarterly data to yearly frequency

>>> ts_df
                    target
item_id timestamp
0       2020-03-31     1.0
        2020-06-30     2.0
        2020-09-30     3.0
        2020-12-31     4.0
        2021-03-31     5.0
        2021-06-30     6.0
        2021-09-30     7.0
        2021-12-31     8.0
>>> ts_df.convert_frequency("YE")
                    target
item_id timestamp
0       2020-12-31     2.5
        2021-12-31     6.5
>>> ts_df.convert_frequency("YE", agg_numeric="sum")
                    target
item_id timestamp
0       2020-12-31    10.0
        2021-12-31    26.0
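The yearly values above can be checked by hand with plain pandas. This sketch groups by calendar year instead of using a frequency alias, since the spelling of the yearly alias differs between pandas versions ("YE" was introduced in pandas 2.2; earlier versions use "Y").

```python
import pandas as pd

quarter_ends = pd.to_datetime(
    [
        "2020-03-31", "2020-06-30", "2020-09-30", "2020-12-31",
        "2021-03-31", "2021-06-30", "2021-09-30", "2021-12-31",
    ]
)
target = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=quarter_ends)

# Default agg_numeric="mean": (1+2+3+4)/4 and (5+6+7+8)/4.
yearly_mean = target.groupby(target.index.year).mean()

# agg_numeric="sum": 1+2+3+4 and 5+6+7+8.
yearly_sum = target.groupby(target.index.year).sum()
```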