TimeSeriesDataFrame.convert_frequency
- TimeSeriesDataFrame.convert_frequency(freq: str | DateOffset, agg_numeric: str = 'mean', agg_categorical: str = 'first', num_cpus: int = -1, chunk_size: int = 100, **kwargs) -> TimeSeriesDataFrame
Convert each time series in the data frame to the given frequency.
This method is useful for two purposes:
- Converting an irregularly-sampled time series to a regular time index.
- Aggregating time series data by downsampling (e.g., converting daily sales into weekly sales).
Standard df.groupby(...).resample(...) can be extremely slow for large datasets, so we parallelize this operation across multiple CPU cores.
- Parameters:
freq (Union[str, pd.DateOffset]) – Frequency to which the data should be converted. See pandas frequency aliases for supported values.
agg_numeric ({"max", "min", "sum", "mean", "median", "first", "last"}, default = "mean") – Aggregation method applied to numeric columns.
agg_categorical ({"first", "last"}, default = "first") – Aggregation method applied to categorical columns.
num_cpus (int, default = -1) – Number of CPU cores used when resampling in parallel. Set to -1 to use all cores.
chunk_size (int, default = 100) – Number of time series in a chunk assigned to each parallel worker.
**kwargs – Additional keyword arguments that will be passed to pandas.DataFrameGroupBy.resample.
- Returns:
ts_df – A new time series dataframe with time series resampled at the new frequency. Output may contain missing values represented by NaN if the original data does not have information for the given period.
- Return type:
TimeSeriesDataFrame
Examples
Convert irregularly-sampled time series data to a regular index
>>> ts_df
                    target
item_id timestamp
0       2019-01-01     NaN
        2019-01-03     1.0
        2019-01-06     2.0
        2019-01-07     NaN
1       2019-02-04     3.0
        2019-02-07     4.0
>>> ts_df.convert_frequency(freq="D")
                    target
item_id timestamp
0       2019-01-01     NaN
        2019-01-02     NaN
        2019-01-03     1.0
        2019-01-04     NaN
        2019-01-05     NaN
        2019-01-06     2.0
        2019-01-07     NaN
1       2019-02-04     3.0
        2019-02-05     NaN
        2019-02-06     NaN
        2019-02-07     4.0
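For intuition, the conversion to a regular daily index can be approximated with plain pandas. This is only a sketch of the single-process groupby/resample approach that the description calls slow, not the library's implementation. The flat DataFrame layout and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Toy data mirroring the irregularly-sampled example (flat layout is an assumption).
df = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 0, 1, 1],
        "timestamp": pd.to_datetime(
            ["2019-01-01", "2019-01-03", "2019-01-06", "2019-01-07",
             "2019-02-04", "2019-02-07"]
        ),
        "target": [float("nan"), 1.0, 2.0, float("nan"), 3.0, 4.0],
    }
)

# Resample each item separately to daily frequency.
# Days without an observation have no data to average, so they become NaN.
regular = (
    df.set_index("timestamp")
    .groupby("item_id")["target"]
    .resample("D")
    .mean()
)
print(regular)
```

Each item is resampled only between its own first and last timestamps, so item 0 spans seven days and item 1 spans four.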
Downsample quarterly data to yearly frequency
>>> ts_df
                    target
item_id timestamp
0       2020-03-31     1.0
        2020-06-30     2.0
        2020-09-30     3.0
        2020-12-31     4.0
        2021-03-31     5.0
        2021-06-30     6.0
        2021-09-30     7.0
        2021-12-31     8.0
>>> ts_df.convert_frequency("YE")
                    target
item_id timestamp
0       2020-12-31     2.5
        2021-12-31     6.5
>>> ts_df.convert_frequency("YE", agg_numeric="sum")
                    target
item_id timestamp
0       2020-12-31    10.0
        2021-12-31    26.0
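The effect of agg_numeric in the downsampling example can be checked against a plain-pandas equivalent. The helper resample_per_item below is hypothetical and exists only for this sketch; it uses pd.offsets.YearEnd() instead of the "YE" alias so that it does not depend on the installed pandas version.

```python
import pandas as pd

# Toy data mirroring the quarterly example: one item, eight quarter-end values.
df = pd.DataFrame(
    {
        "item_id": [0] * 8,
        "timestamp": pd.to_datetime(
            ["2020-03-31", "2020-06-30", "2020-09-30", "2020-12-31",
             "2021-03-31", "2021-06-30", "2021-09-30", "2021-12-31"]
        ),
        "target": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    }
)


def resample_per_item(frame, freq, agg="mean"):
    """Hypothetical helper: per-item resampling with plain pandas."""
    grouped = frame.set_index("timestamp").groupby("item_id")["target"]
    return grouped.resample(freq).agg(agg)


yearly_mean = resample_per_item(df, pd.offsets.YearEnd(), "mean")
yearly_sum = resample_per_item(df, pd.offsets.YearEnd(), "sum")
print(yearly_mean)
print(yearly_sum)
```

The yearly means are the averages of four quarterly values each, and the sums are their totals, matching the two outputs shown in the example.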