fit_predict

TimeSeriesCloudPredictor.fit_predict(train_data: str | Path | DataFrame, *, predictor_init_args: Dict[str, Any], predictor_fit_args: Dict[str, Any] | None = None, known_covariates: str | Path | DataFrame | None = None, static_features: str | Path | DataFrame | None = None, id_column: str = 'item_id', timestamp_column: str = 'timestamp', framework_version: str = 'latest', job_name: str | None = None, instance_type: str = 'ml.m5.2xlarge', instance_count: int = 1, volume_size: int = 100, custom_image_uri: str | None = None, wait: bool = True, predictions_path: str | None = None, backend_kwargs: Dict | None = None) → DataFrame | None

Fit and predict in a single SageMaker training job.

This is useful for foundation-model forecasting workflows (e.g. Chronos-2) where “fit” amounts to little more than loading a pretrained model. Running fit and predict in the same job means the SageMaker startup overhead is paid once instead of twice.

Predictions are generated inside the training container against train_data (the standard time-series forecasting flow where the last prediction_length steps of each series are forecast) and written directly to S3.

Parameters:

train_data (Union[str, pathlib.Path, pd.DataFrame]) – Historical time series to train on and forecast from, in long format, as a DataFrame or local/S3 path to a data file.
predictor_init_args (dict) – Arguments forwarded to TimeSeriesPredictor(). Must include prediction_length. See the TimeSeriesPredictor docs for available options.
predictor_fit_args (Optional[dict], default = None) – Additional fit args forwarded to TimeSeriesPredictor.fit(). See the TimeSeriesPredictor.fit docs for available options. Must NOT contain train_data, tuning_data, or known_covariates — pass those as explicit arguments above.
known_covariates (Optional[Union[str, pathlib.Path, pd.DataFrame]], default = None) – Future values of the known covariates over the forecast horizon. Must be provided if known_covariates_names was specified in predictor_init_args.
static_features (Optional[Union[str, pathlib.Path, pd.DataFrame]], default = None) – Static (time-independent) features describing each individual time series.
id_column (str, default = "item_id") – Name of the column with the unique identifier of each time series (item).
timestamp_column (str, default = "timestamp") – Name of the column with the observation timestamps.
framework_version (str, default = 'latest') – Version of the AutoGluon training container. If 'latest', the latest available container version is used. Ignored if custom_image_uri is set.
job_name (str, default = None) – Name of the launched training job. If None, CloudPredictor will create one with prefix ag-cloudpredictor.
instance_type (str, default = 'ml.m5.2xlarge') – Instance type the predictor will be trained on with SageMaker.
instance_count (int, default = 1) – Number of instances used to fit the predictor.
volume_size (int, default = 100) – Size in GB of the EBS volume to use for storing input data during training.
custom_image_uri (Optional[str], default = None) – Custom container image URI. If set, framework_version is ignored.
wait (bool, default = True) – Whether the call should wait until the job completes.
backend_kwargs (Optional[dict], default = None) – Backend-specific arguments. Same keys as fit().
predictions_path (Optional[str], default = None) – S3 URL where predictions will be written by the training container (e.g. s3://my-bucket/runs/2024-05-01/predictions.csv). The container’s SageMaker execution role must have s3:PutObject permission for this location. If None, defaults to {cloud_output_path}/{job_name}/predictions.csv. Predictions use AutoGluon’s canonical column names item_id and timestamp, regardless of the id_column / timestamp_column passed in.
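
The parameter constraints above can be checked locally before a job is launched. The following is a minimal sketch, not part of autogluon.cloud: check_fit_predict_args and default_predictions_path are hypothetical helper names, and the library may perform its own validation. Stripping a trailing slash from cloud_output_path is also an assumption made here for illustration.

```python
from typing import Any, Dict, Optional

# Keys that the parameter list says must be passed as explicit arguments.
RESERVED_FIT_KEYS = ("train_data", "tuning_data", "known_covariates")


def check_fit_predict_args(
    predictor_init_args: Dict[str, Any],
    predictor_fit_args: Optional[Dict[str, Any]] = None,
    known_covariates: Any = None,
) -> None:
    """Hypothetical pre-flight check; not part of autogluon.cloud."""
    # prediction_length is documented as mandatory in predictor_init_args.
    if "prediction_length" not in predictor_init_args:
        raise ValueError("predictor_init_args must include 'prediction_length'")

    # Reserved keys belong in the explicit arguments of fit_predict.
    clashes = [k for k in RESERVED_FIT_KEYS if k in (predictor_fit_args or {})]
    if clashes:
        raise ValueError(f"pass {clashes} as explicit arguments, not in predictor_fit_args")

    # known_covariates is required once known_covariates_names is specified.
    if predictor_init_args.get("known_covariates_names") and known_covariates is None:
        raise ValueError("known_covariates is required when known_covariates_names is set")


def default_predictions_path(cloud_output_path: str, job_name: str) -> str:
    """Build the documented default: {cloud_output_path}/{job_name}/predictions.csv."""
    # Assumption: a trailing slash on cloud_output_path is dropped.
    return f"{cloud_output_path.rstrip('/')}/{job_name}/predictions.csv"
```

Calling check_fit_predict_args({"prediction_length": 24}, {"tuning_data": "val.csv"}) raises ValueError, since tuning_data is one of the reserved keys.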

Returns:

Predictions as a DataFrame. Returns None when wait is False.

Return type:

Optional[pd.DataFrame]
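
Example:

A minimal usage sketch of the call described above. The bucket, file paths, horizon, and time limit are placeholder assumptions, and build_fit_predict_kwargs and run_fit_predict are hypothetical wrapper names introduced only for illustration. Running run_fit_predict requires autogluon.cloud and valid AWS credentials, so the import is deferred until the function is called.

```python
from typing import Any, Dict, Optional


def build_fit_predict_kwargs(
    prediction_length: int,
    predictions_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for fit_predict (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        # Forwarded to TimeSeriesPredictor(); prediction_length is required.
        "predictor_init_args": {"prediction_length": prediction_length},
        # Forwarded to TimeSeriesPredictor.fit(); the value is a placeholder.
        "predictor_fit_args": {"time_limit": 600},
        "instance_type": "ml.m5.2xlarge",
        "wait": True,
    }
    if predictions_path is not None:
        # Otherwise the documented default under cloud_output_path is used.
        kwargs["predictions_path"] = predictions_path
    return kwargs


def run_fit_predict(train_data: str, cloud_output_path: str):
    """Launch one SageMaker job that fits and predicts (needs AWS access)."""
    # Deferred import: only needed when a job is actually launched.
    from autogluon.cloud import TimeSeriesCloudPredictor

    cloud_predictor = TimeSeriesCloudPredictor(cloud_output_path=cloud_output_path)
    predictions = cloud_predictor.fit_predict(
        train_data,
        **build_fit_predict_kwargs(prediction_length=48),
    )
    # With wait=True a DataFrame is returned; with wait=False the result is None.
    return predictions


# Placeholder invocation, assuming these S3 locations exist:
# predictions = run_fit_predict("s3://my-bucket/data/train.csv", "s3://my-bucket/runs")
```

When a DataFrame is returned, its identifier and timestamp columns are named item_id and timestamp, even if train_data used different column names through id_column and timestamp_column.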