# Train and Deploy a Tabular Predictor on Amazon SageMaker

```{note}
This tutorial covers tabular classification and regression. For time series forecasting, see [Train a Time Series Predictor](./predictor-timeseries.md).
```

AutoGluon-Cloud lets you train, deploy, and run inference with AutoGluon tabular predictors on AWS using the same APIs you'd use locally. Under the hood, it runs your jobs on [Amazon SageMaker](https://aws.amazon.com/sagemaker/) using AWS's official [AutoGluon deep learning containers](https://aws.github.io/deep-learning-containers/reference/available_images/#autogluon-training) — so you don't manage any infrastructure yourself.

## Training

```{important}
Before running any code below, follow the [Setup tutorial](setup.md) to register the IAM role and S3 bucket that SageMaker will use. The examples assume those resources are saved in `~/.autogluon/cloud.yaml`.
```

Create the predictor:

```python
from autogluon.cloud import TabularCloudPredictor

cloud_predictor = TabularCloudPredictor()
```

{py:meth}`TabularCloudPredictor.fit() <autogluon.cloud.TabularCloudPredictor.fit>` runs [`TabularPredictor.fit()`](https://auto.gluon.ai/stable/api/autogluon.tabular.TabularPredictor.fit.html) inside a remote SageMaker job — along with `train_data`, the `predictor_init_args` and `predictor_fit_args` are forwarded straight through. Training, model artifacts, and AutoGluon itself all live on the remote instance, so you don't need AutoGluon installed locally.

```python
cloud_predictor.fit(
    train_data="train.csv",  # DataFrame, local path, or S3 URL (CSV/Parquet)
    predictor_init_args={"label": "label"},  # passed to TabularPredictor()
    predictor_fit_args={"time_limit": 120},  # passed to TabularPredictor.fit()
    instance_type="ml.m5.2xlarge",
)
```

`train_data` can be a pandas DataFrame, or a path to a local or S3 file (CSV or Parquet). In every case AutoGluon-Cloud loads the data locally and uploads it to your `cloud_output_path` bucket before kicking off the SageMaker job.

### Reattach to a training job
If your local connection drops, the training job keeps running on SageMaker. You can reattach with another `CloudPredictor` via {py:meth}`~autogluon.cloud.TabularCloudPredictor.attach_job` as long as you have the job name — it's logged when training starts (`INFO:sagemaker:Creating training-job with name: ag-cloudpredictor-...`) and also visible in the SageMaker console.

```python
another_cloud_predictor = TabularCloudPredictor()
another_cloud_predictor.attach_job(job_name="JOB_NAME")
```

A reattached job won't stream live logs — the full log becomes available once training finishes.

## Inference

Once a predictor is trained, you can get predictions in two ways:

- **Real-time inference**: deploy the predictor as a long-running SageMaker endpoint and send requests to it. Best when you need low-latency predictions on demand — e.g. behind a user-facing service.
- **Batch inference**: launch a one-off SageMaker job that scores a dataset and writes the results to S3. Best for offline scoring of larger datasets — compute spins up, runs, and shuts down automatically, so you only pay for what you use.

A rough guideline: if you need predictions less often than once an hour and can tolerate ~4 minutes of compute spin-up, batch inference is usually cheaper and easier to operate.

### Real-time inference

Deploy the predictor as a SageMaker endpoint with {py:meth}`~autogluon.cloud.TabularCloudPredictor.deploy`:

```python
cloud_predictor.deploy(
    instance_type="ml.m5.2xlarge",
)
```

Optionally, you can also attach to a deployed endpoint with {py:meth}`~autogluon.cloud.TabularCloudPredictor.attach_endpoint`:

```python
cloud_predictor.attach_endpoint(endpoint="ENDPOINT_NAME")
```

Send requests to the endpoint with {py:meth}`~autogluon.cloud.TabularCloudPredictor.predict_real_time`, which returns a pandas Series of predictions:

```python
result = cloud_predictor.predict_real_time("test.csv")  # DataFrame, local path, or S3 URL
# 0      dog
# 1      cat
# 2      cat
# Name: label, dtype: object
```

For class probabilities, use {py:meth}`~autogluon.cloud.TabularCloudPredictor.predict_proba_real_time`, which returns a DataFrame with one column per class:

```python
result = cloud_predictor.predict_proba_real_time("test.csv")
#         dog       cat
# 0  0.682754  0.317246
# 1  0.195782  0.804218
# 2  0.372283  0.627717
```

Make sure you clean up the endpoint with {py:meth}`~autogluon.cloud.TabularCloudPredictor.cleanup_deployment`:

```python
cloud_predictor.cleanup_deployment()
```

To check whether an endpoint is currently attached, call {py:meth}`~autogluon.cloud.TabularCloudPredictor.info` and look for the `endpoint` key in the returned dict.

#### Invoke the endpoint without AutoGluon-Cloud
The deployed endpoint is a normal SageMaker endpoint, and you can invoke it through other methods. For example, to invoke it with boto3 directly:

```python
import boto3

client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='text/csv',
    Accept='application/json',
    Body=test_data.to_csv()
)

#: Print the model endpoint's output.
print(response['Body'].read().decode())
```

### Batch inference

To score a dataset as a one-off job, use {py:meth}`~autogluon.cloud.TabularCloudPredictor.predict`. It returns a pandas Series of predictions:

```python
result = cloud_predictor.predict(
    "test.csv",  # DataFrame, local path, or S3 URL (CSV/Parquet)
    instance_type="ml.m5.2xlarge",
)
# 0      dog
# 1      cat
# 2      cat
# Name: label, dtype: object
```

For class probabilities, use {py:meth}`~autogluon.cloud.TabularCloudPredictor.predict_proba`. With `include_predict=True` (the default) it returns a `(predictions, probabilities)` tuple — useful because it avoids the cost of a second batch job. Pass `include_predict=False` to get the probabilities DataFrame alone:

```python
predictions, probabilities = cloud_predictor.predict_proba(
    "test.csv",
    include_predict=True,
    instance_type="ml.m5.2xlarge",
)
# predictions:
# 0      dog
# 1      cat
# 2      cat
# Name: label, dtype: object
#
# probabilities:
#         dog       cat
# 0  0.682754  0.317246
# 1  0.195782  0.804218
# 2  0.372283  0.627717
```

## Inspect predictor state

To retrieve general info about a `CloudPredictor`, call {py:meth}`~autogluon.cloud.TabularCloudPredictor.info`:

```python
cloud_predictor.info()
```

It will output a dict similar to this:

```python
{
    'local_output_path': '/home/ubuntu/XXX/demo/AutogluonCloudPredictor/ag-20221111_174928',
    'cloud_output_path': 's3://XXX/tabular-demo',
    'fit_job': {
        'name': 'ag-cloudpredictor-1668188968-e5c3',
        'status': 'Completed',
        'framework_version': '0.6.1',
        'artifact_path': 's3://XXX/tabular-demo/model/ag-cloudpredictor-1668188968-e5c3/output/model.tar.gz'
    },
    'recent_transform_job': {
        'name': 'ag-cloudpredictor-1668189393-e95c',
        'status': 'Completed',
        'result_path': 's3://XXX/tabular-demo/batch_transform/2022-11-11-17-56-33-991/results/test.csv.out'
    },
    'transform_jobs': ['ag-cloudpredictor-1668189393-e95c'],
    'endpoint': 'ag-cloudpredictor-1668189208-d23b'
}
```

## Download the trained predictor
You can convert the `CloudPredictor` trained on SageMaker into a local AutoGluon predictor with {py:meth}`~autogluon.cloud.TabularCloudPredictor.to_local_predictor`, as long as you have the same version of AutoGluon installed locally.

```python
local_predictor = cloud_predictor.to_local_predictor(
    save_path="PATH"  # If not specified, CloudPredictor will create one.
)  # local_predictor would be a TabularPredictor
```

`to_local_predictor()` downloads the trained model tarball, expands it to your local disk, and loads it as the corresponding AutoGluon predictor.