.. _cloud_aws_sagemaker_deploy:

Deploying AutoGluon Models with AWS SageMaker
=============================================

After learning how to train a model using AWS SageMaker in :ref:`cloud_aws_sagemaker_fit`, in this section we will learn how to deploy trained models using AWS SageMaker and Deep Learning Containers.

The full end-to-end example is available in the `amazon-sagemaker-examples <https://github.com/aws/amazon-sagemaker-examples>`__ repository.

Pre-requisites
--------------

Before starting, ensure that the latest version of the SageMaker Python SDK is installed (``pip install --upgrade sagemaker``). This is required so that information about newly released containers is available.

Endpoint Deployment - Inference Script
--------------------------------------

To start using the containers, an inference script and the `wrapper classes `__ are required. When authoring an inference `script `__, please refer to the SageMaker `documentation `__.

Here is one possible inference script:

- the ``model_fn`` function is responsible for loading your model. It takes a ``model_dir`` argument that specifies where the model is stored.

- the ``transform_fn`` function is responsible for deserializing your input data so that it can be passed to your model. It takes input data and content type as parameters and returns deserialized data. The SageMaker inference toolkit provides a default implementation that deserializes the following content types: JSON, CSV, numpy array, and NPZ.

.. code:: python

    from autogluon.tabular import TabularPredictor
    import os
    import json
    from io import StringIO
    import pandas as pd
    import numpy as np


    def model_fn(model_dir):
        """Loads model from previously saved artifact."""
        model = TabularPredictor.load(model_dir)
        globals()["column_names"] = model.feature_metadata_in.get_features()
        model.persist_models()
        return model


    def transform_fn(model, request_body, input_content_type, output_content_type="application/json"):
        if input_content_type == "text/csv":
            buf = StringIO(request_body)
            data = pd.read_csv(buf, header=None)
            num_cols = len(data.columns)
            if num_cols != len(column_names):
                raise Exception(
                    f"Invalid data format. Input data has {num_cols} columns, while the model expects {len(column_names)}"
                )
            else:
                data.columns = column_names
        else:
            raise Exception(f"{input_content_type} content type not supported")

        pred = model.predict(data)
        pred_proba = model.predict_proba(data)
        prediction = pd.concat([pred, pred_proba], axis=1).values

        return json.dumps(prediction.tolist()), output_content_type
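Before wiring the script into an endpoint, you may want to smoke-test it locally. The sketch below makes a few assumptions beyond this tutorial: that the script above is saved as ``scripts/tabular_serve.py`` (the ``entry_point`` used during deployment below), that a trained predictor was saved under the hypothetical local path ``ag_models/``, and that ``data/test.csv`` contains the feature columns plus a ``class`` target column, as in the batch transform example later in this tutorial.

.. code:: python

    import json
    import sys

    import pandas as pd

    # Make the inference script importable
    # (assumes it is saved as scripts/tabular_serve.py).
    sys.path.append("scripts")
    from tabular_serve import model_fn, transform_fn

    # "ag_models/" is a hypothetical local path to a saved TabularPredictor.
    model = model_fn("ag_models/")

    # Build a single CSV row without header or index,
    # mirroring what the endpoint receives on the wire.
    row = pd.read_csv("data/test.csv").drop(columns=["class"]).head(1)
    request_body = row.to_csv(header=False, index=False)

    body, content_type = transform_fn(model, request_body, "text/csv")
    print(content_type, json.loads(body))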
Deployment as an inference endpoint
-----------------------------------

To deploy an AutoGluon model as a SageMaker inference endpoint, we configure the SageMaker session first:

.. code:: python

    import os

    import sagemaker
    from sagemaker import utils
    from sagemaker.serializers import CSVSerializer

    # Helper wrappers introduced earlier
    from ag_model import (
        AutoGluonTraining,
        AutoGluonInferenceModel,
        AutoGluonTabularPredictor,
    )

    role = sagemaker.get_execution_role()
    sagemaker_session = sagemaker.session.Session()
    region = sagemaker_session.boto_region_name

    bucket = sagemaker_session.default_bucket()
    s3_prefix = f"autogluon_sm/{utils.sagemaker_timestamp()}"
    output_path = f"s3://{bucket}/{s3_prefix}/output/"

Upload the model archive trained earlier (if you trained the AutoGluon model locally, it must be a ``model.tar.gz`` archive of the model output directory):

.. code:: python

    endpoint_name = sagemaker.utils.unique_name_from_base(
        "sagemaker-autogluon-serving-trained-model"
    )

    model_data = sagemaker_session.upload_data(
        path=os.path.join(".", "model.tar.gz"), key_prefix=f"{endpoint_name}/models"
    )

Deploy the model:

.. code:: python

    instance_type = "ml.m5.2xlarge"

    model = AutoGluonInferenceModel(
        model_data=model_data,
        role=role,
        region=region,
        framework_version="0.4",
        py_version="py38",
        instance_type=instance_type,
        source_dir="scripts",
        entry_point="tabular_serve.py",
    )

    predictor = model.deploy(
        initial_instance_count=1, serializer=CSVSerializer(), instance_type=instance_type
    )

Once the predictor is deployed, it can be used for inference in the following way, where ``data`` contains only the feature columns the model was trained on (no target column):

.. code:: python

    predictions = predictor.predict(data)

Using SageMaker batch transform for offline processing
------------------------------------------------------

Deploying a trained model to a hosted endpoint has been available in SageMaker since launch and is a great way to provide real-time predictions to a service like a website or mobile app. But if the goal is to generate predictions from a trained model on a large dataset where minimizing latency isn't a concern, then the batch transform functionality may be easier, more scalable, and more appropriate.

`Read more about Batch Transform. <https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html>`__

Upload the model archive trained earlier (if you trained the AutoGluon model locally, it must be a ``model.tar.gz`` archive of the model output directory):

.. code:: python

    endpoint_name = sagemaker.utils.unique_name_from_base(
        "sagemaker-autogluon-batch_transform-trained-model"
    )

    model_data = sagemaker_session.upload_data(
        path=os.path.join(".", "model.tar.gz"), key_prefix=f"{endpoint_name}/models"
    )

Prepare the transform job:

.. code:: python

    instance_type = "ml.m5.2xlarge"

    model = AutoGluonInferenceModel(
        model_data=model_data,
        role=role,
        region=region,
        framework_version="0.4",
        py_version="py38",
        instance_type=instance_type,
        entry_point="tabular_serve-batch.py",
        source_dir="scripts",
        predictor_cls=AutoGluonTabularPredictor,
    )

    transformer = model.transformer(
        instance_count=1,
        instance_type=instance_type,
        strategy="MultiRecord",
        max_payload=6,
        max_concurrent_transforms=1,
        output_path=output_path,
        accept="application/json",
        assemble_with="Line",
    )

The batch transform job accepts a CSV file without a header or an index column, so we need to remove both before sending the data to the transform job:

.. code:: python

    output_file_name = "test_no_header.csv"

    pd.read_csv("data/test.csv")[:100].to_csv(
        f"data/{output_file_name}", header=False, index=False
    )

    test_input = transformer.sagemaker_session.upload_data(
        path=os.path.join("data", "test_no_header.csv"), key_prefix=s3_prefix
    )

The inference script has some differences from the previous example, but it follows the same APIs:

.. code:: python

    from autogluon.tabular import TabularPredictor
    import os
    import json
    from io import StringIO
    import pandas as pd
    import numpy as np


    def model_fn(model_dir):
        """Loads model from previously saved artifact."""
        model = TabularPredictor.load(model_dir)
        globals()["column_names"] = model.feature_metadata_in.get_features()
        return model


    def transform_fn(model, request_body, input_content_type, output_content_type="application/json"):
        if input_content_type == "text/csv":
            buf = StringIO(request_body)
            data = pd.read_csv(buf, header=None)
            num_cols = len(data.columns)
            if num_cols != len(column_names):
                raise Exception(
                    f"Invalid data format. Input data has {num_cols} columns, while the model expects {len(column_names)}"
                )
            else:
                data.columns = column_names
        else:
            raise Exception(f"{input_content_type} content type not supported")

        pred = model.predict(data)
        pred_proba = model.predict_proba(data)
        prediction = pd.concat([pred, pred_proba], axis=1)

        return prediction.to_json(), output_content_type

Run the transform job.

When making predictions on a large dataset, you can exclude attributes that aren't needed for prediction. After the predictions have been made, you can associate some of the excluded attributes with those predictions or with other input data in your report. By using batch transform to perform these data processing steps, you can often eliminate additional preprocessing or postprocessing. You can use input files in JSON and CSV format only. More details on how to use filters are available in `Associate Prediction Results with Input Records <https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html>`__.

In this specific case we will use the ``input_filter`` argument to take the first 14 columns, thus removing the target variable from the test set, and the ``output_filter`` argument to extract only the predicted class, without the scores:

.. code:: python

    transformer.transform(
        test_input,
        input_filter="$[:14]",  # filter out the target variable
        split_type="Line",
        content_type="text/csv",
        output_filter="$['class']",  # keep only the predicted class in the output
    )

    transformer.wait()

    # Batch transform appends ".out" to the name of the input file
    output_s3_location = f"{transformer.output_path[:-1]}/{output_file_name}.out"

The output file will be available at the S3 path stored in the ``output_s3_location`` variable.
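Once the job finishes, the assembled predictions can be read back from S3, for example with ``sagemaker.s3.S3Downloader`` from the same SageMaker SDK. A minimal sketch, assuming the job produced one JSON document per input record (``assemble_with="Line"`` with ``accept="application/json"``, as configured above):

.. code:: python

    import json

    from sagemaker.s3 import S3Downloader

    # Read the assembled output object produced by the transform job.
    raw = S3Downloader.read_file(output_s3_location, sagemaker_session=sagemaker_session)

    # One JSON document per line, each holding the filtered prediction
    # (only the "class" field, per output_filter above).
    predictions = [json.loads(line) for line in raw.splitlines() if line]
    print(predictions[:5])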
Conclusion
----------

In this tutorial we explored a few options for deploying AutoGluon models using SageMaker. To explore more, refer to the `SageMaker inference `__ documentation.