.. _sec_tabulardeployment:

Predicting Columns in a Table - Deployment Optimization
=========================================================

This tutorial covers the end-to-end AutoML process for creating an optimized, deployable AutoGluon artifact for production usage.

This tutorial assumes you have already read :ref:`sec_tabularquick` and :ref:`sec_tabularadvanced`.

Fitting a TabularPredictor
--------------------------

We will again use the AdultIncome dataset from the previous tutorials and train a predictor to predict whether a person's income exceeds $50,000, which is recorded in the ``class`` column of this table.

.. code:: python

    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
    label = 'class'
    subsample_size = 500  # subsample subset of data for faster demo, try setting this to much larger values
    train_data = train_data.sample(n=subsample_size, random_state=0)
    train_data.head()
.. parsed-literal::
    :class: output

            age  workclass  fnlwgt  education     education-num  marital-status      occupation       relationship   race   sex     capital-gain  capital-loss  hours-per-week  native-country  class
    6118     51  Private     39264  Some-college             10  Married-civ-spouse  Exec-managerial  Wife           White  Female             0             0              40  United-States    >50K
    23204    58  Private     51662  10th                      6  Married-civ-spouse  Other-service    Wife           White  Female             0             0               8  United-States   <=50K
    29590    40  Private    326310  Some-college             10  Married-civ-spouse  Craft-repair     Husband        White  Male               0             0              44  United-States   <=50K
    18116    37  Private    222450  HS-grad                   9  Never-married       Sales            Not-in-family  White  Male               0          2339              40  El-Salvador     <=50K
    33964    62  Private    109190  Bachelors                13  Married-civ-spouse  Exec-managerial  Husband        White  Male           15024             0              40  United-States    >50K
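The fit below uses AutoGluon's defaults, which will optimize for accuracy on this dataset. If your deployment targets a different metric, you can set it when constructing the predictor. A minimal sketch, with ``'roc_auc'`` and the ``'agModels-aucDemo'`` path chosen purely for illustration (the rest of this tutorial keeps the defaults):

.. code:: python

    from autogluon.tabular import TabularPredictor

    # Hypothetical variant: optimize models for ROC AUC instead of the default accuracy.
    # 'agModels-aucDemo' is an illustrative path not used elsewhere in this tutorial.
    predictor_auc = TabularPredictor(label=label, eval_metric='roc_auc',
                                     path='agModels-aucDemo').fit(train_data)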
.. code:: python

    save_path = 'agModels-predictClass-deployment'  # specifies folder to store trained models
    predictor = TabularPredictor(label=label, path=save_path).fit(train_data)

.. parsed-literal::
    :class: output

    Beginning AutoGluon training ...
    AutoGluon will save models to "agModels-predictClass-deployment/"
    AutoGluon Version:  0.6.1b20221213
    Python Version:     3.8.10
    Operating System:   Linux
    Platform Machine:   x86_64
    Platform Version:   #1 SMP Tue Nov 30 00:17:50 UTC 2021
    Train Data Rows:    500
    Train Data Columns: 14
    Label Column: class
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
        2 unique label values:  [' >50K', ' <=50K']
        If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping:  class 1 = >50K, class 0 = <=50K
        Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
        To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
        Available Memory:                    31599.62 MB
        Train Data (Original)  Memory Usage: 0.29 MB (0.0% of available memory)
        Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
        Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
                Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
        Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
        Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting CategoryFeatureGenerator...
                Fitting CategoryMemoryMinimizeFeatureGenerator...
        Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
        Types of features in original data (raw dtype, special dtypes):
            ('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
        Types of features in processed data (raw dtype, special dtypes):
            ('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
            ('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('int', ['bool']) : 1 | ['sex']
        0.1s = Fit runtime
        14 features in original data used to generate 14 features in processed data.
        Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 0.09s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
        To change this, specify the eval_metric parameter of Predictor()
    Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
    Fitting 13 L1 models ...
    Fitting model: KNeighborsUnif ...
        0.73     = Validation score   (accuracy)
        0.61s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: KNeighborsDist ...
        0.65     = Validation score   (accuracy)
        0.6s     = Training   runtime
        0.01s    = Validation runtime
    Fitting model: LightGBMXT ...
        0.83     = Validation score   (accuracy)
        1.25s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: LightGBM ...
        0.85     = Validation score   (accuracy)
        0.82s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: RandomForestGini ...
        0.84     = Validation score   (accuracy)
        1.08s    = Training   runtime
        0.06s    = Validation runtime
    Fitting model: RandomForestEntr ...
        0.83     = Validation score   (accuracy)
        1.06s    = Training   runtime
        0.06s    = Validation runtime
    Fitting model: CatBoost ...
        0.85     = Validation score   (accuracy)
        1.4s     = Training   runtime
        0.01s    = Validation runtime
    Fitting model: ExtraTreesGini ...
        0.82     = Validation score   (accuracy)
        1.07s    = Training   runtime
        0.06s    = Validation runtime
    Fitting model: ExtraTreesEntr ...
        0.81     = Validation score   (accuracy)
        1.06s    = Training   runtime
        0.06s    = Validation runtime
    Fitting model: NeuralNetFastAI ...
        0.82     = Validation score   (accuracy)
        2.61s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: XGBoost ...
        0.87     = Validation score   (accuracy)
        0.26s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: NeuralNetTorch ...
        0.83     = Validation score   (accuracy)
        1.02s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: LightGBMLarge ...
        0.83     = Validation score   (accuracy)
        0.54s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: WeightedEnsemble_L2 ...
        0.87     = Validation score   (accuracy)
        0.32s    = Training   runtime
        0.0s     = Validation runtime
    AutoGluon training complete, total runtime = 14.27s ... Best model: "WeightedEnsemble_L2"
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("agModels-predictClass-deployment/")

Next, load separate test data to demonstrate how to make predictions on new examples at inference time:

.. code:: python

    test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
    y_test = test_data[label]  # values to predict
    test_data.head()

.. parsed-literal::
    :class: output

    Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769
.. parsed-literal::
    :class: output

        age  workclass         fnlwgt  education     education-num  marital-status      occupation       relationship  race   sex     capital-gain  capital-loss  hours-per-week  native-country  class
    0    31  Private           169085  11th                      7  Married-civ-spouse  Sales            Wife          White  Female             0             0              20  United-States   <=50K
    1    17  Self-emp-not-inc  226203  12th                      8  Never-married       Sales            Own-child     White  Male               0             0              45  United-States   <=50K
    2    47  Private            54260  Assoc-voc                11  Married-civ-spouse  Exec-managerial  Husband       White  Male               0          1887              60  United-States    >50K
    3    21  Private           176262  Some-college             10  Never-married       Exec-managerial  Own-child     White  Female             0             0              30  United-States   <=50K
    4    17  Private           241185  12th                      8  Never-married       Prof-specialty   Own-child     White  Male               0             0              20  United-States   <=50K
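Note that this test file still contains the ``class`` column; data arriving at inference time in production usually will not. You can simulate that by dropping the label before predicting. A minimal sketch (``test_data_nolabel`` is just an illustrative name, and the predictor accepts data with or without the label column):

.. code:: python

    # Simulate production inference data that lacks the label column.
    test_data_nolabel = test_data.drop(columns=[label])
    test_data_nolabel.head()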
We use our trained models to make predictions on the new data:

.. code:: python

    predictor = TabularPredictor.load(save_path)  # unnecessary, just demonstrates how to load previously-trained predictor from file

    y_pred = predictor.predict(test_data)
    y_pred

.. parsed-literal::
    :class: output

    0        <=50K
    1        <=50K
    2        <=50K
    3        <=50K
    4        <=50K
             ...
    9764     <=50K
    9765     <=50K
    9766     <=50K
    9767     <=50K
    9768     <=50K
    Name: class, Length: 9769, dtype: object

We can use ``leaderboard()`` to evaluate the performance of each individual trained model on our labeled test data:

.. code:: python

    predictor.leaderboard(test_data, silent=True)
.. parsed-literal::
    :class: output

                      model  score_test  score_val  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
    0      RandomForestGini    0.842870       0.84        0.137671       0.055894  1.080855                 0.137671                0.055894           1.080855            1       True          5
    1              CatBoost    0.842461       0.85        0.012603       0.005573  1.403570                 0.012603                0.005573           1.403570            1       True          7
    2      RandomForestEntr    0.841130       0.83        0.140647       0.060857  1.060027                 0.140647                0.060857           1.060027            1       True          6
    3              LightGBM    0.839799       0.85        0.014990       0.008039  0.824368                 0.014990                0.008039           0.824368            1       True          4
    4               XGBoost    0.837445       0.87        0.050143       0.007187  0.261149                 0.050143                0.007187           0.261149            1       True         11
    5   WeightedEnsemble_L2    0.837445       0.87        0.052607       0.007834  0.583509                 0.002464                0.000648           0.322360            2       True         14
    6            LightGBMXT    0.836421       0.83        0.010455       0.005912  1.248788                 0.010455                0.005912           1.248788            1       True          3
    7        ExtraTreesGini    0.834579       0.82        0.139147       0.060351  1.065567                 0.139147                0.060351           1.065567            1       True          8
    8        NeuralNetTorch    0.833555       0.83        0.056062       0.013697  1.024997                 0.056062                0.013697           1.024997            1       True         12
    9        ExtraTreesEntr    0.833350       0.81        0.140015       0.058261  1.058253                 0.140015                0.058261           1.058253            1       True          9
    10        LightGBMLarge    0.828949       0.83        0.036233       0.005726  0.544085                 0.036233                0.005726           0.544085            1       True         13
    11      NeuralNetFastAI    0.818610       0.82        0.152624       0.013950  2.614331                 0.152624                0.013950           2.614331            1       True         10
    12       KNeighborsUnif    0.725970       0.73        0.027956       0.008520  0.609989                 0.027956                0.008520           0.609989            1       True          1
    13       KNeighborsDist    0.695158       0.65        0.025601       0.006325  0.603475                 0.025601                0.006325           0.603475            1       True          2
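The leaderboard scores every trained model. To score just the served predictions against the ``y_test`` labels we set aside earlier, ``evaluate_predictions`` works as well; a minimal sketch:

.. code:: python

    # Compare the predictions against the held-out labels loaded earlier.
    perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)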
Snapshot a Predictor with .clone()
----------------------------------

Now that we have a working predictor artifact, we may want to alter it in a variety of ways to better suit our needs. For example, we may want to delete certain models to reduce disk usage via ``.delete_models()``, or train additional models on top of the ones we already have via ``.fit_extra()``.

While you can do all of these operations on your predictor, you may want to be able to revert to a prior state of the predictor in case something goes wrong. This is where ``predictor.clone()`` comes in.

``predictor.clone()`` allows you to create a snapshot of the given predictor, cloning the artifacts of the predictor to a new location. You can then freely play around with the predictor and always load the earlier snapshot in case you want to undo your actions.

All you need to do to clone a predictor is specify a new directory path to clone to:

.. code:: python

    save_path_clone = save_path + '-clone'

    # will return the path to the cloned predictor, identical to save_path_clone
    path_clone = predictor.clone(path=save_path_clone)

.. parsed-literal::
    :class: output

    Cloned TabularPredictor located in 'agModels-predictClass-deployment/' to 'agModels-predictClass-deployment-clone'.
    To load the cloned predictor: predictor_clone = TabularPredictor.load(path="agModels-predictClass-deployment-clone")

Note that this logic doubles disk usage, as it completely clones every predictor artifact on disk to make an exact replica.

Now we can load the cloned predictor:

.. code:: python

    predictor_clone = TabularPredictor.load(path=path_clone)

    # You can alternatively load the cloned TabularPredictor at the time of cloning:
    # predictor_clone = predictor.clone(path=save_path_clone, return_clone=True)

We can see that the cloned predictor has the same leaderboard and functionality as the original:

.. code:: python

    y_pred_clone = predictor_clone.predict(test_data)
    y_pred_clone

.. parsed-literal::
    :class: output

    0        <=50K
    1        <=50K
    2        <=50K
    3        <=50K
    4        <=50K
             ...
    9764     <=50K
    9765     <=50K
    9766     <=50K
    9767     <=50K
    9768     <=50K
    Name: class, Length: 9769, dtype: object

.. code:: python

    y_pred.equals(y_pred_clone)

.. parsed-literal::
    :class: output

    True

.. code:: python

    predictor_clone.leaderboard(test_data, silent=True)
.. parsed-literal::
    :class: output

                      model  score_test  score_val  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
    0      RandomForestGini    0.842870       0.84        0.135722       0.055894  1.080855                 0.135722                0.055894           1.080855            1       True          5
    1              CatBoost    0.842461       0.85        0.011522       0.005573  1.403570                 0.011522                0.005573           1.403570            1       True          7
    2      RandomForestEntr    0.841130       0.83        0.134074       0.060857  1.060027                 0.134074                0.060857           1.060027            1       True          6
    3              LightGBM    0.839799       0.85        0.015382       0.008039  0.824368                 0.015382                0.008039           0.824368            1       True          4
    4               XGBoost    0.837445       0.87        0.046215       0.007187  0.261149                 0.046215                0.007187           0.261149            1       True         11
    5   WeightedEnsemble_L2    0.837445       0.87        0.048518       0.007834  0.583509                 0.002304                0.000648           0.322360            2       True         14
    6            LightGBMXT    0.836421       0.83        0.010329       0.005912  1.248788                 0.010329                0.005912           1.248788            1       True          3
    7        ExtraTreesGini    0.834579       0.82        0.135083       0.060351  1.065567                 0.135083                0.060351           1.065567            1       True          8
    8        NeuralNetTorch    0.833555       0.83        0.053613       0.013697  1.024997                 0.053613                0.013697           1.024997            1       True         12
    9        ExtraTreesEntr    0.833350       0.81        0.139537       0.058261  1.058253                 0.139537                0.058261           1.058253            1       True          9
    10        LightGBMLarge    0.828949       0.83        0.034355       0.005726  0.544085                 0.034355                0.005726           0.544085            1       True         13
    11      NeuralNetFastAI    0.818610       0.82        0.143432       0.013950  2.614331                 0.143432                0.013950           2.614331            1       True         10
    12       KNeighborsUnif    0.725970       0.73        0.026624       0.008520  0.609989                 0.026624                0.008520           0.609989            1       True          1
    13       KNeighborsDist    0.695158       0.65        0.025924       0.006325  0.603475                 0.025924                0.006325           0.603475            1       True          2
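As mentioned above, a clone is also a safe place to experiment with ``.fit_extra()``, which trains additional models on top of the existing ones. A minimal sketch, using a throwaway clone so that the clone used in the rest of this tutorial stays untouched (the ``'-scratch'`` path and the ``'GBM'`` hyperparameter key are chosen purely for illustration):

.. code:: python

    # Experiment in a disposable clone; neither the original predictor
    # nor predictor_clone is affected.
    predictor_scratch = predictor.clone(path=save_path + '-scratch', return_clone=True)
    predictor_scratch.fit_extra(hyperparameters={'GBM': {}})  # train one extra LightGBM config
    predictor_scratch.leaderboard(silent=True)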
Now let's do some extra logic with the clone, such as calling refit_full:

.. code:: python

    predictor_clone.refit_full()
    predictor_clone.leaderboard(test_data, silent=True)

.. parsed-literal::
    :class: output

    Fitting 1 L1 models ...
    Fitting model: KNeighborsUnif_FULL ...
        0.01s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: KNeighborsDist_FULL ...
        0.01s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: LightGBMXT_FULL ...
        0.14s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: LightGBM_FULL ...
        0.16s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: RandomForestGini_FULL ...
        0.48s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: RandomForestEntr_FULL ...
        0.47s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: CatBoost_FULL ...
        0.03s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: ExtraTreesGini_FULL ...
        0.47s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: ExtraTreesEntr_FULL ...
        0.47s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: NeuralNetFastAI_FULL ...
    No improvement since epoch 0: early stopping
        0.39s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: XGBoost_FULL ...
        0.07s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: NeuralNetTorch_FULL ...
        0.56s    = Training   runtime
    Fitting 1 L1 models ...
    Fitting model: LightGBMLarge_FULL ...
        0.22s    = Training   runtime
    Fitting model: WeightedEnsemble_L2_FULL | Skipping fit via cloning parent ...
        0.32s    = Training   runtime
    Updated best model to "WeightedEnsemble_L2_FULL" (Previously "WeightedEnsemble_L2"). AutoGluon will default to using "WeightedEnsemble_L2_FULL" for predict() and predict_proba().
.. parsed-literal::
    :class: output

                           model  score_test  score_val  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
    0              CatBoost_FULL    0.842870        NaN        0.011228            NaN  0.026799                 0.011228                     NaN           0.026799            1       True         21
    1           RandomForestGini    0.842870       0.84        0.138639       0.055894  1.080855                 0.138639                0.055894           1.080855            1       True          5
    2                   CatBoost    0.842461       0.85        0.012043       0.005573  1.403570                 0.012043                0.005573           1.403570            1       True          7
    3           RandomForestEntr    0.841130       0.83        0.138774       0.060857  1.060027                 0.138774                0.060857           1.060027            1       True          6
    4              LightGBM_FULL    0.840823        NaN        0.017195            NaN  0.163478                 0.017195                     NaN           0.163478            1       True         18
    5                   LightGBM    0.839799       0.85        0.015824       0.008039  0.824368                 0.015824                0.008039           0.824368            1       True          4
    6      RandomForestGini_FULL    0.839595        NaN        0.140190            NaN  0.478390                 0.140190                     NaN           0.478390            1       True         19
    7      RandomForestEntr_FULL    0.839185        NaN        0.138538            NaN  0.474687                 0.138538                     NaN           0.474687            1       True         20
    8            LightGBMXT_FULL    0.837957        NaN        0.011016            NaN  0.137915                 0.011016                     NaN           0.137915            1       True         17
    9                    XGBoost    0.837445       0.87        0.048745       0.007187  0.261149                 0.048745                0.007187           0.261149            1       True         11
    10       WeightedEnsemble_L2    0.837445       0.87        0.051331       0.007834  0.583509                 0.002586                0.000648           0.322360            2       True         14
    11                LightGBMXT    0.836421       0.83        0.010284       0.005912  1.248788                 0.010284                0.005912           1.248788            1       True          3
    12       ExtraTreesEntr_FULL    0.835910        NaN        0.143991            NaN  0.473450                 0.143991                     NaN           0.473450            1       True         23
    13       NeuralNetTorch_FULL    0.835091        NaN        0.058724            NaN  0.559810                 0.058724                     NaN           0.559810            1       True         26
    14            ExtraTreesGini    0.834579       0.82        0.142700       0.060351  1.065567                 0.142700                0.060351           1.065567            1       True          8
    15       ExtraTreesGini_FULL    0.833862        NaN        0.141119            NaN  0.472204                 0.141119                     NaN           0.472204            1       True         22
    16            NeuralNetTorch    0.833555       0.83        0.057129       0.013697  1.024997                 0.057129                0.013697           1.024997            1       True         12
    17            ExtraTreesEntr    0.833350       0.81        0.140146       0.058261  1.058253                 0.140146                0.058261           1.058253            1       True          9
    18              XGBoost_FULL    0.831610        NaN        0.044393            NaN  0.069248                 0.044393                     NaN           0.069248            1       True         25
    19  WeightedEnsemble_L2_FULL    0.831610        NaN        0.047146            NaN  0.391608                 0.002753                     NaN           0.322360            2       True         28
    20             LightGBMLarge    0.828949       0.83        0.038662       0.005726  0.544085                 0.038662                0.005726           0.544085            1       True         13
    21        LightGBMLarge_FULL    0.820964        NaN        0.041921            NaN  0.220074                 0.041921                     NaN           0.220074            1       True         27
    22           NeuralNetFastAI    0.818610       0.82        0.155864       0.013950  2.614331                 0.155864                0.013950           2.614331            1       True         10
    23      NeuralNetFastAI_FULL    0.769270        NaN        0.151720            NaN  0.386512                 0.151720                     NaN           0.386512            1       True         24
    24            KNeighborsUnif    0.725970       0.73        0.025264       0.008520  0.609989                 0.025264                0.008520           0.609989            1       True          1
    25       KNeighborsUnif_FULL    0.725151        NaN        0.023710            NaN  0.005904                 0.023710                     NaN           0.005904            1       True         15
    26            KNeighborsDist    0.695158       0.65        0.027080       0.006325  0.603475                 0.027080                0.006325           0.603475            1       True          2
    27       KNeighborsDist_FULL    0.685434        NaN        0.025221            NaN  0.005437                 0.025221                     NaN           0.005437            1       True         16
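Note that the ``_FULL`` models show ``NaN`` validation scores: refit_full retrains each model on all of the data, including the prior validation split, so no held-out score exists for them. You can still target any individual model at prediction time via the ``model`` parameter; a quick sketch, with ``'LightGBM_FULL'`` chosen arbitrarily:

.. code:: python

    # Predict with a specific model rather than the current best.
    y_pred_full = predictor_clone.predict(test_data, model='LightGBM_FULL')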
We were able to fit additional models, but suppose we now want to undo this operation. Luckily, our original predictor is untouched!

.. code:: python

    predictor.leaderboard(test_data, silent=True)
.. parsed-literal::
    :class: output

                      model  score_test  score_val  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
    0      RandomForestGini    0.842870       0.84        0.140122       0.055894  1.080855                 0.140122                0.055894           1.080855            1       True          5
    1              CatBoost    0.842461       0.85        0.011801       0.005573  1.403570                 0.011801                0.005573           1.403570            1       True          7
    2      RandomForestEntr    0.841130       0.83        0.139719       0.060857  1.060027                 0.139719                0.060857           1.060027            1       True          6
    3              LightGBM    0.839799       0.85        0.016043       0.008039  0.824368                 0.016043                0.008039           0.824368            1       True          4
    4               XGBoost    0.837445       0.87        0.049586       0.007187  0.261149                 0.049586                0.007187           0.261149            1       True         11
    5   WeightedEnsemble_L2    0.837445       0.87        0.052166       0.007834  0.583509                 0.002579                0.000648           0.322360            2       True         14
    6            LightGBMXT    0.836421       0.83        0.010703       0.005912  1.248788                 0.010703                0.005912           1.248788            1       True          3
    7        ExtraTreesGini    0.834579       0.82        0.140917       0.060351  1.065567                 0.140917                0.060351           1.065567            1       True          8
    8        NeuralNetTorch    0.833555       0.83        0.060173       0.013697  1.024997                 0.060173                0.013697           1.024997            1       True         12
    9        ExtraTreesEntr    0.833350       0.81        0.139170       0.058261  1.058253                 0.139170                0.058261           1.058253            1       True          9
    10        LightGBMLarge    0.828949       0.83        0.034843       0.005726  0.544085                 0.034843                0.005726           0.544085            1       True         13
    11      NeuralNetFastAI    0.818610       0.82        0.160457       0.013950  2.614331                 0.160457                0.013950           2.614331            1       True         10
    12       KNeighborsUnif    0.725970       0.73        0.015760       0.008520  0.609989                 0.015760                0.008520           0.609989            1       True          1
    13       KNeighborsDist    0.695158       0.65        0.024812       0.006325  0.603475                 0.024812                0.006325           0.603475            1       True          2
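If your goal is only to shrink disk usage rather than snapshot, the ``.delete_models()`` method mentioned earlier can prune everything except what the best model needs, in place. It is destructive, which is exactly why running it on a clone is advisable; a minimal sketch:

.. code:: python

    # Destructive: permanently removes model files from disk, so run it on a clone.
    predictor_clone.delete_models(models_to_keep='best', dry_run=False)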
We can simply clone a new predictor from our original, and we will no longer be impacted by the call to refit_full on the prior clone.

Snapshot a deployment-optimized Predictor via .clone_for_deployment()
----------------------------------------------------------------------

Instead of cloning an exact copy, we can instead clone a copy which has the minimal set of artifacts needed to do prediction.

Note that this optimized clone will have very limited functionality outside of calling predict and predict_proba. For example, it will be unable to train more models.

.. code:: python

    save_path_clone_opt = save_path + '-clone-opt'

    # will return the path to the cloned predictor, identical to save_path_clone_opt
    path_clone_opt = predictor.clone_for_deployment(path=save_path_clone_opt)

.. parsed-literal::
    :class: output

    Cloned TabularPredictor located in 'agModels-predictClass-deployment/' to 'agModels-predictClass-deployment-clone-opt'.
    To load the cloned predictor: predictor_clone = TabularPredictor.load(path="agModels-predictClass-deployment-clone-opt")
    Clone: Keeping minimum set of models required to predict with best model 'WeightedEnsemble_L2'...
    Deleting model KNeighborsUnif. All files under agModels-predictClass-deployment-clone-opt/models/KNeighborsUnif/ will be removed.
    Deleting model KNeighborsDist. All files under agModels-predictClass-deployment-clone-opt/models/KNeighborsDist/ will be removed.
    Deleting model LightGBMXT. All files under agModels-predictClass-deployment-clone-opt/models/LightGBMXT/ will be removed.
    Deleting model LightGBM. All files under agModels-predictClass-deployment-clone-opt/models/LightGBM/ will be removed.
    Deleting model RandomForestGini. All files under agModels-predictClass-deployment-clone-opt/models/RandomForestGini/ will be removed.
    Deleting model RandomForestEntr. All files under agModels-predictClass-deployment-clone-opt/models/RandomForestEntr/ will be removed.
    Deleting model CatBoost. All files under agModels-predictClass-deployment-clone-opt/models/CatBoost/ will be removed.
    Deleting model ExtraTreesGini. All files under agModels-predictClass-deployment-clone-opt/models/ExtraTreesGini/ will be removed.
    Deleting model ExtraTreesEntr. All files under agModels-predictClass-deployment-clone-opt/models/ExtraTreesEntr/ will be removed.
    Deleting model NeuralNetFastAI. All files under agModels-predictClass-deployment-clone-opt/models/NeuralNetFastAI/ will be removed.
    Deleting model NeuralNetTorch. All files under agModels-predictClass-deployment-clone-opt/models/NeuralNetTorch/ will be removed.
    Deleting model LightGBMLarge. All files under agModels-predictClass-deployment-clone-opt/models/LightGBMLarge/ will be removed.
    Clone: Removing artifacts unnecessary for prediction.
    NOTE: Clone can no longer fit new models, and most functionality except for predict and predict_proba will no longer work

.. code:: python

    predictor_clone_opt = TabularPredictor.load(path=path_clone_opt)

We can see that the optimized clone still makes the same predictions:

.. code:: python

    y_pred_clone_opt = predictor_clone_opt.predict(test_data)
    y_pred_clone_opt

.. parsed-literal::
    :class: output

    0        <=50K
    1        <=50K
    2        <=50K
    3        <=50K
    4        <=50K
             ...
    9764     <=50K
    9765     <=50K
    9766     <=50K
    9767     <=50K
    9768     <=50K
    Name: class, Length: 9769, dtype: object

.. code:: python

    y_pred.equals(y_pred_clone_opt)

.. parsed-literal::
    :class: output

    True

.. code:: python

    predictor_clone_opt.leaderboard(test_data, silent=True)
.. parsed-literal::
    :class: output

                      model  score_test  score_val  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
    0               XGBoost    0.837445       0.87        0.038097       0.007187  0.261149                 0.038097                0.007187           0.261149            1       True          1
    1   WeightedEnsemble_L2    0.837445       0.87        0.040542       0.007834  0.583509                 0.002445                0.000648           0.322360            2       True          2
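For low-latency serving, you can also keep the optimized clone's models in memory between ``predict`` calls instead of reloading them from disk each time. In the AutoGluon 0.6 API used in this tutorial this is ``persist_models()`` (later releases rename it to ``persist()``); a minimal sketch:

.. code:: python

    # Keep the best model (and its dependencies) in memory for repeated inference.
    predictor_clone_opt.persist_models()
    y_pred_fast = predictor_clone_opt.predict(test_data)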
We can check the disk usage of the optimized clone compared to the original:

.. code:: python

    size_original = predictor.get_size_disk()
    size_opt = predictor_clone_opt.get_size_disk()
    print(f'Size Original: {size_original} bytes')
    print(f'Size Optimized: {size_opt} bytes')
    print(f'Optimized predictor achieved a {round((1 - (size_opt/size_original)) * 100, 1)}% reduction in disk usage.')

.. parsed-literal::
    :class: output

    Size Original: 16966478 bytes
    Size Optimized: 601220 bytes
    Optimized predictor achieved a 96.5% reduction in disk usage.

We can also investigate the difference in the files that exist in the original and optimized predictor.

Original:

.. code:: python

    predictor.get_size_disk_per_file()

.. parsed-literal::
    :class: output

    models/ExtraTreesGini/model.pkl                         4567890
    models/ExtraTreesEntr/model.pkl                         4530305
    models/RandomForestGini/model.pkl                       3076492
    models/RandomForestEntr/model.pkl                       2949158
    models/XGBoost/xgb.ubj                                   564906
    models/LightGBMLarge/model.pkl                           470889
    models/NeuralNetTorch/net.params                         234610
    models/NeuralNetFastAI/model-internals.pkl               167374
    models/LightGBM/model.pkl                                146038
    models/LightGBMXT/model.pkl                               42071
    models/KNeighborsDist/model.pkl                           39986
    models/KNeighborsUnif/model.pkl                           39985
    utils/data/X.pkl                                          27655
    models/CatBoost/model.pkl                                 21562
    models/NeuralNetTorch/model.pkl                           18149
    learner.pkl                                               10719
    metadata.json                                              8632
    utils/data/X_val.pkl                                       8421
    models/WeightedEnsemble_L2/model.pkl                       8122
    utils/data/y.pkl                                           7488
    models/XGBoost/model.pkl                                   5475
    models/trainer.pkl                                         5124
    models/NeuralNetFastAI/model.pkl                           3352
    utils/data/y_val.pkl                                       2381
    models/WeightedEnsemble_L2/utils/model_template.pkl        1024
    models/WeightedEnsemble_L2/utils/oof.pkl                    764
    predictor.pkl                                               742
    utils/attr/NeuralNetTorch/y_pred_proba_val.pkl              550
    utils/attr/XGBoost/y_pred_proba_val.pkl                     550
    utils/attr/NeuralNetFastAI/y_pred_proba_val.pkl             550
    utils/attr/ExtraTreesEntr/y_pred_proba_val.pkl              550
    utils/attr/ExtraTreesGini/y_pred_proba_val.pkl              550
    utils/attr/CatBoost/y_pred_proba_val.pkl                    550
    utils/attr/RandomForestEntr/y_pred_proba_val.pkl            550
    utils/attr/RandomForestGini/y_pred_proba_val.pkl            550
    utils/attr/LightGBM/y_pred_proba_val.pkl                    550
    utils/attr/LightGBMXT/y_pred_proba_val.pkl                  550
    utils/attr/KNeighborsDist/y_pred_proba_val.pkl              550
    utils/attr/KNeighborsUnif/y_pred_proba_val.pkl              550
    utils/attr/LightGBMLarge/y_pred_proba_val.pkl               550
    __version__                                                  14
    Name: size, dtype: int64

Optimized:

.. code:: python

    predictor_clone_opt.get_size_disk_per_file()

.. parsed-literal::
    :class: output

    models/XGBoost/xgb.ubj                  564906
    learner.pkl                              10719
    metadata.json                             8632
    models/WeightedEnsemble_L2/model.pkl      8286
    models/XGBoost/model.pkl                  5495
    models/trainer.pkl                        2426
    predictor.pkl                              742
    __version__                                 14
    Name: size, dtype: int64

Now all that is left is to upload the optimized predictor to a centralized storage location such as S3. To use this predictor on a new machine or system, simply download the artifact to local disk and load the predictor. Be sure to load the predictor with the same Python version and AutoGluon version used during training to avoid instability.
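As a concrete sketch of that last step: archive the optimized predictor directory, push it to S3 with ``boto3``, then pull and load it on the serving machine. The bucket name and object keys below are placeholders, not real resources:

.. code:: python

    import shutil
    import boto3

    from autogluon.tabular import TabularPredictor

    # --- On the training machine: archive and upload the optimized artifact ---
    archive = shutil.make_archive('predictor-deploy', 'zip', root_dir=path_clone_opt)
    s3 = boto3.client('s3')
    s3.upload_file(archive, 'my-model-bucket', 'models/predictor-deploy.zip')  # placeholder bucket/key

    # --- On the serving machine: download, unpack, and load ---
    s3.download_file('my-model-bucket', 'models/predictor-deploy.zip', 'predictor-deploy.zip')
    shutil.unpack_archive('predictor-deploy.zip', extract_dir='predictor-deploy')
    predictor_served = TabularPredictor.load('predictor-deploy')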