.. _sec_tabularadvanced:

Predicting Columns in a Table - In Depth
========================================

**Tip**: If you are new to AutoGluon, review :ref:`sec_tabularquick` to learn the basics of the AutoGluon API. To learn how to add your own custom models to the set that AutoGluon trains, tunes, and ensembles, review :ref:`sec_tabularcustommodel`.

This tutorial describes how you can exert greater control when using AutoGluon's ``fit()`` or ``predict()``. Recall that to maximize predictive performance, you should always first try ``fit()`` with all default arguments except ``eval_metric`` and ``presets``, before you experiment with other arguments covered in this in-depth tutorial like ``hyperparameter_tune_kwargs``, ``hyperparameters``, ``num_stack_levels``, ``num_bag_folds``, ``num_bag_sets``, etc.

Using the same census data table as in the :ref:`sec_tabularquick` tutorial, we'll now predict the ``occupation`` of an individual - a multiclass classification problem. Start by importing AutoGluon's TabularPredictor and TabularDataset, and loading the data.

.. code:: python

    from autogluon.tabular import TabularDataset, TabularPredictor

    import numpy as np

    train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
    subsample_size = 500  # subsample subset of data for faster demo, try setting this to much larger values
    train_data = train_data.sample(n=subsample_size, random_state=0)
    print(train_data.head())

    label = 'occupation'
    print("Summary of occupation column: \n", train_data['occupation'].describe())

    new_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
    test_data = new_data[5000:].copy()  # this should be separate data in your applications
    y_test = test_data[label]
    test_data_nolabel = test_data.drop(columns=[label])  # delete label column
    val_data = new_data[:5000].copy()

    metric = 'accuracy'  # we specify eval-metric just for demo (unnecessary as it's the default)

.. parsed-literal::
    :class: output

           age workclass  fnlwgt     education  education-num  \
    6118    51   Private   39264  Some-college             10
    23204   58   Private   51662          10th              6
    29590   40   Private  326310  Some-college             10
    18116   37   Private  222450       HS-grad              9
    33964   62   Private  109190     Bachelors             13

               marital-status       occupation   relationship   race     sex  \
    6118   Married-civ-spouse  Exec-managerial           Wife  White  Female
    23204  Married-civ-spouse    Other-service           Wife  White  Female
    29590  Married-civ-spouse     Craft-repair        Husband  White    Male
    18116       Never-married            Sales  Not-in-family  White    Male
    33964  Married-civ-spouse  Exec-managerial        Husband  White    Male

           capital-gain  capital-loss  hours-per-week native-country  class
    6118              0             0              40  United-States   >50K
    23204             0             0               8  United-States  <=50K
    29590             0             0              44  United-States  <=50K
    18116             0          2339              40    El-Salvador  <=50K
    33964         15024             0              40  United-States   >50K
    Summary of occupation column:
     count                 500
    unique                  15
    top        Exec-managerial
    freq                    77
    Name: occupation, dtype: object

Specifying hyperparameters and tuning them
------------------------------------------

We first demonstrate hyperparameter-tuning and how you can provide your own validation dataset that AutoGluon internally relies on to: tune hyperparameters, early-stop iterative training, and construct model ensembles. One reason you may specify validation data is when future test data will stem from a different distribution than training data (and your specified validation data is more representative of the future data that will likely be encountered).

If you don't have a strong reason to provide your own validation dataset, we recommend you omit the ``tuning_data`` argument. This lets AutoGluon automatically select validation data from your provided training set (it uses smart strategies such as stratified sampling). For greater control, you can specify the ``holdout_frac`` argument to tell AutoGluon what fraction of the provided training data to hold out for validation.
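To make the role of ``holdout_frac`` more concrete, here is a minimal sketch of a stratified holdout split in plain Python. It is an illustration under stated assumptions, not AutoGluon's actual splitting code: the helper ``stratified_holdout`` and the toy label list are hypothetical, and holding out at least one row from every class with two or more rows is a simplification chosen for this example.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, holdout_frac, seed=0):
    """Split row indices so each class contributes roughly holdout_frac of its rows.

    Hypothetical helper for illustration only (not part of AutoGluon).
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    rows_by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        rows_by_class[lab].append(idx)

    train_idx, val_idx = [], []
    for idxs in rows_by_class.values():
        rng.shuffle(idxs)
        # Classes with a single row stay entirely in the training split.
        n_val = max(1, round(len(idxs) * holdout_frac)) if len(idxs) > 1 else 0
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels (assumed data): 60 / 30 / 10 rows across three occupations.
labels = ['Sales'] * 60 + ['Tech-support'] * 30 + ['Craft-repair'] * 10
train_idx, val_idx = stratified_holdout(labels, holdout_frac=0.2)
print(len(train_idx), len(val_idx))
```

Because the split is made per class, rare classes stay represented in both parts, which a purely random split of a small table would not guarantee.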
**Caution:** Since AutoGluon tunes internal knobs based on this validation data, performance estimates reported on this data may be over-optimistic. For unbiased performance estimates, you should always call ``predict()`` on a separate dataset (that was never passed to ``fit()``), as we did in the previous **Quick-Start** tutorial. We also emphasize that most options specified in this tutorial are chosen to minimize runtime for the purposes of demonstration and you should select more reasonable values in order to obtain high-quality models.

``fit()`` trains neural networks and various types of tree ensembles by default. You can specify various hyperparameter values for each type of model. For each hyperparameter, you can either specify a single fixed value, or a search space of values to consider during hyperparameter optimization. Hyperparameters which you do not specify are left at default settings chosen automatically by AutoGluon, which may be fixed values or search spaces.

.. code:: python

    import autogluon.core as ag

    nn_options = {  # specifies non-default hyperparameter values for neural network models
        'num_epochs': 10,  # number of training epochs (controls training time of NN models)
        'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),  # learning rate used in training (real-valued hyperparameter searched on log-scale)
        'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),  # activation function used in NN (categorical hyperparameter, default = first entry)
        'layers': ag.space.Categorical([100], [1000], [200, 100], [300, 200, 100]),  # each choice for categorical hyperparameter 'layers' corresponds to list of sizes for each NN layer to use
        'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1),  # dropout probability (real-valued hyperparameter)
    }

    gbm_options = {  # specifies non-default hyperparameter values for lightGBM gradient boosted trees
        'num_boost_round': 100,  # number of boosting rounds (controls training time of GBM models)
        'num_leaves': ag.space.Int(lower=26, upper=66, default=36),  # number of leaves in trees (integer hyperparameter)
    }

    hyperparameters = {  # hyperparameters of each model type
        'GBM': gbm_options,
        'NN': nn_options,  # NOTE: comment this line out if you get errors on Mac OSX
    }  # When these keys are missing from hyperparameters dict, no models of that type are trained

    time_limit = 2*60  # train various models for ~2 min
    num_trials = 5  # try at most 5 different hyperparameter configurations for each type of model
    search_strategy = 'auto'  # to tune hyperparameters using Bayesian optimization routine with a local scheduler

    hyperparameter_tune_kwargs = {  # HPO is not performed unless hyperparameter_tune_kwargs is specified
        'num_trials': num_trials,
        'scheduler' : 'local',
        'searcher': search_strategy,
    }

    predictor = TabularPredictor(label=label, eval_metric=metric).fit(
        train_data, tuning_data=val_data, time_limit=time_limit,
        hyperparameters=hyperparameters, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
    )

.. parsed-literal::
    :class: output

    No path specified. Models will be saved in: "AutogluonModels/ag-20210827_220955/"
    Warning: hyperparameter tuning is currently experimental and may cause the process to hang.
    Beginning AutoGluon training ... Time limit = 120s
    AutoGluon will save models to "AutogluonModels/ag-20210827_220955/"
    AutoGluon Version: 0.3.0b20210827
    Train Data Rows: 500
    Train Data Columns: 14
    Tuning Data Rows: 5000
    Tuning Data Columns: 14
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
First 10 (of 15) unique label values: [' Exec-managerial', ' Other-service', ' Craft-repair', ' Sales', ' Prof-specialty', ' Protective-serv', ' ?', ' Adm-clerical', ' Machine-op-inspct', ' Tech-support'] If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 12 out of 15 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.978 Train Data Class Count: 12 Using Feature Generators to preprocess the data ... Fitting AutoMLPipelineFeatureGenerator... Available Memory: 22173.29 MB Train Data (Original) Memory Usage: 3.11 MB (0.0% of available memory) Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. Stage 1 Generators: Fitting AsTypeFeatureGenerator... Stage 2 Generators: Fitting FillNaFeatureGenerator... Stage 3 Generators: Fitting IdentityFeatureGenerator... Fitting CategoryFeatureGenerator... Fitting CategoryMemoryMinimizeFeatureGenerator... Stage 4 Generators: Fitting DropUniqueFeatureGenerator... Types of features in original data (raw dtype, special dtypes): ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] Types of features in processed data (raw dtype, special dtypes): ('category', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] 
0.1s = Fit runtime 14 features in original data used to generate 14 features in processed data. Train Data (Processed) Memory Usage: 0.3 MB (0.0% of available memory) Data preprocessing and feature engineering runtime = 0.11s ... AutoGluon will gauge predictive performance using evaluation metric: 'accuracy' To change this, specify the eval_metric argument of fit() Fitting 2 L1 models ... Hyperparameter tuning model: LightGBM ... .. parsed-literal:: :class: output 0%| | 0/5 [00:00`__. You'll often see performance improve if you specify ``num_bag_folds`` = 5-10, ``num_stack_levels`` = 1-3 in the call to ``fit()``, but this will increase training times and memory/disk usage. .. code:: python predictor = TabularPredictor(label=label, eval_metric=metric).fit(train_data, num_bag_folds=5, num_bag_sets=1, num_stack_levels=1, hyperparameters = {'NN': {'num_epochs': 2}, 'GBM': {'num_boost_round': 20}}, # last argument is just for quick demo here, omit it in real applications ) .. parsed-literal:: :class: output No path specified. Models will be saved in: "AutogluonModels/ag-20210827_221035/" Beginning AutoGluon training ... AutoGluon will save models to "AutogluonModels/ag-20210827_221035/" AutoGluon Version: 0.3.0b20210827 Train Data Rows: 500 Train Data Columns: 14 Preprocessing data ... AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object). First 10 (of 15) unique label values: [' Exec-managerial', ' Other-service', ' Craft-repair', ' Sales', ' Prof-specialty', ' Protective-serv', ' ?', ' Adm-clerical', ' Machine-op-inspct', ' Tech-support'] If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 12 out of 15 classes for training and will not try to predict the rare classes. 
To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.978 Train Data Class Count: 12 Using Feature Generators to preprocess the data ... Fitting AutoMLPipelineFeatureGenerator... Available Memory: 21922.47 MB Train Data (Original) Memory Usage: 0.28 MB (0.0% of available memory) Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. Stage 1 Generators: Fitting AsTypeFeatureGenerator... Stage 2 Generators: Fitting FillNaFeatureGenerator... Stage 3 Generators: Fitting IdentityFeatureGenerator... Fitting CategoryFeatureGenerator... Fitting CategoryMemoryMinimizeFeatureGenerator... Stage 4 Generators: Fitting DropUniqueFeatureGenerator... Types of features in original data (raw dtype, special dtypes): ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] Types of features in processed data (raw dtype, special dtypes): ('category', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] 0.1s = Fit runtime 14 features in original data used to generate 14 features in processed data. Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory) Data preprocessing and feature engineering runtime = 0.08s ... AutoGluon will gauge predictive performance using evaluation metric: 'accuracy' To change this, specify the eval_metric argument of fit() AutoGluon will fit 2 stack levels (L1 to L2) ... Fitting 2 L1 models ... Fitting model: LightGBM_BAG_L1 ... 
        0.3067 = Validation score (accuracy)
        0.71s = Training runtime
        0.03s = Validation runtime
    Fitting model: NeuralNetMXNet_BAG_L1 ...
        0.1002 = Validation score (accuracy)
        1.6s = Training runtime
        0.11s = Validation runtime
    Fitting model: WeightedEnsemble_L2 ...
        0.3067 = Validation score (accuracy)
        0.12s = Training runtime
        0.0s = Validation runtime
    Fitting 2 L2 models ...
    Fitting model: LightGBM_BAG_L2 ...
        0.2965 = Validation score (accuracy)
        0.96s = Training runtime
        0.03s = Validation runtime
    Fitting model: NeuralNetMXNet_BAG_L2 ...
        0.0777 = Validation score (accuracy)
        2.0s = Training runtime
        0.14s = Validation runtime
    Fitting model: WeightedEnsemble_L3 ...
        0.2986 = Validation score (accuracy)
        0.12s = Training runtime
        0.0s = Validation runtime
    AutoGluon training complete, total runtime = 6.08s ...
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20210827_221035/")

You should not provide ``tuning_data`` when stacking/bagging, and instead provide all your available data as ``train_data`` (which AutoGluon will split in more intelligent ways). ``num_bag_sets`` controls how many times the k-fold bagging process is repeated to further reduce variance (increasing this may further boost accuracy but will substantially increase training times, inference latency, and memory/disk usage). Rather than manually searching for good bagging/stacking values yourself, AutoGluon will automatically select good values for you if you specify ``auto_stack`` instead:

.. code:: python

    save_path = 'agModels-predictOccupation'  # folder where to store trained models

    predictor = TabularPredictor(label=label, eval_metric=metric, path=save_path).fit(
        train_data, auto_stack=True,
        time_limit=30, hyperparameters={'NN': {'num_epochs': 2}, 'GBM': {'num_boost_round': 20}}  # last 2 arguments are for quick demo, omit them in real applications
    )

.. parsed-literal::
    :class: output

    Beginning AutoGluon training ...
Time limit = 30s AutoGluon will save models to "agModels-predictOccupation/" AutoGluon Version: 0.3.0b20210827 Train Data Rows: 500 Train Data Columns: 14 Preprocessing data ... AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object). First 10 (of 15) unique label values: [' Exec-managerial', ' Other-service', ' Craft-repair', ' Sales', ' Prof-specialty', ' Protective-serv', ' ?', ' Adm-clerical', ' Machine-op-inspct', ' Tech-support'] If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 12 out of 15 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.978 Train Data Class Count: 12 Using Feature Generators to preprocess the data ... Fitting AutoMLPipelineFeatureGenerator... Available Memory: 21921.32 MB Train Data (Original) Memory Usage: 0.28 MB (0.0% of available memory) Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. Stage 1 Generators: Fitting AsTypeFeatureGenerator... Stage 2 Generators: Fitting FillNaFeatureGenerator... Stage 3 Generators: Fitting IdentityFeatureGenerator... Fitting CategoryFeatureGenerator... Fitting CategoryMemoryMinimizeFeatureGenerator... Stage 4 Generators: Fitting DropUniqueFeatureGenerator... Types of features in original data (raw dtype, special dtypes): ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] 
Types of features in processed data (raw dtype, special dtypes): ('category', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] 0.1s = Fit runtime 14 features in original data used to generate 14 features in processed data. Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory) Data preprocessing and feature engineering runtime = 0.08s ... AutoGluon will gauge predictive performance using evaluation metric: 'accuracy' To change this, specify the eval_metric argument of fit() Fitting 2 L1 models ... Fitting model: LightGBM_BAG_L1 ... Training model for up to 29.92s of the 29.91s of remaining time. 0.3067 = Validation score (accuracy) 0.72s = Training runtime 0.03s = Validation runtime Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 29.14s of the 29.14s of remaining time. 0.0961 = Validation score (accuracy) 1.59s = Training runtime 0.11s = Validation runtime Repeating k-fold bagging: 2/20 Fitting model: LightGBM_BAG_L1 ... Training model for up to 27.4s of the 27.4s of remaining time. 0.3149 = Validation score (accuracy) 1.49s = Training runtime 0.07s = Validation runtime Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 26.58s of the 26.58s of remaining time. 0.0818 = Validation score (accuracy) 3.07s = Training runtime 0.21s = Validation runtime Repeating k-fold bagging: 3/20 Fitting model: LightGBM_BAG_L1 ... Training model for up to 24.95s of the 24.95s of remaining time. 0.3292 = Validation score (accuracy) 2.25s = Training runtime 0.1s = Validation runtime Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 24.13s of the 24.13s of remaining time. 0.1104 = Validation score (accuracy) 4.67s = Training runtime 0.32s = Validation runtime Repeating k-fold bagging: 4/20 Fitting model: LightGBM_BAG_L1 ... Training model for up to 22.4s of the 22.39s of remaining time. 
0.3108 = Validation score (accuracy) 2.96s = Training runtime 0.13s = Validation runtime Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 21.63s of the 21.63s of remaining time. 0.0961 = Validation score (accuracy) 6.36s = Training runtime 0.43s = Validation runtime Repeating k-fold bagging: 5/20 Fitting model: LightGBM_BAG_L1 ... Training model for up to 19.8s of the 19.8s of remaining time. 0.3129 = Validation score (accuracy) 3.77s = Training runtime 0.17s = Validation runtime Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 18.92s of the 18.92s of remaining time. 0.1063 = Validation score (accuracy) 7.96s = Training runtime 0.53s = Validation runtime Repeating k-fold bagging: 6/20 Fitting model: LightGBM_BAG_L1 ... Training model for up to 17.18s of the 17.18s of remaining time. 0.3108 = Validation score (accuracy) 4.5s = Training runtime 0.2s = Validation runtime Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 16.4s of the 16.4s of remaining time. 0.0941 = Validation score (accuracy) 9.65s = Training runtime 0.64s = Validation runtime Repeating k-fold bagging: 7/20 Fitting model: LightGBM_BAG_L1 ... Training model for up to 14.56s of the 14.56s of remaining time. 0.3088 = Validation score (accuracy) 5.24s = Training runtime 0.24s = Validation runtime Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 13.76s of the 13.76s of remaining time. 0.0961 = Validation score (accuracy) 11.25s = Training runtime 0.75s = Validation runtime Repeating k-fold bagging: 8/20 Fitting model: LightGBM_BAG_L1 ... Training model for up to 12.01s of the 12.01s of remaining time. 0.3067 = Validation score (accuracy) 5.97s = Training runtime 0.27s = Validation runtime Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 11.22s of the 11.22s of remaining time. 
        0.1391 = Validation score (accuracy)
        13.07s = Training runtime
        0.86s = Validation runtime
    Repeating k-fold bagging: 9/20
    Fitting model: LightGBM_BAG_L1 ... Training model for up to 9.25s of the 9.25s of remaining time.
        0.3047 = Validation score (accuracy)
        6.67s = Training runtime
        0.3s = Validation runtime
    Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 8.5s of the 8.5s of remaining time.
        0.1288 = Validation score (accuracy)
        14.76s = Training runtime
        0.96s = Validation runtime
    Repeating k-fold bagging: 10/20
    Fitting model: LightGBM_BAG_L1 ... Training model for up to 6.66s of the 6.66s of remaining time.
        0.3027 = Validation score (accuracy)
        7.38s = Training runtime
        0.34s = Validation runtime
    Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 5.9s of the 5.9s of remaining time.
        0.1329 = Validation score (accuracy)
        16.36s = Training runtime
        1.07s = Validation runtime
    Repeating k-fold bagging: 11/20
    Fitting model: LightGBM_BAG_L1 ... Training model for up to 4.15s of the 4.15s of remaining time.
        0.2965 = Validation score (accuracy)
        8.13s = Training runtime
        0.37s = Validation runtime
    Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 3.33s of the 3.33s of remaining time.
        0.1391 = Validation score (accuracy)
        18.05s = Training runtime
        1.18s = Validation runtime
    Completed 11/20 k-fold bagging repeats ...
    Fitting model: WeightedEnsemble_L2 ... Training model for up to 29.92s of the 1.48s of remaining time.
        0.3129 = Validation score (accuracy)
        0.12s = Training runtime
        0.0s = Validation runtime
    AutoGluon training complete, total runtime = 28.65s ...
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("agModels-predictOccupation/")

Stacking/bagging will often produce higher accuracy than hyperparameter-tuning, but you may try combining both techniques (note: specifying ``presets='best_quality'`` in ``fit()`` simply sets ``auto_stack=True``).
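The repeated k-fold bagging seen in the log above can be sketched in a few lines of plain Python. This is a toy illustration, not AutoGluon's implementation: ``kfold_indices``, ``fit_majority``, and ``bagged_oof_votes`` are hypothetical names, and a majority-class "model" stands in for the real learners. Each repeat splits the rows into folds, trains one child model per fold on the remaining rows, and records that child's prediction for the held-out rows only, so every row receives one out-of-fold prediction per repeat.

```python
import random
from collections import Counter


def kfold_indices(n_rows, num_folds, seed):
    """Shuffle the row indices and deal them into num_folds disjoint folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[f::num_folds] for f in range(num_folds)]


def fit_majority(train_labels):
    """Toy 'model' (assumption): always predicts the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def bagged_oof_votes(labels, num_bag_folds=5, num_bag_sets=2):
    """Collect one out-of-fold prediction per row for each bagging repeat."""
    votes = [[] for _ in labels]
    for bag_set in range(num_bag_sets):
        # A different seed per repeat yields a different partition of the rows.
        for fold in kfold_indices(len(labels), num_bag_folds, seed=bag_set):
            held_out = set(fold)
            train_labels = [lab for i, lab in enumerate(labels) if i not in held_out]
            prediction = fit_majority(train_labels)
            for i in fold:
                # The child model never saw row i during training.
                votes[i].append(prediction)
    return votes


toy_labels = ['a'] * 70 + ['b'] * 30
votes = bagged_oof_votes(toy_labels, num_bag_folds=5, num_bag_sets=2)
print(len(votes), len(votes[0]))
```

The sketch trains ``num_bag_folds * num_bag_sets`` child models in total, which is why raising either value increases training time and disk usage.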
Prediction options (inference)
------------------------------

Even if you've started a new Python session since last calling ``fit()``, you can still load a previously trained predictor from disk:

.. code:: python

    predictor = TabularPredictor.load(save_path)  # `predictor.path` is another way to get the relative path needed to later load predictor.

Above ``save_path`` is the same folder previously passed to ``TabularPredictor``, in which all the trained models have been saved. You can easily train models on one machine and deploy them on another. Simply copy the ``save_path`` folder to the new machine and specify its new path in ``TabularPredictor.load()``.

To find out the required feature columns to make predictions, call ``predictor.features()``:

.. code:: python

    predictor.features()

.. parsed-literal::
    :class: output

    ['age',
     'workclass',
     'fnlwgt',
     'education',
     'education-num',
     'marital-status',
     'relationship',
     'race',
     'sex',
     'capital-gain',
     'capital-loss',
     'hours-per-week',
     'native-country',
     'class']

We can make a prediction on an individual example rather than a full dataset:

.. code:: python

    datapoint = test_data_nolabel.iloc[[0]]  # Note: .iloc[0] won't work because it returns pandas Series instead of DataFrame
    print(datapoint)
    predictor.predict(datapoint)

.. parsed-literal::
    :class: output

          age workclass  fnlwgt     education  education-num marital-status  \
    5000   49   Private  259087  Some-college             10       Divorced

           relationship   race     sex  capital-gain  capital-loss  \
    5000  Not-in-family  White  Female             0             0

          hours-per-week native-country  class
    5000              40  United-States  <=50K

.. parsed-literal::
    :class: output

    5000    Exec-managerial
    Name: occupation, dtype: object

To output predicted class probabilities instead of predicted classes, you can use:

.. code:: python

    predictor.predict_proba(datapoint)  # returns a DataFrame that shows which probability corresponds to which class

.. raw:: html
? Adm-clerical Armed-Forces Craft-repair Exec-managerial Farming-fishing Handlers-cleaners Machine-op-inspct Other-service Priv-house-serv Prof-specialty Protective-serv Sales Tech-support Transport-moving
5000 0.072913 0.100069 0.0 0.093996 0.113463 0.068587 0.073104 0.075502 0.07881 0.0 0.08589 0.0 0.089495 0.067953 0.080218
By default, ``predict()`` and ``predict_proba()`` will utilize the model that AutoGluon thinks is most accurate, which is usually an ensemble of many individual models. Here's how to see which model this is:

.. code:: python

    predictor.get_model_best()

.. parsed-literal::
    :class: output

    'WeightedEnsemble_L2'

We can instead specify a particular model to use for predictions (e.g. to reduce inference latency). Note that a 'model' in AutoGluon may refer to, for example, a single Neural Network, a bagged ensemble of many Neural Network copies trained on different training/validation splits, a weighted ensemble that aggregates the predictions of many other models, or a stacker model that operates on predictions output by other models. This is akin to viewing a Random Forest as one 'model' when it is in fact an ensemble of many decision trees.

Before deciding which model to use, let's evaluate all of the models AutoGluon has previously trained on our test data:

.. code:: python

    predictor.leaderboard(test_data, silent=True)

.. raw:: html
model score_test score_val pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM_BAG_L1 0.284336 0.296524 1.036051 0.371686 8.132285 1.036051 0.371686 8.132285 1 True 1
1 WeightedEnsemble_L2 0.278255 0.312883 25.593016 1.549069 26.303268 0.002474 0.000444 0.118473 2 True 3
2 NeuralNetMXNet_BAG_L1 0.147201 0.139059 24.554490 1.176939 18.052510 24.554490 1.176939 18.052510 1 True 2
The leaderboard shows each model's predictive performance on the test data (``score_test``) and validation data (``score_val``), as well as the time required to: produce predictions for the test data (``pred_time_test``), produce predictions on the validation data (``pred_time_val``), and train this model, including any base models it depends on (``fit_time``). The ``*_marginal`` columns report the corresponding time for the model alone. Below, we show that a leaderboard can be produced without new data (it then uses only the data previously reserved for validation inside ``fit``) and can display extra information about each model:

.. code:: python

    predictor.leaderboard(extra_info=True, silent=True)

.. raw:: html
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order num_features ... child_model_type hyperparameters hyperparameters_fit ag_args_fit features child_hyperparameters child_hyperparameters_fit child_ag_args_fit ancestors descendants
0 WeightedEnsemble_L2 0.312883 1.549069 26.303268 0.000444 0.118473 2 True 3 24 ... GreedyWeightedEnsembleModel {'use_orig_features': False, 'max_base_models'... {} {'max_memory_usage_ratio': 1.0, 'max_time_limi... [NeuralNetMXNet_BAG_L1_11, LightGBM_BAG_L1_9, ... {'ensemble_size': 100} {'ensemble_size': 5} {'max_memory_usage_ratio': 1.0, 'max_time_limi... [NeuralNetMXNet_BAG_L1, LightGBM_BAG_L1] []
1 LightGBM_BAG_L1 0.296524 0.371686 8.132285 0.371686 8.132285 1 True 1 14 ... LGBModel {'use_orig_features': True, 'max_base_models':... {} {'max_memory_usage_ratio': 1.0, 'max_time_limi... [hours-per-week, fnlwgt, education, capital-lo... {'num_boost_round': 20, 'num_threads': -1, 'le... {'num_boost_round': 12} {'max_memory_usage_ratio': 1.0, 'max_time_limi... [] [WeightedEnsemble_L2]
2 NeuralNetMXNet_BAG_L1 0.139059 1.176939 18.052510 1.176939 18.052510 1 True 2 14 ... TabularNeuralNetModel {'use_orig_features': True, 'max_base_models':... {} {'max_memory_usage_ratio': 1.0, 'max_time_limi... [hours-per-week, fnlwgt, education, capital-lo... {'num_epochs': 2, 'epochs_wo_improve': 20, 'se... {'num_epochs': 2} {'max_memory_usage_ratio': 1.0, 'max_time_limi... [] [WeightedEnsemble_L2]

3 rows × 29 columns

The expanded leaderboard shows properties like how many features are used by each model (``num_features``), which other models are ancestors whose predictions are required inputs for each model (``ancestors``), and how much memory each model and all its ancestors would occupy if simultaneously persisted (``memory_size_w_ancestors``). See the `leaderboard documentation <../../api/autogluon.predictor.html#autogluon.tabular.TabularPredictor.leaderboard>`__ for full details.

To show scores for other metrics, you can specify the ``extra_metrics`` argument when passing in ``test_data``:

.. code:: python

    predictor.leaderboard(test_data, extra_metrics=['accuracy', 'balanced_accuracy', 'log_loss'], silent=True)

.. raw:: html
model score_test accuracy balanced_accuracy log_loss score_val pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM_BAG_L1 0.284336 0.284336 0.178100 -11.748779 0.296524 1.040092 0.371686 8.132285 1.040092 0.371686 8.132285 1 True 1
1 WeightedEnsemble_L2 0.278255 0.278255 0.172363 -11.740418 0.312883 23.332240 1.549069 26.303268 0.002620 0.000444 0.118473 2 True 3
2 NeuralNetMXNet_BAG_L1 0.147201 0.147201 0.077969 -11.765823 0.139059 22.289528 1.176939 18.052510 22.289528 1.176939 18.052510 1 True 2
Notice that ``log_loss`` scores are negative. This is because metrics in AutoGluon are always shown in ``higher_is_better`` form. This means that metrics such as ``log_loss`` and ``root_mean_squared_error`` will have their signs FLIPPED, and values will be negative. This way, you do not need to know a metric's convention to tell whether higher is better when looking at the leaderboard.

One additional caveat: It is possible that ``log_loss`` values can be ``-inf`` when computed via ``extra_metrics``. This is because the models were not optimized with ``log_loss`` in mind during training and may assign a probability of ``0`` to a class (particularly common with K-Nearest-Neighbors models). Because ``log_loss`` gives infinite error when the correct class was given ``0`` probability, this results in a score of ``-inf``. It is therefore recommended that ``log_loss`` should not be used as a secondary metric to determine model quality. Either use ``log_loss`` as the ``eval_metric`` or avoid it altogether.

Here's how to specify a particular model to use for prediction instead of AutoGluon's default model-choice:

.. code:: python

    i = 0  # index of model to use
    model_to_use = predictor.get_model_names()[i]
    model_pred = predictor.predict(datapoint, model=model_to_use)
    print("Prediction from %s model: %s" % (model_to_use, model_pred.iloc[0]))

.. parsed-literal::
    :class: output

    Prediction from LightGBM_BAG_L1 model:  Exec-managerial

We can easily access various information about the trained predictor or a particular model:

.. code:: python

    all_models = predictor.get_model_names()
    model_to_use = all_models[i]
    specific_model = predictor._trainer.load_model(model_to_use)

    # Objects defined below are dicts of various information (not printed here as they are quite large):
    model_info = specific_model.get_info()
    predictor_information = predictor.info()

The ``predictor`` also remembers what metric predictions should be evaluated with, which can be done with ground truth labels as follows:

.. code:: python

    y_pred_proba = predictor.predict_proba(test_data_nolabel)
    perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred_proba)

.. parsed-literal::
    :class: output

    Evaluation: accuracy on test data: 0.2782553994548123
    Evaluations on test data:
    {
        "accuracy": 0.2782553994548123,
        "balanced_accuracy": 0.17236314122811364,
        "mcc": 0.18554959350068545
    }

Since the label column remains in the ``test_data`` DataFrame, we can instead use the shorthand:

.. code:: python

    perf = predictor.evaluate(test_data)

.. parsed-literal::
    :class: output

    Evaluation: accuracy on test data: 0.2782553994548123
    Evaluations on test data:
    {
        "accuracy": 0.2782553994548123,
        "balanced_accuracy": 0.17236314122811364,
        "mcc": 0.18554959350068545
    }

Interpretability (feature importance)
-------------------------------------

To better understand our trained predictor, we can estimate the overall importance of each feature:

.. code:: python

    predictor.feature_importance(test_data)

.. parsed-literal::
    :class: output

    Computing feature importance via permutation shuffling for 14 features using 1000 rows with 3 shuffle sets...
        280.47s = Expected runtime (93.49s per shuffle set)
        215.07s = Actual runtime (Completed 3 of 3 shuffle sets)

.. raw:: html
importance stddev p_value n p99_high p99_low
education-num 0.066667 0.010017 0.003721 3 0.124063 0.009270
workclass 0.040333 0.004619 0.002171 3 0.066800 0.013867
sex 0.032000 0.014000 0.029140 3 0.112222 -0.048222
hours-per-week 0.023333 0.007572 0.016678 3 0.066721 -0.020054
age 0.014000 0.017578 0.150872 3 0.114726 -0.086726
class 0.002667 0.008083 0.312683 3 0.048983 -0.043649
fnlwgt 0.001333 0.013577 0.440292 3 0.079131 -0.076464
marital-status 0.000000 0.001732 0.500000 3 0.009925 -0.009925
race 0.000000 0.000000 0.500000 3 0.000000 0.000000
native-country 0.000000 0.000000 0.500000 3 0.000000 0.000000
education -0.000333 0.002517 0.580064 3 0.014087 -0.014754
capital-loss -0.000333 0.000577 0.788675 3 0.002975 -0.003642
capital-gain -0.001000 0.001000 0.887298 3 0.004730 -0.006730
relationship -0.002000 0.004000 0.761116 3 0.020920 -0.024920
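The ``p_value``, ``p99_high`` and ``p99_low`` columns can be reproduced from ``importance``, ``stddev`` and ``n``. The sketch below is inferred from the numbers in the table rather than taken from AutoGluon's source: it assumes a one-sided t-test of "importance = 0" against "importance > 0" with ``n - 1`` degrees of freedom, and a two-sided 99% confidence interval. The helper names are hypothetical, and the closed-form t-distribution formulas used here only hold for ``n = 3`` (2 degrees of freedom).

```python
import math


def t_cdf_df2(t):
    """CDF of Student's t distribution with 2 degrees of freedom (closed form)."""
    return 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))


def t_quantile_df2(p):
    """Inverse of t_cdf_df2, obtained by solving the closed form for t."""
    q = 2.0 * p - 1.0
    return q * math.sqrt(2.0 / (1.0 - q * q))


def summarize_importance(importance, stddev, n=3):
    """Return (p_value, p99_high, p99_low) for one feature.

    Assumption: n shuffle sets give n - 1 degrees of freedom; this sketch
    only supports n == 3 and a nonzero stddev.
    """
    if n != 3:
        raise ValueError("this sketch only covers n == 3 (2 degrees of freedom)")
    if stddev == 0:
        raise ValueError("stddev of 0 makes the t-statistic undefined")
    std_err = stddev / math.sqrt(n)
    t_stat = importance / std_err
    p_value = 1.0 - t_cdf_df2(t_stat)            # one-sided: importance > 0
    half_width = t_quantile_df2(0.995) * std_err  # two-sided 99% interval
    return p_value, importance + half_width, importance - half_width


# Values copied from the 'education-num' and 'sex' rows of the table.
edu_p, edu_high, edu_low = summarize_importance(0.066667, 0.010017)
sex_p, sex_high, sex_low = summarize_importance(0.032000, 0.014000)
print(f"education-num: p={edu_p:.6f}, p99=[{edu_low:.6f}, {edu_high:.6f}]")
print(f"sex:           p={sex_p:.6f}, p99=[{sex_low:.6f}, {sex_high:.6f}]")
```

With only 3 shuffle sets the intervals are wide, so an interval that includes 0 (as for ``sex``) does not by itself show that a feature is useless.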
Computed via `permutation-shuffling `__, these feature importance scores quantify the drop in predictive performance (of the already trained predictor) when one column's values are randomly shuffled across rows. The top features in this list contribute most to AutoGluon's accuracy (for predicting an individual's occupation). Features with non-positive importance scores hardly contribute to the predictor's accuracy, or may even be actively harmful to include in the data (consider removing these features from your data and calling ``fit`` again). These scores facilitate interpretability of the predictor's global behavior (which features it relies on for *all* predictions) rather than `local explanations `__ that only rationalize one particular prediction. Accelerating inference ---------------------- We describe multiple ways to reduce the time it takes for AutoGluon to produce predictions. Keeping models in memory ~~~~~~~~~~~~~~~~~~~~~~~~ By default, AutoGluon loads models into memory one at a time and only when they are needed for prediction. This strategy is robust for large stacked/bagged ensembles, but leads to slower prediction times. If you plan to repeatedly make predictions (e.g. on new datapoints one at a time rather than one large test dataset), you can first specify that all models required for inference should be loaded into memory as follows: .. code:: python predictor.persist_models() num_test = 20 preds = np.array(['']*num_test, dtype='object') for i in range(num_test): datapoint = test_data_nolabel.iloc[[i]] pred_numpy = predictor.predict(datapoint, as_pandas=False) preds[i] = pred_numpy[0] perf = predictor.evaluate_predictions(y_test[:num_test], preds, auxiliary_metrics=True) print("Predictions: ", preds) predictor.unpersist_models() # free memory by clearing models, future predict() calls will load models from disk .. parsed-literal:: :class: output Persisting 3 models in memory. Models will require 0.27% of memory. 
Evaluation: accuracy on test data: 0.25 Evaluations on test data: { "accuracy": 0.25, "balanced_accuracy": 0.3208333333333336, "mcc": 0.12935842095105549 } Unpersisted 3 models: ['NeuralNetMXNet_BAG_L1', 'WeightedEnsemble_L2', 'LightGBM_BAG_L1'] .. parsed-literal:: :class: output Predictions: [' Exec-managerial' ' Exec-managerial' ' Craft-repair' ' Adm-clerical' ' Sales' ' Exec-managerial' ' Exec-managerial' ' Sales' ' Exec-managerial' ' Adm-clerical' ' Other-service' ' Exec-managerial' ' Exec-managerial' ' Exec-managerial' ' Adm-clerical' ' ?' ' Craft-repair' ' Craft-repair' ' Exec-managerial' ' Craft-repair'] .. parsed-literal:: :class: output ['NeuralNetMXNet_BAG_L1', 'WeightedEnsemble_L2', 'LightGBM_BAG_L1'] You can alternatively specify a particular model to persist via the ``models`` argument of ``persist_models()``, or simply set ``models='all'`` to simultaneously load every single model that was trained during ``fit``. Using smaller ensemble or faster model for prediction ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Without having to retrain any models, one can construct alternative ensembles that aggregate individual models' predictions with different weighting schemes. These ensembles become smaller (and hence faster for prediction) if they assign nonzero weight to fewer models. You can produce a wide variety of ensembles with different accuracy-speed tradeoffs like this: .. code:: python additional_ensembles = predictor.fit_weighted_ensemble(expand_pareto_frontier=True) print("Alternative ensembles you can use for prediction:", additional_ensembles) predictor.leaderboard(only_pareto_frontier=True, silent=True) .. parsed-literal:: :class: output Fitting model: WeightedEnsemble_L2Best ... 0.3129 = Validation score (accuracy) 0.12s = Training runtime 0.0s = Validation runtime .. parsed-literal:: :class: output Alternative ensembles you can use for prediction: ['WeightedEnsemble_L2Best'] .. raw:: html
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L2 0.312883 1.549069 26.303268 0.000444 0.118473 2 True 3
1 LightGBM_BAG_L1 0.296524 0.371686 8.132285 0.371686 8.132285 1 True 1
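To illustrate what a Pareto frontier over accuracy and latency means, here is a minimal sketch in plain Python. It is not AutoGluon's implementation; ``pareto_frontier`` is a hypothetical helper, and the three rows reuse ``score_val`` and ``pred_time_val`` values reported in this tutorial's leaderboards. A model is dropped when some other model is at least as accurate and at least as fast, and strictly better on one of the two.

```python
def pareto_frontier(models):
    """Return names of models that no other model dominates.

    Each entry is (name, score, latency_seconds); higher score and lower
    latency are better.
    """
    frontier = []
    for name, score, latency in models:
        dominated = any(
            other_score >= score
            and other_latency <= latency
            and (other_score > score or other_latency < latency)
            for _, other_score, other_latency in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# (model, score_val, pred_time_val) taken from the leaderboards in this tutorial.
models = [
    ("WeightedEnsemble_L2", 0.312883, 1.549069),
    ("LightGBM_BAG_L1", 0.296524, 0.371686),
    ("NeuralNetMXNet_BAG_L1", 0.139059, 1.176939),
]
frontier = pareto_frontier(models)
print(frontier)
```

Here ``NeuralNetMXNet_BAG_L1`` is excluded because ``LightGBM_BAG_L1`` is both more accurate and faster, which matches the two rows shown in the leaderboard above.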
The resulting leaderboard will contain the most accurate model for a given inference-latency. You can select whichever model exhibits acceptable latency from the leaderboard and use it for prediction. .. code:: python model_for_prediction = additional_ensembles[0] predictions = predictor.predict(test_data, model=model_for_prediction) predictor.delete_models(models_to_delete=additional_ensembles, dry_run=False) # delete these extra models so they don't affect rest of tutorial .. parsed-literal:: :class: output Deleting model WeightedEnsemble_L2Best. All files under agModels-predictOccupation/models/WeightedEnsemble_L2Best/ will be removed. Collapsing bagged ensembles via refit\_full ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For an ensemble predictor trained with bagging (as done above), recall there are ~10 bagged copies of each individual model trained on different train/validation folds. We can collapse this bag of ~10 models into a single model that's fit to the full dataset, which can greatly reduce its memory/latency requirements (but may also reduce accuracy). Below we refit such a model for each original model, but you can alternatively do this for just a particular model by specifying the ``model`` argument of ``refit_full()``. .. code:: python refit_model_map = predictor.refit_full() print("Name of each refit-full model corresponding to a previous bagged ensemble:") print(refit_model_map) predictor.leaderboard(test_data, silent=True) .. parsed-literal:: :class: output Fitting 1 L1 models ... Fitting model: LightGBM_BAG_L1_FULL ... 0.13s = Training runtime Fitting 1 L1 models ... Fitting model: NeuralNetMXNet_BAG_L1_FULL ... 0.3s = Training runtime Fitting model: WeightedEnsemble_L2_FULL ... 0.3129 = Validation score (accuracy) 0.0s = Training runtime 0.0s = Validation runtime .. 
parsed-literal:: :class: output Name of each refit-full model corresponding to a previous bagged ensemble: {'LightGBM_BAG_L1': 'LightGBM_BAG_L1_FULL', 'NeuralNetMXNet_BAG_L1': 'NeuralNetMXNet_BAG_L1_FULL', 'WeightedEnsemble_L2': 'WeightedEnsemble_L2_FULL'} .. raw:: html
model score_test score_val pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM_BAG_L1 0.284336 0.296524 1.044312 0.371686 8.132285 1.044312 0.371686 8.132285 1 True 1
1 WeightedEnsemble_L2 0.278255 0.312883 23.152946 1.549069 26.303268 0.002642 0.000444 0.118473 2 True 3
2 WeightedEnsemble_L2_FULL 0.272594 NaN 0.448019 NaN 0.424643 0.001966 0.000548 0.000314 2 True 6
3 LightGBM_BAG_L1_FULL 0.269868 NaN 0.020608 NaN 0.128227 0.020608 NaN 0.128227 1 True 4
4 NeuralNetMXNet_BAG_L1 0.147201 0.139059 22.105991 1.176939 18.052510 22.105991 1.176939 18.052510 1 True 2
5 NeuralNetMXNet_BAG_L1_FULL 0.052632 NaN 0.425445 NaN 0.296102 0.425445 NaN 0.296102 1 True 5
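One way to read this leaderboard is to pair each bagged model with its refit-full counterpart and compare latency and test accuracy. The sketch below does this in plain Python with values copied from the ``pred_time_test`` and ``score_test`` columns above; in practice you would read them from the DataFrame returned by ``leaderboard()``. The dictionary names other than ``refit_model_map`` are illustrative.

```python
# Mapping printed by refit_full() above.
refit_model_map = {
    "LightGBM_BAG_L1": "LightGBM_BAG_L1_FULL",
    "NeuralNetMXNet_BAG_L1": "NeuralNetMXNet_BAG_L1_FULL",
    "WeightedEnsemble_L2": "WeightedEnsemble_L2_FULL",
}

# pred_time_test (seconds) and score_test copied from the leaderboard above.
pred_time_test = {
    "LightGBM_BAG_L1": 1.044312, "LightGBM_BAG_L1_FULL": 0.020608,
    "NeuralNetMXNet_BAG_L1": 22.105991, "NeuralNetMXNet_BAG_L1_FULL": 0.425445,
    "WeightedEnsemble_L2": 23.152946, "WeightedEnsemble_L2_FULL": 0.448019,
}
score_test = {
    "LightGBM_BAG_L1": 0.284336, "LightGBM_BAG_L1_FULL": 0.269868,
    "NeuralNetMXNet_BAG_L1": 0.147201, "NeuralNetMXNet_BAG_L1_FULL": 0.052632,
    "WeightedEnsemble_L2": 0.278255, "WeightedEnsemble_L2_FULL": 0.272594,
}

speedup = {}
accuracy_drop = {}
for bagged, refit in refit_model_map.items():
    speedup[bagged] = pred_time_test[bagged] / pred_time_test[refit]
    accuracy_drop[bagged] = score_test[bagged] - score_test[refit]
    print(f"{bagged}: {speedup[bagged]:.1f}x faster, "
          f"test accuracy lower by {accuracy_drop[bagged]:.4f}")
```

In this run every refit-full model predicts roughly 50 times faster than its bagged counterpart, at some cost in test accuracy; both numbers depend on the data and the time limit, so they should not be taken as typical.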
This adds the refit-full models to the leaderboard and we can opt to use any of them for prediction just like any other model. Note ``pred_time_test`` and ``pred_time_val`` list the time taken to produce predictions with each model (in seconds) on the test/validation data. Since the refit-full models were trained using all of the data, there is no internal validation score (``score_val``) available for them. You can also call ``refit_full()`` with non-bagged models to refit the same models to your full dataset (there won't be memory/latency gains in this case but test accuracy may improve). Model distillation ~~~~~~~~~~~~~~~~~~ While computationally-favorable, single individual models will usually have lower accuracy than weighted/stacked/bagged ensembles. `Model Distillation `__ offers one way to retain the computational benefits of a single model, while enjoying some of the accuracy-boost that comes with ensembling. The idea is to train the individual model (which we can call the student) to mimic the predictions of the full stack ensemble (the teacher). Like ``refit_full()``, the ``distill()`` function will produce additional models we can opt to use for prediction. .. code:: python student_models = predictor.distill(time_limit=30) # specify much longer time limit in real applications print(student_models) preds_student = predictor.predict(test_data_nolabel, model=student_models[0]) print(f"predictions from {student_models[0]}:", list(preds_student)[:5]) predictor.leaderboard(test_data) .. parsed-literal:: :class: output Distilling with teacher='WeightedEnsemble_L2', teacher_preds=soft, augment_method=spunge ... SPUNGE: Augmenting training data with 1955 synthetic samples for distillation... Distilling with each of these student models: ['LightGBM_DSTL', 'NeuralNetMXNet_DSTL', 'RandomForestMSE_DSTL', 'CatBoost_DSTL'] Fitting 4 L1 models ... Fitting model: LightGBM_DSTL ... Training model for up to 30.0s of the 30.0s of remaining time. 
Note: model has different eval_metric than default. -2.1788 = Validation score (soft_log_loss) 3.6s = Training runtime 0.01s = Validation runtime Fitting model: NeuralNetMXNet_DSTL ... Training model for up to 26.3s of the 26.29s of remaining time. Note: model has different eval_metric than default. -2.2322 = Validation score (soft_log_loss) 6.96s = Training runtime 0.02s = Validation runtime Fitting model: RandomForestMSE_DSTL ... Training model for up to 19.31s of the 19.3s of remaining time. Note: model has different eval_metric than default. -2.2067 = Validation score (soft_log_loss) 1.01s = Training runtime 0.11s = Validation runtime Fitting model: CatBoost_DSTL ... Training model for up to 18.04s of the 18.04s of remaining time. Warning: Exception caused CatBoost_DSTL to fail during training (ImportError)... Skipping this model. `import catboost_dev` failed (needed for distillation with CatBoost models). Make sure you can import catboost and then run: 'pip install catboost-dev'.Detailed info: No module named 'catboost_dev' Repeating k-fold bagging: 2/20 Repeating k-fold bagging: 3/20 Repeating k-fold bagging: 4/20 Repeating k-fold bagging: 5/20 Repeating k-fold bagging: 6/20 Repeating k-fold bagging: 7/20 Repeating k-fold bagging: 8/20 Repeating k-fold bagging: 9/20 Repeating k-fold bagging: 10/20 Repeating k-fold bagging: 11/20 Repeating k-fold bagging: 12/20 Repeating k-fold bagging: 13/20 Repeating k-fold bagging: 14/20 Repeating k-fold bagging: 15/20 Repeating k-fold bagging: 16/20 Repeating k-fold bagging: 17/20 Repeating k-fold bagging: 18/20 Repeating k-fold bagging: 19/20 Repeating k-fold bagging: 20/20 Completed 20/20 k-fold bagging repeats ... Distilling with each of these student models: ['WeightedEnsemble_L2_DSTL'] Fitting model: WeightedEnsemble_L2_DSTL ... Training model for up to 30.0s of the 17.69s of remaining time. Note: model has different eval_metric than default. 
-2.1766 = Validation score (soft_log_loss) 0.31s = Training runtime 0.0s = Validation runtime Distilled model leaderboard: model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order 0 NeuralNetMXNet_DSTL 0.326531 0.021733 6.958935 0.021733 6.958935 1 True 8 1 WeightedEnsemble_L2_DSTL 0.326531 0.122857 4.929064 0.001197 0.314034 2 True 10 2 LightGBM_DSTL 0.316327 0.013802 3.601163 0.013802 3.601163 1 True 7 3 RandomForestMSE_DSTL 0.295918 0.107858 1.013867 0.107858 1.013867 1 True 9 .. parsed-literal:: :class: output ['LightGBM_DSTL', 'NeuralNetMXNet_DSTL', 'RandomForestMSE_DSTL', 'WeightedEnsemble_L2_DSTL'] predictions from LightGBM_DSTL: [' Exec-managerial', ' Exec-managerial', ' Craft-repair', ' Other-service', ' Transport-moving'] model score_test score_val pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order 0 WeightedEnsemble_L2_DSTL 0.290208 0.326531 0.508158 0.122857 4.929064 0.002178 0.001197 0.314034 2 True 10 1 RandomForestMSE_DSTL 0.285175 0.295918 0.198313 0.107858 1.013867 0.198313 0.107858 1.013867 1 True 9 2 LightGBM_BAG_L1 0.284336 0.296524 1.047932 0.371686 8.132285 1.047932 0.371686 8.132285 1 True 1 3 NeuralNetMXNet_DSTL 0.283288 0.326531 0.352077 0.021733 6.958935 0.352077 0.021733 6.958935 1 True 8 4 LightGBM_DSTL 0.282659 0.316327 0.307667 0.013802 3.601163 0.307667 0.013802 3.601163 1 True 7 5 WeightedEnsemble_L2 0.278255 0.312883 23.227982 1.549069 26.303268 0.005413 0.000444 0.118473 2 True 3 6 WeightedEnsemble_L2_FULL 0.272594 NaN 0.452775 NaN 0.424643 0.001783 0.000548 0.000314 2 True 6 7 LightGBM_BAG_L1_FULL 0.269868 NaN 0.020751 NaN 0.128227 0.020751 NaN 0.128227 1 True 4 8 NeuralNetMXNet_BAG_L1 0.147201 0.139059 22.174637 1.176939 18.052510 22.174637 1.176939 18.052510 1 True 2 9 NeuralNetMXNet_BAG_L1_FULL 0.052632 NaN 0.430241 NaN 0.296102 0.430241 NaN 0.296102 1 True 5 .. raw:: html
model score_test score_val pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L2_DSTL 0.290208 0.326531 0.508158 0.122857 4.929064 0.002178 0.001197 0.314034 2 True 10
1 RandomForestMSE_DSTL 0.285175 0.295918 0.198313 0.107858 1.013867 0.198313 0.107858 1.013867 1 True 9
2 LightGBM_BAG_L1 0.284336 0.296524 1.047932 0.371686 8.132285 1.047932 0.371686 8.132285 1 True 1
3 NeuralNetMXNet_DSTL 0.283288 0.326531 0.352077 0.021733 6.958935 0.352077 0.021733 6.958935 1 True 8
4 LightGBM_DSTL 0.282659 0.316327 0.307667 0.013802 3.601163 0.307667 0.013802 3.601163 1 True 7
5 WeightedEnsemble_L2 0.278255 0.312883 23.227982 1.549069 26.303268 0.005413 0.000444 0.118473 2 True 3
6 WeightedEnsemble_L2_FULL 0.272594 NaN 0.452775 NaN 0.424643 0.001783 0.000548 0.000314 2 True 6
7 LightGBM_BAG_L1_FULL 0.269868 NaN 0.020751 NaN 0.128227 0.020751 NaN 0.128227 1 True 4
8 NeuralNetMXNet_BAG_L1 0.147201 0.139059 22.174637 1.176939 18.052510 22.174637 1.176939 18.052510 1 True 2
9 NeuralNetMXNet_BAG_L1_FULL 0.052632 NaN 0.430241 NaN 0.296102 0.430241 NaN 0.296102 1 True 5
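The distillation log above reports ``teacher_preds=soft`` and validation scores in ``soft_log_loss``. As a rough illustration of the idea (not AutoGluon's implementation, whose exact definition of ``soft_log_loss`` is assumed here), the student can be scored by the cross-entropy between the teacher's predicted class probabilities and its own, with the sign flipped so that higher is better. All names and probabilities below are made up for the example.

```python
import math


def soft_log_loss_score(teacher_probs, student_probs, eps=1e-15):
    """Negated mean cross-entropy between teacher and student probabilities.

    Each argument is a list of rows, one probability per class. The result
    is in higher-is-better form, so it is always <= 0.
    """
    total = 0.0
    for teacher_row, student_row in zip(teacher_probs, student_probs):
        # Clip student probabilities away from 0 to avoid log(0).
        total += -sum(
            t * math.log(max(s, eps)) for t, s in zip(teacher_row, student_row)
        )
    return -total / len(teacher_probs)


# Hypothetical teacher outputs for two rows and three classes.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
student_close = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]   # roughly mimics the teacher
student_uniform = [[1 / 3, 1 / 3, 1 / 3]] * 2        # ignores the teacher

score_close = soft_log_loss_score(teacher, student_close)
score_uniform = soft_log_loss_score(teacher, student_uniform)
print(f"close student:   {score_close:.4f}")
print(f"uniform student: {score_uniform:.4f}")
```

A student that tracks the teacher's probabilities scores closer to 0 than one that predicts a uniform distribution, which is why the negative values in the log are compared with "higher is better".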
Faster presets or hyperparameters ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Instead of trying to speed up a cumbersome trained model at prediction time, if you know inference latency or memory will be an issue at the outset, then you can adjust the training process accordingly to ensure ``fit()`` does not produce unwieldy models. One option is to specify more lightweight ``presets``: .. code:: python presets = ['good_quality_faster_inference_only_refit', 'optimize_for_deployment'] predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, presets=presets, time_limit=30) .. parsed-literal:: :class: output No path specified. Models will be saved in: "AutogluonModels/ag-20210827_221902/" Presets specified: ['good_quality_faster_inference_only_refit', 'optimize_for_deployment'] Beginning AutoGluon training ... Time limit = 30s AutoGluon will save models to "AutogluonModels/ag-20210827_221902/" AutoGluon Version: 0.3.0b20210827 Train Data Rows: 500 Train Data Columns: 14 Preprocessing data ... AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object). First 10 (of 15) unique label values: [' Exec-managerial', ' Other-service', ' Craft-repair', ' Sales', ' Prof-specialty', ' Protective-serv', ' ?', ' Adm-clerical', ' Machine-op-inspct', ' Tech-support'] If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 12 out of 15 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. 
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.978 Train Data Class Count: 12 Using Feature Generators to preprocess the data ... Fitting AutoMLPipelineFeatureGenerator... Available Memory: 21737.81 MB Train Data (Original) Memory Usage: 0.28 MB (0.0% of available memory) Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. Stage 1 Generators: Fitting AsTypeFeatureGenerator... Stage 2 Generators: Fitting FillNaFeatureGenerator... Stage 3 Generators: Fitting IdentityFeatureGenerator... Fitting CategoryFeatureGenerator... Fitting CategoryMemoryMinimizeFeatureGenerator... Stage 4 Generators: Fitting DropUniqueFeatureGenerator... Types of features in original data (raw dtype, special dtypes): ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] Types of features in processed data (raw dtype, special dtypes): ('category', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] 0.1s = Fit runtime 14 features in original data used to generate 14 features in processed data. Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory) Data preprocessing and feature engineering runtime = 0.08s ... AutoGluon will gauge predictive performance using evaluation metric: 'accuracy' To change this, specify the eval_metric argument of fit() Fitting 11 L1 models ... Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 29.92s of the 29.92s of remaining time. Ran out of time, stopping training early. 
No improvement since epoch 6: early stopping No improvement since epoch 7: early stopping 0.2843 = Validation score (accuracy) 7.52s = Training runtime 0.08s = Validation runtime Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 22.31s of the 22.31s of remaining time. 0.3558 = Validation score (accuracy) 3.14s = Training runtime 0.05s = Validation runtime Fitting model: LightGBM_BAG_L1 ... Training model for up to 19.1s of the 19.1s of remaining time. 0.3395 = Validation score (accuracy) 3.48s = Training runtime 0.04s = Validation runtime Fitting model: RandomForestGini_BAG_L1 ... Training model for up to 15.57s of the 15.57s of remaining time. 0.3047 = Validation score (accuracy) 0.71s = Training runtime 0.11s = Validation runtime Fitting model: RandomForestEntr_BAG_L1 ... Training model for up to 14.75s of the 14.75s of remaining time. 0.2945 = Validation score (accuracy) 0.71s = Training runtime 0.11s = Validation runtime Fitting model: CatBoost_BAG_L1 ... Training model for up to 13.92s of the 13.92s of remaining time. Time limit exceeded... Skipping CatBoost_BAG_L1. Fitting model: ExtraTreesGini_BAG_L1 ... Training model for up to 6.31s of the 6.31s of remaining time. Warning: Reducing model 'n_estimators' from 300 -> 243 due to low time. Expected time usage reduced from 7.8s -> 6.3s... 0.3088 = Validation score (accuracy) 0.59s = Training runtime 0.09s = Validation runtime Fitting model: ExtraTreesEntr_BAG_L1 ... Training model for up to 5.62s of the 5.62s of remaining time. Warning: Reducing model 'n_estimators' from 300 -> 217 due to low time. Expected time usage reduced from 7.8s -> 5.6s... 0.317 = Validation score (accuracy) 0.48s = Training runtime 0.08s = Validation runtime Fitting model: XGBoost_BAG_L1 ... Training model for up to 5.05s of the 5.05s of remaining time. 0.3497 = Validation score (accuracy) 3.98s = Training runtime 0.03s = Validation runtime Fitting model: NeuralNetMXNet_BAG_L1 ... 
Training model for up to 1.02s of the 1.02s of remaining time. Time limit exceeded... Skipping NeuralNetMXNet_BAG_L1. Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 0.62s of the 0.62s of remaining time. Ran out of time, early stopping on iteration 1. Best iteration is: [1] train_set's multi_error: 0.7289 valid_set's multi_error: 0.816327 Time limit exceeded... Skipping LightGBMLarge_BAG_L1. Completed 1/20 k-fold bagging repeats ... Fitting model: WeightedEnsemble_L2 ... Training model for up to 29.92s of the 0.45s of remaining time. 0.3558 = Validation score (accuracy) 0.28s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 29.84s ... Fitting 1 L1 models ... Fitting model: LightGBMXT_BAG_L1_FULL ... 0.39s = Training runtime Deleting model NeuralNetFastAI_BAG_L1. All files under AutogluonModels/ag-20210827_221902/models/NeuralNetFastAI_BAG_L1/ will be removed. Deleting model LightGBMXT_BAG_L1. All files under AutogluonModels/ag-20210827_221902/models/LightGBMXT_BAG_L1/ will be removed. Deleting model LightGBM_BAG_L1. All files under AutogluonModels/ag-20210827_221902/models/LightGBM_BAG_L1/ will be removed. Deleting model RandomForestGini_BAG_L1. All files under AutogluonModels/ag-20210827_221902/models/RandomForestGini_BAG_L1/ will be removed. Deleting model RandomForestEntr_BAG_L1. All files under AutogluonModels/ag-20210827_221902/models/RandomForestEntr_BAG_L1/ will be removed. Deleting model ExtraTreesGini_BAG_L1. All files under AutogluonModels/ag-20210827_221902/models/ExtraTreesGini_BAG_L1/ will be removed. Deleting model ExtraTreesEntr_BAG_L1. All files under AutogluonModels/ag-20210827_221902/models/ExtraTreesEntr_BAG_L1/ will be removed. Deleting model XGBoost_BAG_L1. All files under AutogluonModels/ag-20210827_221902/models/XGBoost_BAG_L1/ will be removed. Deleting model WeightedEnsemble_L2. All files under AutogluonModels/ag-20210827_221902/models/WeightedEnsemble_L2/ will be removed. 
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20210827_221902/") Another option is to specify more lightweight hyperparameters: .. code:: python predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, hyperparameters='very_light', time_limit=30) .. parsed-literal:: :class: output No path specified. Models will be saved in: "AutogluonModels/ag-20210827_221933/" Beginning AutoGluon training ... Time limit = 30s AutoGluon will save models to "AutogluonModels/ag-20210827_221933/" AutoGluon Version: 0.3.0b20210827 Train Data Rows: 500 Train Data Columns: 14 Preprocessing data ... AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object). First 10 (of 15) unique label values: [' Exec-managerial', ' Other-service', ' Craft-repair', ' Sales', ' Prof-specialty', ' Protective-serv', ' ?', ' Adm-clerical', ' Machine-op-inspct', ' Tech-support'] If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 12 out of 15 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.978 Train Data Class Count: 12 Using Feature Generators to preprocess the data ... Fitting AutoMLPipelineFeatureGenerator... Available Memory: 18467.25 MB Train Data (Original) Memory Usage: 0.28 MB (0.0% of available memory) Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. Stage 1 Generators: Fitting AsTypeFeatureGenerator... 
Stage 2 Generators: Fitting FillNaFeatureGenerator... Stage 3 Generators: Fitting IdentityFeatureGenerator... Fitting CategoryFeatureGenerator... Fitting CategoryMemoryMinimizeFeatureGenerator... Stage 4 Generators: Fitting DropUniqueFeatureGenerator... Types of features in original data (raw dtype, special dtypes): ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] Types of features in processed data (raw dtype, special dtypes): ('category', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] 0.1s = Fit runtime 14 features in original data used to generate 14 features in processed data. Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory) Data preprocessing and feature engineering runtime = 0.08s ... AutoGluon will gauge predictive performance using evaluation metric: 'accuracy' To change this, specify the eval_metric argument of fit() Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 391, Val Rows: 98 Fitting 6 L1 models ... Fitting model: NeuralNetFastAI ... Training model for up to 29.92s of the 29.92s of remaining time. No improvement since epoch 8: early stopping 0.2551 = Validation score (accuracy) 0.55s = Training runtime 0.02s = Validation runtime Fitting model: LightGBM ... Training model for up to 29.34s of the 29.34s of remaining time. 0.3673 = Validation score (accuracy) 0.74s = Training runtime 0.01s = Validation runtime Fitting model: LightGBMXT ... Training model for up to 28.58s of the 28.58s of remaining time. 0.3673 = Validation score (accuracy) 0.52s = Training runtime 0.01s = Validation runtime Fitting model: CatBoost ... Training model for up to 28.04s of the 28.04s of remaining time. 
0.3571 = Validation score (accuracy) 8.3s = Training runtime 0.01s = Validation runtime Fitting model: XGBoost ... Training model for up to 19.72s of the 19.72s of remaining time. 0.398 = Validation score (accuracy) 0.78s = Training runtime 0.01s = Validation runtime Fitting model: NeuralNetMXNet ... Training model for up to 18.83s of the 18.83s of remaining time. 0.3469 = Validation score (accuracy) 5.51s = Training runtime 0.02s = Validation runtime Fitting model: WeightedEnsemble_L2 ... Training model for up to 29.92s of the 13.03s of remaining time. 0.4082 = Validation score (accuracy) 0.15s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 17.14s ... TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20210827_221933/") Here you can set ``hyperparameters`` to either 'light', 'very\_light', or 'toy' to obtain progressively smaller (but less accurate) models and predictors. Advanced users may instead try manually specifying particular models' hyperparameters in order to make them faster/smaller. Finally, you may also exclude specific unwieldy models from being trained at all. Below we exclude models that tend to be slower (K Nearest Neighbors, Neural Network, models with custom larger-than-default hyperparameters): .. code:: python excluded_model_types = ['KNN', 'NN', 'custom'] predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, excluded_model_types=excluded_model_types, time_limit=30) .. parsed-literal:: :class: output No path specified. Models will be saved in: "AutogluonModels/ag-20210827_221950/" Beginning AutoGluon training ... Time limit = 30s AutoGluon will save models to "AutogluonModels/ag-20210827_221950/" AutoGluon Version: 0.3.0b20210827 Train Data Rows: 500 Train Data Columns: 14 Preprocessing data ... AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object). 
First 10 (of 15) unique label values: [' Exec-managerial', ' Other-service', ' Craft-repair', ' Sales', ' Prof-specialty', ' Protective-serv', ' ?', ' Adm-clerical', ' Machine-op-inspct', ' Tech-support'] If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 12 out of 15 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.978 Train Data Class Count: 12 Using Feature Generators to preprocess the data ... Fitting AutoMLPipelineFeatureGenerator... Available Memory: 18442.33 MB Train Data (Original) Memory Usage: 0.28 MB (0.0% of available memory) Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. Stage 1 Generators: Fitting AsTypeFeatureGenerator... Stage 2 Generators: Fitting FillNaFeatureGenerator... Stage 3 Generators: Fitting IdentityFeatureGenerator... Fitting CategoryFeatureGenerator... Fitting CategoryMemoryMinimizeFeatureGenerator... Stage 4 Generators: Fitting DropUniqueFeatureGenerator... Types of features in original data (raw dtype, special dtypes): ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] Types of features in processed data (raw dtype, special dtypes): ('category', []) : 8 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...] ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...] 
0.1s = Fit runtime 14 features in original data used to generate 14 features in processed data. Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory) Data preprocessing and feature engineering runtime = 0.08s ... AutoGluon will gauge predictive performance using evaluation metric: 'accuracy' To change this, specify the eval_metric argument of fit() Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 391, Val Rows: 98 Excluded Model Types: ['KNN', 'NN', 'custom'] Found 'NN' model in hyperparameters, but 'NN' is present in `excluded_model_types` and will be removed. Found 'KNN' model in hyperparameters, but 'KNN' is present in `excluded_model_types` and will be removed. Found 'KNN' model in hyperparameters, but 'KNN' is present in `excluded_model_types` and will be removed. Fitting 10 L1 models ... Fitting model: NeuralNetFastAI ... Training model for up to 29.92s of the 29.91s of remaining time. No improvement since epoch 8: early stopping 0.2653 = Validation score (accuracy) 0.55s = Training runtime 0.02s = Validation runtime Fitting model: LightGBMXT ... Training model for up to 29.34s of the 29.34s of remaining time. 0.3673 = Validation score (accuracy) 0.54s = Training runtime 0.01s = Validation runtime Fitting model: LightGBM ... Training model for up to 28.78s of the 28.78s of remaining time. 0.3673 = Validation score (accuracy) 0.72s = Training runtime 0.01s = Validation runtime Fitting model: RandomForestGini ... Training model for up to 28.04s of the 28.04s of remaining time. 0.3163 = Validation score (accuracy) 0.71s = Training runtime 0.11s = Validation runtime Fitting model: RandomForestEntr ... Training model for up to 27.2s of the 27.2s of remaining time. 0.2857 = Validation score (accuracy) 0.61s = Training runtime 0.11s = Validation runtime Fitting model: CatBoost ... Training model for up to 26.46s of the 26.46s of remaining time. 
        0.3571 = Validation score (accuracy)
        8.31s = Training runtime
        0.01s = Validation runtime
    Fitting model: ExtraTreesGini ... Training model for up to 18.14s of the 18.13s of remaining time.
        0.2857 = Validation score (accuracy)
        0.71s = Training runtime
        0.11s = Validation runtime
    Fitting model: ExtraTreesEntr ... Training model for up to 17.29s of the 17.29s of remaining time.
        0.2653 = Validation score (accuracy)
        0.71s = Training runtime
        0.11s = Validation runtime
    Fitting model: XGBoost ... Training model for up to 16.45s of the 16.45s of remaining time.
        0.398 = Validation score (accuracy)
        0.75s = Training runtime
        0.01s = Validation runtime
    Fitting model: LightGBMLarge ... Training model for up to 15.59s of the 15.59s of remaining time.
        0.3163 = Validation score (accuracy)
        3.06s = Training runtime
        0.01s = Validation runtime
    Fitting model: WeightedEnsemble_L2 ... Training model for up to 29.92s of the 11.64s of remaining time.
        0.398 = Validation score (accuracy)
        0.22s = Training runtime
        0.0s = Validation runtime
    AutoGluon training complete, total runtime = 18.6s ...
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20210827_221950/")

If you encounter memory issues
------------------------------

To reduce memory usage during training, try the following strategies individually or in combination (these may harm accuracy):

- In ``fit()``, set ``num_bag_sets = 1`` (you can also try values greater than 1 to harm accuracy less).
- In ``fit()``, set ``excluded_model_types = ['KNN', 'XT', 'RF']`` (or some subset of these models).
- Try different ``presets`` in ``fit()``.
- In ``fit()``, set ``hyperparameters = 'light'`` or ``hyperparameters = 'very_light'``.
- Text fields in your table require substantial memory for N-gram featurization. To mitigate this in ``fit()``, you can either: (1) add ``'ignore_text'`` to your ``presets`` list (to ignore text features), or (2) specify the ``feature_generator`` argument:

  ::

      import numpy as np
      from sklearn.feature_extraction.text import CountVectorizer
      from autogluon.features.generators import AutoMLPipelineFeatureGenerator

      MAX_NGRAM = 1000  # example cap on the number of N-gram features
      feature_generator = AutoMLPipelineFeatureGenerator(vectorizer=CountVectorizer(min_df=30, ngram_range=(1, 3), max_features=MAX_NGRAM, dtype=np.uint8))

  where ``MAX_NGRAM = 1000`` is only an example value (try various values under 10000 to reduce the number of N-gram features used to represent each text field).

In addition to reducing memory usage, many of the above strategies can also be used to reduce training times.

To reduce memory usage during inference:

- If you are producing predictions for a large test dataset, break the test data into smaller chunks as demonstrated in :ref:`sec_faq`.
- If models have been previously persisted in memory but inference speed is not a major concern, call ``predictor.unpersist_models()``.
- If models have been previously persisted in memory, bagging was used in ``fit()``, and inference speed is a concern, call ``predictor.refit_full()`` and use one of the refit-full models for prediction (ensure this is the only model persisted in memory).

If you encounter disk space issues
----------------------------------

To reduce disk usage, try the following strategies individually or in combination:

- Make sure to delete all ``predictor.path`` folders from previous ``fit()`` runs! These can eat up your free space if you call ``fit()`` many times. If you didn't specify ``path``, AutoGluon still automatically saved its models to a folder called ``AutogluonModels/ag-[TIMESTAMP]``, where TIMESTAMP records when ``fit()`` was called, so make sure to also delete these folders if you run low on free space.
- Call ``predictor.save_space()`` to delete auxiliary files produced during ``fit()``.
- Call ``predictor.delete_models(models_to_keep='best', dry_run=False)`` if you only intend to use this predictor for inference going forward (will delete files required for non-prediction-related functionality like ``fit_summary``).
- In ``fit()``, you can add ``'optimize_for_deployment'`` to the ``presets`` list, which will automatically invoke the previous two strategies after training.
- Most of the above strategies to reduce memory usage will also reduce disk usage (but may harm accuracy).

References
----------

The following paper describes how AutoGluon internally operates on tabular data:

Erickson et al. "AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data". *arXiv*, 2020.