Predicting Columns in a Table - Quick Start¶
Via a simple fit() call, AutoGluon can produce highly accurate models to predict the values in one column of a data table based on the rest of the columns’ values. Use AutoGluon with tabular data for both classification and regression problems. This tutorial demonstrates how to use AutoGluon to produce a classification model that predicts whether or not a person’s income exceeds $50,000.
To start, import AutoGluon’s TabularPredictor and TabularDataset classes:
from autogluon.tabular import TabularDataset, TabularPredictor
Load training data from a CSV file into an AutoGluon Dataset object. This object is essentially equivalent to a Pandas DataFrame and the same methods can be applied to both.
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
subsample_size = 500 # subsample subset of data for faster demo, try setting this to much larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6118 | 51 | Private | 39264 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | >50K |
23204 | 58 | Private | 51662 | 10th | 6 | Married-civ-spouse | Other-service | Wife | White | Female | 0 | 0 | 8 | United-States | <=50K |
29590 | 40 | Private | 326310 | Some-college | 10 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 44 | United-States | <=50K |
18116 | 37 | Private | 222450 | HS-grad | 9 | Never-married | Sales | Not-in-family | White | Male | 0 | 2339 | 40 | El-Salvador | <=50K |
33964 | 62 | Private | 109190 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 15024 | 0 | 40 | United-States | >50K |
Note that we loaded data from a CSV file stored in the cloud (an AWS S3 bucket), but you can specify a local file path instead if you have already downloaded the CSV file to your own machine (e.g., using wget).
Each row in the table train_data corresponds to a single training example. In this particular dataset, each row corresponds to an individual person, and the columns contain various characteristics reported during a census.
Let’s first use these features to predict whether the person’s income exceeds $50,000 or not, which is recorded in the class column of this table.
label = 'class'
print("Summary of class variable: \n", train_data[label].describe())
Summary of class variable:
count 500
unique 2
top <=50K
freq 365
Name: class, dtype: object
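The summary shows that the two classes are not balanced in this subsample. As a point of reference for the accuracy values reported later, the short sketch below computes the accuracy of a trivial model that always predicts the most frequent class. The variable names are ours and are not part of AutoGluon.

```python
# Counts taken from the describe() output above.
total_rows = 500       # "count"
majority_count = 365   # "freq" of the most common label, <=50K

# Accuracy of always predicting the majority class on this subsample.
majority_baseline = majority_count / total_rows
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

Any trained model should be judged against this baseline rather than against 0.5.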
Now use AutoGluon to train multiple models:
save_path = 'agModels-predictClass' # specifies folder to store trained models
predictor = TabularPredictor(label=label, path=save_path).fit(train_data)
Beginning AutoGluon training ...
AutoGluon will save models to "agModels-predictClass/"
AutoGluon Version: 0.6.1b20221213
Python Version: 3.8.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Tue Nov 30 00:17:50 UTC 2021
Train Data Rows: 500
Train Data Columns: 14
Label Column: class
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
2 unique label values: [' >50K', ' <=50K']
If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 31473.95 MB
Train Data (Original) Memory Usage: 0.29 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']
0.1s = Fit runtime
14 features in original data used to generate 14 features in processed data.
Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
Fitting 13 L1 models ...
Fitting model: KNeighborsUnif ...
0.73 = Validation score (accuracy)
0.61s = Training runtime
0.01s = Validation runtime
Fitting model: KNeighborsDist ...
0.65 = Validation score (accuracy)
0.6s = Training runtime
0.01s = Validation runtime
Fitting model: LightGBMXT ...
0.83 = Validation score (accuracy)
1.24s = Training runtime
0.01s = Validation runtime
Fitting model: LightGBM ...
0.85 = Validation score (accuracy)
0.83s = Training runtime
0.01s = Validation runtime
Fitting model: RandomForestGini ...
0.84 = Validation score (accuracy)
1.08s = Training runtime
0.06s = Validation runtime
Fitting model: RandomForestEntr ...
0.83 = Validation score (accuracy)
1.06s = Training runtime
0.06s = Validation runtime
Fitting model: CatBoost ...
0.85 = Validation score (accuracy)
1.42s = Training runtime
0.01s = Validation runtime
Fitting model: ExtraTreesGini ...
0.82 = Validation score (accuracy)
1.07s = Training runtime
0.06s = Validation runtime
Fitting model: ExtraTreesEntr ...
0.81 = Validation score (accuracy)
1.07s = Training runtime
0.06s = Validation runtime
Fitting model: NeuralNetFastAI ...
0.82 = Validation score (accuracy)
2.6s = Training runtime
0.01s = Validation runtime
Fitting model: XGBoost ...
0.87 = Validation score (accuracy)
0.23s = Training runtime
0.01s = Validation runtime
Fitting model: NeuralNetTorch ...
0.83 = Validation score (accuracy)
1.01s = Training runtime
0.01s = Validation runtime
Fitting model: LightGBMLarge ...
0.83 = Validation score (accuracy)
0.52s = Training runtime
0.01s = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
0.87 = Validation score (accuracy)
0.33s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 14.2s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("agModels-predictClass/")
Next, load separate test data to demonstrate how to make predictions on new examples at inference time:
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
y_test = test_data[label] # values to predict
test_data_nolab = test_data.drop(columns=[label]) # delete label column to prove we're not cheating
test_data_nolab.head()
Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 31 | Private | 169085 | 11th | 7 | Married-civ-spouse | Sales | Wife | White | Female | 0 | 0 | 20 | United-States |
1 | 17 | Self-emp-not-inc | 226203 | 12th | 8 | Never-married | Sales | Own-child | White | Male | 0 | 0 | 45 | United-States |
2 | 47 | Private | 54260 | Assoc-voc | 11 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 1887 | 60 | United-States |
3 | 21 | Private | 176262 | Some-college | 10 | Never-married | Exec-managerial | Own-child | White | Female | 0 | 0 | 30 | United-States |
4 | 17 | Private | 241185 | 12th | 8 | Never-married | Prof-specialty | Own-child | White | Male | 0 | 0 | 20 | United-States |
We use our trained models to make predictions on the new data and then evaluate performance:
Warning
TabularPredictor.load() uses the pickle module implicitly, which is known to be insecure. It is possible to construct malicious pickle data that will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source or that could have been tampered with. Only load data you trust.
predictor = TabularPredictor.load(save_path) # unnecessary, just demonstrates how to load previously-trained predictor from file
y_pred = predictor.predict(test_data_nolab)
print("Predictions: \n", y_pred)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)
Evaluation: accuracy on test data: 0.8374449790152523
Evaluations on test data:
{
"accuracy": 0.8374449790152523,
"balanced_accuracy": 0.7430558394221018,
"mcc": 0.5243657567117436,
"f1": 0.621904761904762,
"precision": 0.69394261424017,
"recall": 0.5634167385677308
}
Predictions:
0 <=50K
1 <=50K
2 <=50K
3 <=50K
4 <=50K
...
9764 <=50K
9765 <=50K
9766 <=50K
9767 <=50K
9768 <=50K
Name: class, Length: 9769, dtype: object
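To make the auxiliary metrics above more concrete, here is a minimal sketch that computes them from the four cells of a binary confusion matrix, treating >50K as the positive class. The counts used below are an assumption: AutoGluon does not print them here, and we back-derived them so that the resulting metrics agree with the reported values. The function binary_metrics is our own helper, not an AutoGluon API.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        "f1": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
    }

# Assumed counts (not shown in the output above); they sum to the 9769 test rows.
metrics = binary_metrics(tp=1306, fp=576, fn=1012, tn=6875)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the model misses a large share of the >50K rows (low recall) while rarely mislabeling <=50K rows, which is why balanced accuracy is noticeably lower than plain accuracy.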
We can also evaluate the performance of each individual trained model on our (labeled) test data:
predictor.leaderboard(test_data, silent=True)
| | model | score_test | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RandomForestGini | 0.842870 | 0.84 | 0.138816 | 0.058765 | 1.083881 | 0.138816 | 0.058765 | 1.083881 | 1 | True | 5 |
1 | CatBoost | 0.842461 | 0.85 | 0.012542 | 0.005373 | 1.416180 | 0.012542 | 0.005373 | 1.416180 | 1 | True | 7 |
2 | RandomForestEntr | 0.841130 | 0.83 | 0.139030 | 0.057517 | 1.057551 | 0.139030 | 0.057517 | 1.057551 | 1 | True | 6 |
3 | LightGBM | 0.839799 | 0.85 | 0.014895 | 0.005560 | 0.827884 | 0.014895 | 0.005560 | 0.827884 | 1 | True | 4 |
4 | XGBoost | 0.837445 | 0.87 | 0.049426 | 0.006450 | 0.229545 | 0.049426 | 0.006450 | 0.229545 | 1 | True | 11 |
5 | WeightedEnsemble_L2 | 0.837445 | 0.87 | 0.052001 | 0.007092 | 0.562734 | 0.002575 | 0.000642 | 0.333189 | 2 | True | 14 |
6 | LightGBMXT | 0.836421 | 0.83 | 0.010107 | 0.005689 | 1.240028 | 0.010107 | 0.005689 | 1.240028 | 1 | True | 3 |
7 | ExtraTreesGini | 0.834579 | 0.82 | 0.139867 | 0.057605 | 1.065879 | 0.139867 | 0.057605 | 1.065879 | 1 | True | 8 |
8 | NeuralNetTorch | 0.833555 | 0.83 | 0.058734 | 0.011795 | 1.008168 | 0.058734 | 0.011795 | 1.008168 | 1 | True | 12 |
9 | ExtraTreesEntr | 0.833350 | 0.81 | 0.141707 | 0.057411 | 1.067412 | 0.141707 | 0.057411 | 1.067412 | 1 | True | 9 |
10 | LightGBMLarge | 0.828949 | 0.83 | 0.036487 | 0.005495 | 0.519562 | 0.036487 | 0.005495 | 0.519562 | 1 | True | 13 |
11 | NeuralNetFastAI | 0.818610 | 0.82 | 0.159030 | 0.013877 | 2.596844 | 0.159030 | 0.013877 | 2.596844 | 1 | True | 10 |
12 | KNeighborsUnif | 0.725970 | 0.73 | 0.024979 | 0.007277 | 0.611909 | 0.024979 | 0.007277 | 0.611909 | 1 | True | 1 |
13 | KNeighborsDist | 0.695158 | 0.65 | 0.024856 | 0.006534 | 0.604089 | 0.024856 | 0.006534 | 0.604089 | 1 | True | 2 |
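The validation scores in this leaderboard were measured on only 100 held-out rows, so they are coarse, and the ranking by score_val need not match the ranking by score_test. The sketch below illustrates this with the (model, score_test, score_val) values copied from the table; the list and variable names are ours.

```python
# (model, score_test, score_val) copied from the leaderboard above.
leaderboard = [
    ("RandomForestGini", 0.842870, 0.84),
    ("CatBoost", 0.842461, 0.85),
    ("RandomForestEntr", 0.841130, 0.83),
    ("LightGBM", 0.839799, 0.85),
    ("XGBoost", 0.837445, 0.87),
    ("WeightedEnsemble_L2", 0.837445, 0.87),
    ("LightGBMXT", 0.836421, 0.83),
    ("ExtraTreesGini", 0.834579, 0.82),
    ("NeuralNetTorch", 0.833555, 0.83),
    ("ExtraTreesEntr", 0.833350, 0.81),
    ("LightGBMLarge", 0.828949, 0.83),
    ("NeuralNetFastAI", 0.818610, 0.82),
    ("KNeighborsUnif", 0.725970, 0.73),
    ("KNeighborsDist", 0.695158, 0.65),
]

# max() returns the first row among ties, so two models sharing the top
# validation score are both equally valid "best by validation" choices.
best_by_val = max(leaderboard, key=lambda row: row[2])
best_by_test = max(leaderboard, key=lambda row: row[1])

print("Best by validation score:", best_by_val[0], best_by_val[2])
print("Best by test score:      ", best_by_test[0], best_by_test[1])
```

In a real application the test labels are not available at selection time, so the validation score is the only signal to pick a model; a larger training sample (and therefore a larger validation set) makes that signal more reliable.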
Now you’re ready to try AutoGluon on your own tabular datasets! As long as they’re stored in a popular format like CSV, you should be able to achieve strong predictive performance with just 2 lines of code:
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label=<variable-name>).fit(train_data=<file-name>)
Note: This simple call to fit() is intended for your first prototype model. In a subsequent section, we’ll demonstrate how to maximize predictive performance by additionally specifying the presets parameter to fit() and the eval_metric parameter to TabularPredictor().
Description of fit():¶
Here we discuss what happened during fit().
Since there are only two possible values of the class variable, this was a binary classification problem, for which an appropriate performance metric is accuracy. AutoGluon automatically infers this as well as the type of each feature (i.e., which columns contain continuous numbers vs. discrete categories). AutoGluon can also automatically handle common issues like missing data and rescaling feature values.
We did not specify separate validation data, so AutoGluon automatically chose a random training/validation split of the data. The validation data is held out from training and is used to determine the models and hyperparameter values that produce the best results. Rather than relying on a single model, AutoGluon trains multiple models and ensembles them together to improve predictive performance.
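The training log reports holdout_frac=0.2 with 400 training rows and 100 validation rows. As a rough illustration of what a random holdout split does, here is a minimal sketch using only the standard library. It is not AutoGluon's actual splitting code (which may, for example, take the label distribution into account); the function name and seed are our own choices.

```python
import random

def holdout_split(n_rows, holdout_frac=0.2, seed=0):
    """Randomly partition row indices into training and validation sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    n_val = int(round(n_rows * holdout_frac))
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = holdout_split(500, holdout_frac=0.2)
print(len(train_idx), "training rows,", len(val_idx), "validation rows")
```

Models are fit on the training rows only and scored on the validation rows, which is what the "Validation score" lines in the log refer to.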
By default, AutoGluon tries to fit various types of models including neural networks and tree ensembles. Each type of model has various hyperparameters, which the user would traditionally have to specify. AutoGluon automates this process.
AutoGluon automatically and iteratively tests values for hyperparameters to produce the best performance on the validation data. This involves repeatedly training models under different hyperparameter settings and evaluating their performance. This process can be computationally intensive, so fit() can parallelize this process across multiple threads (and machines if distributed resources are available). To control runtimes, you can specify various arguments in fit() as demonstrated in the subsequent In-Depth tutorial.
For tabular problems, fit() returns a Predictor object. For classification, you can easily output predicted class probabilities instead of predicted classes:
pred_probs = predictor.predict_proba(test_data_nolab)
pred_probs.head(5)
| | <=50K | >50K |
---|---|---|
0 | 0.982107 | 0.017893 |
1 | 0.988337 | 0.011663 |
2 | 0.573505 | 0.426495 |
3 | 0.998272 | 0.001728 |
4 | 0.990299 | 0.009701 |
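Class probabilities are useful when the default decision rule does not suit your application. The sketch below turns the five probability rows above into labels, first with a 0.5 cutoff on the >50K probability and then with a hypothetical, lower cutoff of 0.4. The helper to_label and the 0.4 value are illustrative assumptions, not AutoGluon settings, and the label strings are written without the leading space that appears in the raw data.

```python
# Class probabilities copied from the table above (rows 0-4).
pred_probs = [
    {"<=50K": 0.982107, ">50K": 0.017893},
    {"<=50K": 0.988337, ">50K": 0.011663},
    {"<=50K": 0.573505, ">50K": 0.426495},
    {"<=50K": 0.998272, ">50K": 0.001728},
    {"<=50K": 0.990299, ">50K": 0.009701},
]

def to_label(row, positive=">50K", negative="<=50K", threshold=0.5):
    """Predict the positive class when its probability reaches the threshold."""
    return positive if row[positive] >= threshold else negative

default_labels = [to_label(row) for row in pred_probs]
lower_cutoff_labels = [to_label(row, threshold=0.4) for row in pred_probs]

print(default_labels)
print(lower_cutoff_labels)
```

With the 0.5 cutoff all five rows are labeled <=50K, matching the first five predictions printed earlier; with the 0.4 cutoff, row 2 flips to >50K. Lowering the cutoff trades precision for recall on the positive class.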
Besides inference, this object can also summarize what happened during fit.
results = predictor.fit_summary(show_plot=True)
* Summary of fit() *
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 XGBoost 0.87 0.006450 0.229545 0.006450 0.229545 1 True 11
1 WeightedEnsemble_L2 0.87 0.007092 0.562734 0.000642 0.333189 2 True 14
2 CatBoost 0.85 0.005373 1.416180 0.005373 1.416180 1 True 7
3 LightGBM 0.85 0.005560 0.827884 0.005560 0.827884 1 True 4
4 RandomForestGini 0.84 0.058765 1.083881 0.058765 1.083881 1 True 5
5 LightGBMLarge 0.83 0.005495 0.519562 0.005495 0.519562 1 True 13
6 LightGBMXT 0.83 0.005689 1.240028 0.005689 1.240028 1 True 3
7 NeuralNetTorch 0.83 0.011795 1.008168 0.011795 1.008168 1 True 12
8 RandomForestEntr 0.83 0.057517 1.057551 0.057517 1.057551 1 True 6
9 NeuralNetFastAI 0.82 0.013877 2.596844 0.013877 2.596844 1 True 10
10 ExtraTreesGini 0.82 0.057605 1.065879 0.057605 1.065879 1 True 8
11 ExtraTreesEntr 0.81 0.057411 1.067412 0.057411 1.067412 1 True 9
12 KNeighborsUnif 0.73 0.007277 0.611909 0.007277 0.611909 1 True 1
13 KNeighborsDist 0.65 0.006534 0.604089 0.006534 0.604089 1 True 2
Number of models trained: 14
Types of models trained:
{'WeightedEnsembleModel', 'NNFastAiTabularModel', 'LGBModel', 'CatBoostModel', 'RFModel', 'KNNModel', 'TabularNeuralNetTorchModel', 'XTModel', 'XGBoostModel'}
Bagging used: False
Multi-layer stack-ensembling used: False
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']
* End of fit() summary *
/home/ci/autogluon/core/src/autogluon/core/utils/plots.py:138: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')
From this summary, we can see that AutoGluon trained many different types of models as well as an ensemble of the best-performing models. The summary also describes the actual models that were trained during fit and how well each model performed on the held-out validation data. We can view what properties AutoGluon automatically inferred about our prediction task:
print("AutoGluon infers problem type is: ", predictor.problem_type)
print("AutoGluon identified the following types of features:")
print(predictor.feature_metadata)
AutoGluon infers problem type is: binary
AutoGluon identified the following types of features:
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']
AutoGluon correctly recognized our prediction problem to be a binary classification task and decided that variables such as age should be represented as integers, whereas variables such as workclass should be represented as categorical objects. The feature_metadata attribute allows you to see the inferred data type of each predictive variable after preprocessing (this is its raw dtype; some features may also be associated with additional special dtypes if produced via feature-engineering, e.g. numerical representations of a datetime/text column).
We can evaluate the performance of each individual trained model on our (labeled) test data:
predictor.leaderboard(test_data, silent=True)
| | model | score_test | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RandomForestGini | 0.842870 | 0.84 | 0.137201 | 0.058765 | 1.083881 | 0.137201 | 0.058765 | 1.083881 | 1 | True | 5 |
1 | CatBoost | 0.842461 | 0.85 | 0.011980 | 0.005373 | 1.416180 | 0.011980 | 0.005373 | 1.416180 | 1 | True | 7 |
2 | RandomForestEntr | 0.841130 | 0.83 | 0.138423 | 0.057517 | 1.057551 | 0.138423 | 0.057517 | 1.057551 | 1 | True | 6 |
3 | LightGBM | 0.839799 | 0.85 | 0.014275 | 0.005560 | 0.827884 | 0.014275 | 0.005560 | 0.827884 | 1 | True | 4 |
4 | XGBoost | 0.837445 | 0.87 | 0.048748 | 0.006450 | 0.229545 | 0.048748 | 0.006450 | 0.229545 | 1 | True | 11 |
5 | WeightedEnsemble_L2 | 0.837445 | 0.87 | 0.051136 | 0.007092 | 0.562734 | 0.002388 | 0.000642 | 0.333189 | 2 | True | 14 |
6 | LightGBMXT | 0.836421 | 0.83 | 0.010732 | 0.005689 | 1.240028 | 0.010732 | 0.005689 | 1.240028 | 1 | True | 3 |
7 | ExtraTreesGini | 0.834579 | 0.82 | 0.139439 | 0.057605 | 1.065879 | 0.139439 | 0.057605 | 1.065879 | 1 | True | 8 |
8 | NeuralNetTorch | 0.833555 | 0.83 | 0.059543 | 0.011795 | 1.008168 | 0.059543 | 0.011795 | 1.008168 | 1 | True | 12 |
9 | ExtraTreesEntr | 0.833350 | 0.81 | 0.138013 | 0.057411 | 1.067412 | 0.138013 | 0.057411 | 1.067412 | 1 | True | 9 |
10 | LightGBMLarge | 0.828949 | 0.83 | 0.038333 | 0.005495 | 0.519562 | 0.038333 | 0.005495 | 0.519562 | 1 | True | 13 |
11 | NeuralNetFastAI | 0.818610 | 0.82 | 0.162671 | 0.013877 | 2.596844 | 0.162671 | 0.013877 | 2.596844 | 1 | True | 10 |
12 | KNeighborsUnif | 0.725970 | 0.73 | 0.015381 | 0.007277 | 0.611909 | 0.015381 | 0.007277 | 0.611909 | 1 | True | 1 |
13 | KNeighborsDist | 0.695158 | 0.65 | 0.024056 | 0.006534 | 0.604089 | 0.024056 | 0.006534 | 0.604089 | 1 | True | 2 |
When we call predict(), AutoGluon automatically predicts with the model that displayed the best performance on validation data (i.e. the weighted-ensemble). We can instead specify which model to use for predictions like this:
predictor.predict(test_data, model='LightGBM')
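For intuition about how a weighted ensemble differs from a single model such as LightGBM, the sketch below blends the positive-class probabilities of several base models using a weighted average. This is a simplified illustration under our own assumptions: the probabilities and weights are hypothetical, and this tutorial does not describe how AutoGluon actually chooses its ensemble weights.

```python
def weighted_average(prob_by_model, weights):
    """Blend per-model positive-class probabilities using normalized weights."""
    total_weight = sum(weights.values())
    blended = sum(weights[name] * prob_by_model[name] for name in weights)
    return blended / total_weight

# Hypothetical probabilities of the positive class for a single test row.
probs = {"XGBoost": 0.40, "CatBoost": 0.50, "LightGBM": 0.30}
# Hypothetical ensemble weights (they need not sum to 1).
weights = {"XGBoost": 2.0, "CatBoost": 1.0, "LightGBM": 1.0}

blended = weighted_average(probs, weights)
print(f"Blended probability: {blended:.2f}")
```

Because the blend depends on every base model with a nonzero weight, predicting with the ensemble requires running all of those models, which is why its prediction time in the leaderboard is larger than its marginal prediction time.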
Above, the scores of predictive performance were based on a default evaluation metric (accuracy for binary classification). Performance in certain applications may be measured by different metrics than the ones AutoGluon optimizes for by default. If you know the metric that counts in your application, you should specify it as demonstrated in the section on maximizing predictive performance below.
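One commonly used alternative to accuracy for binary classification is ROC AUC ('roc_auc'), which scores how well predicted probabilities rank positive examples above negative ones, independent of any decision threshold. The following is a minimal pairwise sketch for small inputs; the function and the toy data are ours and this is not how AutoGluon computes the metric internally.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))

# Toy labels (1 = positive class) and predicted positive-class probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(f"ROC AUC: {auc:.2f}")
```

Here three of the four positive/negative pairs are ordered correctly, giving 0.75. A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.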
Presets¶
AutoGluon comes with a variety of presets that can be specified in the call to .fit via the presets argument. medium_quality is used by default to encourage initial prototyping, but for serious usage, the other presets should be used instead.
Preset | Model Quality | Use Cases | Fit Time (Ideal) | Inference Time (Relative to medium_quality) | Disk Usage |
---|---|---|---|---|---|
best_quality | State-of-the-art (SOTA), much better than high_quality | When accuracy is what matters | 16x+ | 32x+ | 16x+ |
high_quality | Better than good_quality | When a very powerful, portable solution with fast inference is required: Large-scale batch inference | 16x | 4x | 2x |
good_quality | Significantly better than medium_quality | When a powerful, highly portable solution with very fast inference is required: Billion-scale batch inference, sub-100ms online-inference, edge-devices | 16x | 2x | 1x |
medium_quality | Competitive with other top AutoML Frameworks | Initial prototyping, establishing a performance baseline | 1x | 1x | 1x |
Start with medium_quality to get a sense of the problem and identify any data related issues. If medium_quality is taking too long to train, consider subsampling the training data during this prototyping phase.
Next, try best_quality. Make sure to specify at least 16x the time_limit value as used in medium_quality. Once finished, you should have a very powerful solution that is often stronger than medium_quality.
Once you have evaluated both best_quality and medium_quality, check if either satisfies your needs. If neither does, consider trying high_quality and/or good_quality.
Maximizing predictive performance¶
Note: You should not call fit() with entirely default arguments if you are benchmarking AutoGluon-Tabular or hoping to maximize its accuracy! To get the best predictive accuracy with AutoGluon, you should generally use it like this:
time_limit = 60 # for quick demonstration only; set this to the longest time you are willing to wait (in seconds)
metric = 'roc_auc' # specify your evaluation metric here
predictor = TabularPredictor(label, eval_metric=metric).fit(train_data, time_limit=time_limit, presets='best_quality')
predictor.leaderboard(test_data, silent=True)
No path specified. Models will be saved in: "AutogluonModels/ag-20221213_015623/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=0, num_bag_folds=5, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to "AutogluonModels/ag-20221213_015623/"
AutoGluon Version: 0.6.1b20221213
Python Version: 3.8.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Tue Nov 30 00:17:50 UTC 2021
Train Data Rows: 500
Train Data Columns: 14
Label Column: class
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
2 unique label values: [' >50K', ' <=50K']
If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 31214.32 MB
Train Data (Original) Memory Usage: 0.29 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']
0.1s = Fit runtime
14 features in original data used to generate 14 features in processed data.
Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.1s ...
AutoGluon will gauge predictive performance using evaluation metric: 'roc_auc'
This metric expects predicted probabilities rather than predicted class labels, so you'll need to use predict_proba() instead of predict()
To change this, specify the eval_metric parameter of Predictor()
Fitting 13 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 59.9s of the 59.9s of remaining time.
0.5196 = Validation score (roc_auc)
0.0s = Training runtime
0.01s = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 59.88s of the 59.88s of remaining time.
0.537 = Validation score (roc_auc)
0.0s = Training runtime
0.0s = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 59.86s of the 59.86s of remaining time.
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
0.8819 = Validation score (roc_auc)
1.27s = Training runtime
0.03s = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 53.21s of the 53.21s of remaining time.
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
0.867 = Validation score (roc_auc)
1.29s = Training runtime
0.03s = Validation runtime
Fitting model: RandomForestGini_BAG_L1 ... Training model for up to 49.55s of the 49.55s of remaining time.
0.8879 = Validation score (roc_auc)
0.47s = Training runtime
0.11s = Validation runtime
Fitting model: RandomForestEntr_BAG_L1 ... Training model for up to 48.94s of the 48.94s of remaining time.
0.8899 = Validation score (roc_auc)
0.57s = Training runtime
0.11s = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 48.24s of the 48.24s of remaining time.
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
0.8923 = Validation score (roc_auc)
3.48s = Training runtime
0.03s = Validation runtime
Fitting model: ExtraTreesGini_BAG_L1 ... Training model for up to 42.56s of the 42.55s of remaining time.
0.8958 = Validation score (roc_auc)
0.47s = Training runtime
0.11s = Validation runtime
Fitting model: ExtraTreesEntr_BAG_L1 ... Training model for up to 41.95s of the 41.95s of remaining time.
0.8904 = Validation score (roc_auc)
0.5s = Training runtime
0.11s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 41.31s of the 41.31s of remaining time.
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
0.8683 = Validation score (roc_auc)
2.49s = Training runtime
0.07s = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 36.68s of the 36.68s of remaining time.
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
0.868 = Validation score (roc_auc)
0.63s = Training runtime
0.03s = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 33.44s of the 33.44s of remaining time.
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
0.8394 = Validation score (roc_auc)
3.59s = Training runtime
0.07s = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 27.5s of the 27.5s of remaining time.
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
0.8433 = Validation score (roc_auc)
1.41s = Training runtime
0.03s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 59.9s of the 23.68s of remaining time.
0.9038 = Validation score (roc_auc)
0.39s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 36.72s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221213_015623/")
| | model | score_test | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LightGBMXT_BAG_L1 | 0.900802 | 0.881867 | 0.352405 | 0.027543 | 1.265067 | 0.352405 | 0.027543 | 1.265067 | 1 | True | 3 |
1 | CatBoost_BAG_L1 | 0.900744 | 0.892278 | 0.050592 | 0.032186 | 3.479597 | 0.050592 | 0.032186 | 3.479597 | 1 | True | 7 |
2 | WeightedEnsemble_L2 | 0.897555 | 0.903825 | 0.949188 | 0.218097 | 6.826740 | 0.003550 | 0.000491 | 0.387969 | 2 | True | 14 |
3 | LightGBM_BAG_L1 | 0.892347 | 0.866991 | 0.302695 | 0.025454 | 1.287961 | 0.302695 | 0.025454 | 1.287961 | 1 | True | 4 |
4 | XGBoost_BAG_L1 | 0.891483 | 0.868006 | 0.328814 | 0.032381 | 0.628914 | 0.328814 | 0.032381 | 0.628914 | 1 | True | 11 |
5 | NeuralNetTorch_BAG_L1 | 0.887304 | 0.839371 | 0.315951 | 0.072376 | 3.590478 | 0.315951 | 0.072376 | 3.590478 | 1 | True | 12 |
6 | RandomForestEntr_BAG_L1 | 0.886981 | 0.889863 | 0.144211 | 0.108986 | 0.569606 | 0.144211 | 0.108986 | 0.569606 | 1 | True | 6 |
7 | RandomForestGini_BAG_L1 | 0.885163 | 0.887874 | 0.147684 | 0.114884 | 0.472775 | 0.147684 | 0.114884 | 0.472775 | 1 | True | 5 |
8 | NeuralNetFastAI_BAG_L1 | 0.885055 | 0.868290 | 0.750721 | 0.072119 | 2.489320 | 0.750721 | 0.072119 | 2.489320 | 1 | True | 10 |
9 | ExtraTreesEntr_BAG_L1 | 0.880342 | 0.890401 | 0.142248 | 0.109375 | 0.502267 | 0.142248 | 0.109375 | 0.502267 | 1 | True | 9 |
10 | ExtraTreesGini_BAG_L1 | 0.879143 | 0.895789 | 0.144326 | 0.113302 | 0.469855 | 0.144326 | 0.113302 | 0.469855 | 1 | True | 8 |
11 | LightGBMLarge_BAG_L1 | 0.873437 | 0.843308 | 0.102181 | 0.025984 | 1.405076 | 0.102181 | 0.025984 | 1.405076 | 1 | True | 13 |
12 | KNeighborsDist_BAG_L1 | 0.525998 | 0.536956 | 0.028825 | 0.004987 | 0.003174 | 0.028825 | 0.004987 | 0.003174 | 1 | True | 2 |
13 | KNeighborsUnif_BAG_L1 | 0.514970 | 0.519604 | 0.025368 | 0.006683 | 0.003319 | 0.025368 | 0.006683 | 0.003319 | 1 | True | 1 |
This command implements the following strategy to maximize accuracy:
- Specify the argument presets='best_quality', which allows AutoGluon to automatically construct powerful model ensembles based on stacking/bagging, and will greatly improve the resulting predictions if granted sufficient training time. The default value of presets is 'medium_quality', which produces less accurate models but facilitates faster prototyping. With presets, you can flexibly prioritize predictive accuracy vs. training/inference speed. For example, if you care less about predictive performance and want to quickly deploy a basic model, consider using: presets=['good_quality', 'optimize_for_deployment'].
- Provide the parameter eval_metric to TabularPredictor() if you know what metric will be used to evaluate predictions in your application. Some other non-default metrics you might use include: 'f1' (for binary classification), 'roc_auc' (for binary classification), 'log_loss' (for classification), 'mean_absolute_error' (for regression), 'median_absolute_error' (for regression). You can also define your own custom metric function. For more information refer to Adding a custom metric to AutoGluon.
- Include all your data in train_data and do not provide tuning_data (AutoGluon will split the data more intelligently to fit its needs).
- Do not specify the hyperparameter_tune_kwargs argument (counterintuitively, hyperparameter tuning is not the best way to spend a limited training time budget, as model ensembling is often superior). We recommend you only use hyperparameter_tune_kwargs if your goal is to deploy a single model rather than an ensemble.
- Do not specify the hyperparameters argument (allow AutoGluon to adaptively select which models/hyperparameters to use).
- Set time_limit to the longest amount of time (in seconds) that you are willing to wait. AutoGluon’s predictive performance improves the longer fit() is allowed to run.
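The recommendations above can be gathered into one call. The following is a minimal sketch rather than a definitive recipe: the helper names `best_quality_settings` and `fit_best_quality` and the one-hour default time limit are illustrative assumptions, while `presets`, `eval_metric`, and `time_limit` are the arguments described above.

```python
def best_quality_settings(time_limit_s=3600, eval_metric=None):
    """Return (init_kwargs, fit_kwargs) following the accuracy tips above.

    Hypothetical helper for illustration; it is not part of AutoGluon.
    """
    if time_limit_s <= 0:
        raise ValueError("time_limit_s must be a positive number of seconds")
    init_kwargs = {}
    if eval_metric is not None:
        # Only pass eval_metric when the application's metric is known.
        init_kwargs["eval_metric"] = eval_metric
    # Deliberately absent: tuning_data, hyperparameter_tune_kwargs, hyperparameters.
    fit_kwargs = {"presets": "best_quality", "time_limit": time_limit_s}
    return init_kwargs, fit_kwargs


def fit_best_quality(train_data, label, time_limit_s=3600, eval_metric=None):
    """Fit a TabularPredictor on all of train_data with the settings above."""
    # Imported lazily so the settings helper works without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    init_kwargs, fit_kwargs = best_quality_settings(time_limit_s, eval_metric)
    return TabularPredictor(label=label, **init_kwargs).fit(train_data, **fit_kwargs)
```

For the income example, a call might look like `fit_best_quality(train_data, label='class', time_limit_s=600, eval_metric='roc_auc')`; the ten-minute budget and the choice of metric here are arbitrary.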
Regression (predicting numeric table columns):¶
To demonstrate that fit() can also automatically handle regression tasks, we now try to predict the numeric age variable in the same table based on the other features:
age_column = 'age'
print("Summary of age variable: \n", train_data[age_column].describe())
Summary of age variable:
count 500.00000
mean 39.65200
std 13.52393
min 17.00000
25% 29.00000
50% 38.00000
75% 49.00000
max 85.00000
Name: age, dtype: float64
We again call fit(), this time imposing a time limit (in seconds), and also demonstrate a shorthand method to evaluate the resulting model on the test data (which contain labels):
predictor_age = TabularPredictor(label=age_column, path="agModels-predictAge").fit(train_data, time_limit=60)
performance = predictor_age.evaluate(test_data)
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to "agModels-predictAge/"
AutoGluon Version: 0.6.1b20221213
Python Version: 3.8.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Tue Nov 30 00:17:50 UTC 2021
Train Data Rows: 500
Train Data Columns: 14
Label Column: age
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
Label info (max, min, mean, stddev): (85, 17, 39.652, 13.52393)
If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 30901.02 MB
Train Data (Original) Memory Usage: 0.32 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 2 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('int', []) : 5 | ['fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
('object', []) : 9 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 5 | ['fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
('int', ['bool']) : 2 | ['sex', 'class']
0.1s = Fit runtime
14 features in original data used to generate 14 features in processed data.
Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.1s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif ... Training model for up to 59.9s of the 59.9s of remaining time.
-15.6869 = Validation score (-root_mean_squared_error)
0.01s = Training runtime
0.01s = Validation runtime
Fitting model: KNeighborsDist ... Training model for up to 59.89s of the 59.89s of remaining time.
-15.1801 = Validation score (-root_mean_squared_error)
0.01s = Training runtime
0.01s = Validation runtime
Fitting model: LightGBMXT ... Training model for up to 59.87s of the 59.87s of remaining time.
-11.7092 = Validation score (-root_mean_squared_error)
0.32s = Training runtime
0.01s = Validation runtime
Fitting model: LightGBM ... Training model for up to 59.54s of the 59.54s of remaining time.
-11.9295 = Validation score (-root_mean_squared_error)
0.27s = Training runtime
0.01s = Validation runtime
Fitting model: RandomForestMSE ... Training model for up to 59.26s of the 59.25s of remaining time.
-11.6624 = Validation score (-root_mean_squared_error)
0.38s = Training runtime
0.05s = Validation runtime
Fitting model: CatBoost ... Training model for up to 58.82s of the 58.81s of remaining time.
-11.7993 = Validation score (-root_mean_squared_error)
0.62s = Training runtime
0.01s = Validation runtime
Fitting model: ExtraTreesMSE ... Training model for up to 58.19s of the 58.19s of remaining time.
-11.3627 = Validation score (-root_mean_squared_error)
0.37s = Training runtime
0.05s = Validation runtime
Fitting model: NeuralNetFastAI ... Training model for up to 57.75s of the 57.75s of remaining time.
-12.0733 = Validation score (-root_mean_squared_error)
0.62s = Training runtime
0.01s = Validation runtime
Fitting model: XGBoost ... Training model for up to 57.1s of the 57.1s of remaining time.
-12.2892 = Validation score (-root_mean_squared_error)
0.26s = Training runtime
0.01s = Validation runtime
Fitting model: NeuralNetTorch ... Training model for up to 56.83s of the 56.83s of remaining time.
-11.9345 = Validation score (-root_mean_squared_error)
1.96s = Training runtime
0.01s = Validation runtime
Fitting model: LightGBMLarge ... Training model for up to 54.85s of the 54.85s of remaining time.
-12.3153 = Validation score (-root_mean_squared_error)
0.53s = Training runtime
0.01s = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 59.9s of the 54.3s of remaining time.
-11.2248 = Validation score (-root_mean_squared_error)
0.31s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 6.03s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("agModels-predictAge/")
Evaluation: root_mean_squared_error on test data: -10.486811058771206
Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
"root_mean_squared_error": -10.486811058771206,
"mean_squared_error": -109.97320618236607,
"mean_absolute_error": -8.210400141765692,
"r2": 0.4121718649826517,
"pearsonr": 0.6429678448059126,
"median_absolute_error": -6.843406677246094
}
Note that we didn’t need to tell AutoGluon this is a regression problem; it automatically inferred this from the data and reported the appropriate performance metric (RMSE by default). To specify a particular evaluation metric other than the default, set the eval_metric parameter of TabularPredictor() and AutoGluon will tailor its models to optimize your metric (e.g. eval_metric = 'mean_absolute_error'). For evaluation metrics where higher values are worse (like RMSE), AutoGluon will flip their sign and print them as negative values during training (as it internally assumes higher values are better).
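To read the reported scores as conventional error values, the sign flip can be undone after the fact. The sketch below is a plain-Python illustration using the numbers from the evaluation output above; the helper `to_raw_metric_values` and the `ERROR_METRICS` set are hypothetical names, and the set only covers the metrics shown in that output.

```python
import math

# Metrics whose raw value is an error (lower is better), so they are reported negated.
# Assumption: this set covers only the metrics printed in the evaluation above.
ERROR_METRICS = {
    "root_mean_squared_error",
    "mean_squared_error",
    "mean_absolute_error",
    "median_absolute_error",
}


def to_raw_metric_values(scores, error_metrics=ERROR_METRICS):
    """Undo the sign flip for error metrics; leave other scores unchanged."""
    return {
        name: -value if name in error_metrics else value
        for name, value in scores.items()
    }


# Values copied from the evaluation output above.
reported = {
    "root_mean_squared_error": -10.486811058771206,
    "mean_squared_error": -109.97320618236607,
    "mean_absolute_error": -8.210400141765692,
    "r2": 0.4121718649826517,
    "pearsonr": 0.6429678448059126,
    "median_absolute_error": -6.843406677246094,
}

raw = to_raw_metric_values(reported)
# Sanity check: RMSE should equal the square root of MSE.
rmse_matches_mse = math.isclose(
    raw["root_mean_squared_error"],
    math.sqrt(raw["mean_squared_error"]),
    rel_tol=1e-6,
)
print(raw["root_mean_squared_error"], rmse_matches_mse)
```

Scores such as r2 and pearsonr are already higher-is-better, so they pass through unchanged.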
We can call leaderboard to see the per-model performance:
predictor_age.leaderboard(test_data, silent=True)
| | model | score_test | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | WeightedEnsemble_L2 | -10.486811 | -11.224790 | 0.441689 | 0.094346 | 3.843604 | 0.005272 | 0.000444 | 0.308144 | 2 | True | 12 |
1 | ExtraTreesMSE | -10.655482 | -11.362738 | 0.107336 | 0.049393 | 0.374728 | 0.107336 | 0.049393 | 0.374728 | 1 | True | 7 |
2 | RandomForestMSE | -10.746175 | -11.662354 | 0.108093 | 0.047231 | 0.376682 | 0.108093 | 0.047231 | 0.376682 | 1 | True | 5 |
3 | CatBoost | -10.780312 | -11.799279 | 0.012905 | 0.005237 | 0.618630 | 0.012905 | 0.005237 | 0.618630 | 1 | True | 6 |
4 | LightGBMXT | -10.837373 | -11.709228 | 0.049953 | 0.005837 | 0.318484 | 0.049953 | 0.005837 | 0.318484 | 1 | True | 3 |
5 | LightGBM | -10.972156 | -11.929546 | 0.017067 | 0.005000 | 0.269586 | 0.017067 | 0.005000 | 0.269586 | 1 | True | 4 |
6 | XGBoost | -11.115033 | -12.289224 | 0.045289 | 0.006274 | 0.256583 | 0.045289 | 0.006274 | 0.256583 | 1 | True | 9 |
7 | NeuralNetTorch | -11.120471 | -11.934453 | 0.054829 | 0.012011 | 1.962501 | 0.054829 | 0.012011 | 1.962501 | 1 | True | 10 |
8 | NeuralNetFastAI | -11.225699 | -12.073282 | 0.155994 | 0.013553 | 0.617402 | 0.155994 | 0.013553 | 0.617402 | 1 | True | 8 |
9 | LightGBMLarge | -11.469922 | -12.315314 | 0.041820 | 0.005629 | 0.527549 | 0.041820 | 0.005629 | 0.527549 | 1 | True | 11 |
10 | KNeighborsUnif | -14.902058 | -15.686937 | 0.018496 | 0.006272 | 0.005789 | 0.018496 | 0.006272 | 0.005789 | 1 | True | 1 |
11 | KNeighborsDist | -15.771259 | -15.180149 | 0.023017 | 0.006834 | 0.005761 | 0.023017 | 0.006834 | 0.005761 | 1 | True | 2 |
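The leaderboard is useful for trading accuracy against inference speed. As a rough sketch, the hypothetical helper below picks the fastest model whose test score is within a chosen tolerance of the best one. It works on plain dictionaries built from a few rows of the table above; assuming the leaderboard is returned as a pandas DataFrame, such rows could be obtained with `to_dict("records")`.

```python
def fastest_model_within(rows, tolerance):
    """Return the name of the fastest model scoring within `tolerance` of the best.

    `rows` is a list of dicts with 'model', 'score_test', and 'pred_time_test'
    keys; scores follow the higher-is-better convention of the leaderboard.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    best_score = max(row["score_test"] for row in rows)
    candidates = [row for row in rows if row["score_test"] >= best_score - tolerance]
    return min(candidates, key=lambda row: row["pred_time_test"])["model"]


# A few rows copied from the leaderboard above.
leaderboard_rows = [
    {"model": "WeightedEnsemble_L2", "score_test": -10.486811, "pred_time_test": 0.441689},
    {"model": "ExtraTreesMSE", "score_test": -10.655482, "pred_time_test": 0.107336},
    {"model": "RandomForestMSE", "score_test": -10.746175, "pred_time_test": 0.108093},
    {"model": "CatBoost", "score_test": -10.780312, "pred_time_test": 0.012905},
    {"model": "KNeighborsUnif", "score_test": -14.902058, "pred_time_test": 0.018496},
]

print(fastest_model_within(leaderboard_rows, tolerance=0.3))
```

The tolerance of 0.3 RMSE is an arbitrary choice for illustration. The selected name could then be passed through the `model` argument of `predict()` to use that model instead of the default best one.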
Data Formats: AutoGluon can currently operate on data tables already loaded into Python as pandas DataFrames, or those stored in files of CSV format or Parquet format. If your data live in multiple tables, you will first need to join them into a single table whose rows correspond to statistically independent observations (datapoints) and columns correspond to different features (a.k.a. variables/covariates).
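As an illustration of joining multiple tables, the sketch below uses pandas to combine a one-row-per-customer table with a many-rows-per-customer orders table. All table and column names (`customers`, `orders`, `churned`, and so on) are hypothetical; the point is to aggregate first so that each row of the final table remains one independent observation.

```python
import pandas as pd

# Hypothetical tables: one row per customer, and several orders per customer.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "region": ["north", "south", "north"], "churned": [0, 1, 0]}
)
orders = pd.DataFrame(
    {"customer_id": [1, 1, 2, 3, 3, 3], "amount": [20.0, 35.0, 15.0, 50.0, 5.0, 10.0]}
)

# Aggregate the many-rows-per-customer table down to one row per customer first,
# so the joined table keeps one independent observation per row.
order_features = (
    orders.groupby("customer_id")["amount"]
    .agg(order_count="count", total_amount="sum")
    .reset_index()
)

# A left join keeps every customer, even one without any orders.
train_table = customers.merge(order_features, on="customer_id", how="left")
print(train_table)
```

A table like `train_table` could then be passed to fit() with the label column set to `churned`.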
Refer to the TabularPredictor documentation to see all of the available methods/options.
Advanced Usage¶
For more advanced usage examples of AutoGluon, refer to the Predicting Columns in a Table - In Depth tutorial.
If you are interested in deployment optimization, refer to the Predicting Columns in a Table - Deployment Optimization tutorial.