Hyperparameter Optimization with AutoGluon

Tip: If you are new to AutoGluon, review Predicting Columns in a Table - Quick Start to learn the basics of the AutoGluon API.

This tutorial describes how you can perform hyperparameter optimization (HPO) with AutoGluon-Tabular.

Using the same census data table as in the Predicting Columns in a Table - Quick Start tutorial, we’ll predict the occupation of an individual, which is a multiclass classification problem. Start by importing AutoGluon’s TabularPredictor and TabularDataset, and loading the data.

from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
subsample_size = 1000  # subsample data for a faster demo
train_data = train_data.sample(n=subsample_size, random_state=0)

label = 'occupation'
metric = 'accuracy'

Specifying hyperparameters and tuning them

Note: In most cases, we don’t recommend hyperparameter tuning with AutoGluon. AutoGluon typically achieves its best performance without hyperparameter tuning, when you simply specify one of the available presets, such as presets="best_quality".

This section demonstrates hyperparameter tuning and explains how you can provide your own validation dataset, which AutoGluon relies on internally to tune hyperparameters, early-stop iterative training, and construct model ensembles. One reason to specify validation data is that future test data will come from a different distribution than the training data, and your validation data is more representative of the data you expect to encounter.

If you don’t have a strong reason to provide your own validation dataset, we recommend you omit the tuning_data argument. AutoGluon then selects validation data automatically from your provided training set, using strategies such as stratified sampling. For greater control, you can specify the holdout_frac argument to tell AutoGluon what fraction of the provided training data to hold out for validation.
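To make the idea of a stratified holdout concrete, here is a minimal standard-library sketch. It is not AutoGluon's internal implementation: the function name stratified_holdout, the toy label list, and the rounding rule are assumptions chosen for illustration. It only shows how holding out a fixed fraction per class keeps the class proportions similar in the training and validation parts.

```python
import random
from collections import Counter, defaultdict


def stratified_holdout(labels, holdout_frac, seed=0):
    """Split row indices into (train, validation) index lists so that each
    class contributes roughly holdout_frac of its rows to validation.

    Hypothetical helper for illustration; not part of AutoGluon.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Assumption: hold out at least one row for any class with 2+ rows.
        n_val = max(1, round(len(idxs) * holdout_frac)) if len(idxs) > 1 else 0
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy, imbalanced label column (made-up counts).
labels = ['Sales'] * 60 + ['Tech-support'] * 30 + ['Armed-Forces'] * 10
train_idx, val_idx = stratified_holdout(labels, holdout_frac=0.2)

val_counts = Counter(labels[i] for i in val_idx)
print(len(train_idx), len(val_idx))  # 80 20
print(dict(val_counts))
```

With a plain random split, a rare class such as 'Armed-Forces' could end up entirely in one part; splitting per class avoids that.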

Caution: Since AutoGluon tunes internal knobs based on this validation data, performance estimates reported on this data may be over-optimistic. For unbiased performance estimates, you should always call predict() on a separate dataset (that was never passed to fit()), as we did in the previous Quick-Start tutorial. We also emphasize that most options specified in this tutorial are chosen to minimize runtime for the purposes of demonstration and you should select more reasonable values in order to obtain high-quality models.

By default, fit() trains neural networks and various types of tree ensembles. You can specify hyperparameter values for each type of model. For each hyperparameter, you can specify either a single fixed value or a search space of values to consider during hyperparameter optimization. Hyperparameters that you do not specify are left at default settings chosen automatically by AutoGluon, which may be fixed values or search spaces.

Refer to the Search Space documentation to learn more about AutoGluon search spaces.
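For intuition about what sampling from a search space means, the sketch below draws a few random configurations using only the Python standard library. It is an illustration under stated assumptions, not AutoGluon's searcher: the helper names sample_log_uniform and sample_config, the bounds, and the inclusive integer range are all choices made for this example. The main point is the difference between a log-scale and a linear-scale real-valued range.

```python
import math
import random


def sample_log_uniform(rng, lower, upper):
    """Draw a value whose logarithm is uniform on [log(lower), log(upper)].

    On a log scale, each order of magnitude (for example 1e-4 to 1e-3 and
    1e-3 to 1e-2) is equally likely, which suits learning rates.
    """
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))


def sample_config(rng):
    """Draw one configuration from a toy search space (illustrative names)."""
    return {
        'learning_rate': sample_log_uniform(rng, 1e-4, 1e-2),  # log-scale real
        'dropout_prob': rng.uniform(0.0, 0.5),                 # linear-scale real
        'num_leaves': rng.randint(26, 66),                     # integer, both ends inclusive here
        'activation': rng.choice(['relu', 'tanh']),            # categorical
    }


rng = random.Random(0)  # fixed seed so the draws are repeatable
configs = [sample_config(rng) for _ in range(5)]
for cfg in configs:
    print(cfg)
```

A random-search tuner would train one model per sampled configuration and keep the one with the best validation score.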

from autogluon.common import space

nn_options = {  # specifies non-default hyperparameter values for neural network models
    'num_epochs': 10,  # number of training epochs (controls training time of NN models)
    'learning_rate': space.Real(1e-4, 1e-2, default=5e-4, log=True),  # learning rate used in training (real-valued hyperparameter searched on log-scale)
    'activation': space.Categorical('relu', 'softrelu', 'tanh'),  # activation function used in NN (categorical hyperparameter, default = first entry)
    'dropout_prob': space.Real(0.0, 0.5, default=0.1),  # dropout probability (real-valued hyperparameter)
}

gbm_options = {  # specifies non-default hyperparameter values for lightGBM gradient boosted trees
    'num_boost_round': 100,  # number of boosting rounds (controls training time of GBM models)
    'num_leaves': space.Int(lower=26, upper=66, default=36),  # number of leaves in trees (integer hyperparameter)
}

hyperparameters = {  # hyperparameters of each model type
    'GBM': gbm_options,
    'NN_TORCH': nn_options,  # NOTE: comment this line out if you get errors on macOS
}  # When these keys are missing from hyperparameters dict, no models of that type are trained

time_limit = 2*60  # train various models for ~2 min
num_trials = 5  # try at most 5 different hyperparameter configurations for each type of model
search_strategy = 'auto'  # to tune hyperparameters using random search routine with a local scheduler

hyperparameter_tune_kwargs = {  # HPO is not performed unless hyperparameter_tune_kwargs is specified
    'num_trials': num_trials,
    'scheduler': 'local',
    'searcher': search_strategy,
}  # Refer to TabularPredictor.fit docstring for all valid values

predictor = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data,
    time_limit=time_limit,
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)

Fitted model: NeuralNetTorch/d8f2b2e5 ...
	0.36	 = Validation score   (accuracy)
	3.34s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: NeuralNetTorch/4023a8c3 ...
	0.37	 = Validation score   (accuracy)
	4.11s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: NeuralNetTorch/b74c1f69 ...
	0.295	 = Validation score   (accuracy)
	3.64s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: NeuralNetTorch/4f53d2ae ...
	0.335	 = Validation score   (accuracy)
	4.6s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: NeuralNetTorch/f8c6b273 ...
	0.315	 = Validation score   (accuracy)
	3.58s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 119.91s of the 95.63s of remaining time.
	Fitting 1 model on all data | Fitting with cpus=8, gpus=0, mem=0.0/26.9 GB
	Ensemble Weights: {'LightGBM/T3': 1.0}
	0.375	 = Validation score   (accuracy)
	0.03s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 24.42s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 42633.7 rows/s (200 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20251219_224723")

Use the trained models to predict on the test data:

predictor.predict_proba(test_data)

	?	Adm-clerical	Armed-Forces	Craft-repair	Exec-managerial	Farming-fishing	Handlers-cleaners	Machine-op-inspct	Other-service	Priv-house-serv	Prof-specialty	Protective-serv	Sales	Tech-support	Transport-moving
0	0.025455	0.143085	0.0	0.058582	0.052842	0.013711	0.160344	0.077188	0.272931	0.0	0.040006	0.005994	0.098108	0.014809	0.036947
1	0.037403	0.080897	0.0	0.174850	0.082207	0.092676	0.085607	0.060385	0.094478	0.0	0.044100	0.072914	0.090826	0.012117	0.071541
2	0.012938	0.041318	0.0	0.127704	0.230690	0.018618	0.010542	0.022856	0.031082	0.0	0.104464	0.012570	0.205550	0.036649	0.145019
3	0.018036	0.126389	0.0	0.053754	0.092753	0.010294	0.055407	0.043998	0.113893	0.0	0.049174	0.005450	0.360493	0.054208	0.016152
4	0.024717	0.047068	0.0	0.107373	0.033515	0.038920	0.112611	0.035569	0.302404	0.0	0.027227	0.044776	0.175728	0.009102	0.040991
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
9764	0.022695	0.173963	0.0	0.063248	0.094021	0.011671	0.045861	0.035712	0.175289	0.0	0.075666	0.006597	0.161814	0.071831	0.061631
9765	0.014564	0.101126	0.0	0.266706	0.038117	0.047429	0.028536	0.195958	0.140708	0.0	0.028370	0.004199	0.062154	0.029586	0.042546
9766	0.014503	0.045200	0.0	0.235822	0.106134	0.105915	0.011736	0.047927	0.031357	0.0	0.035883	0.009569	0.224209	0.009204	0.122540
9767	0.028806	0.040528	0.0	0.129259	0.062479	0.027159	0.118058	0.051084	0.087650	0.0	0.041943	0.018851	0.286050	0.057914	0.050220
9768	0.025488	0.160506	0.0	0.361373	0.072063	0.069930	0.078646	0.027173	0.045844	0.0	0.041006	0.018729	0.035659	0.015714	0.047869

9769 rows × 15 columns
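Each row of the table above is a probability distribution over the 15 occupation classes. As a small sketch of how such a row relates to a class label, the snippet below takes the values shown for test row 0 and picks the most probable class. In practice you would call predictor.predict(test_data) to get labels directly; the dictionary here is copied by hand from the output above purely for illustration.

```python
# Class probabilities for test row 0, copied from the predict_proba output.
row0 = {
    '?': 0.025455,
    'Adm-clerical': 0.143085,
    'Armed-Forces': 0.0,
    'Craft-repair': 0.058582,
    'Exec-managerial': 0.052842,
    'Farming-fishing': 0.013711,
    'Handlers-cleaners': 0.160344,
    'Machine-op-inspct': 0.077188,
    'Other-service': 0.272931,
    'Priv-house-serv': 0.0,
    'Prof-specialty': 0.040006,
    'Protective-serv': 0.005994,
    'Sales': 0.098108,
    'Tech-support': 0.014809,
    'Transport-moving': 0.036947,
}

# The most probable class is the key with the largest probability.
predicted_class = max(row0, key=row0.get)
total = sum(row0.values())

print(predicted_class, row0[predicted_class])  # Other-service 0.272931
print(round(total, 4))  # probabilities sum to 1 up to rounding
```

Note that the top probability is only about 0.27, which reflects how uncertain the quickly trained demo models are on this 15-class problem.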

Use leaderboard to see how each model performs on the test data:

predictor.leaderboard(test_data)

	model	score_test	score_val	eval_metric	pred_time_test	pred_time_val	fit_time	pred_time_test_marginal	pred_time_val_marginal	fit_time_marginal	stack_level	can_infer	fit_order
0	NeuralNetTorch/4023a8c3	0.343945	0.370	accuracy	0.075747	0.012946	4.111812	0.075747	0.012946	4.111812	1	True	7
1	LightGBM/T3	0.339748	0.375	accuracy	0.058309	0.003833	0.416410	0.058309	0.003833	0.416410	1	True	3
2	WeightedEnsemble_L2	0.339748	0.375	accuracy	0.061221	0.004691	0.446145	0.002912	0.000858	0.029734	2	True	11
3	LightGBM/T1	0.336165	0.370	accuracy	0.023038	0.003309	0.893035	0.023038	0.003309	0.893035	1	True	1
4	LightGBM/T5	0.334425	0.375	accuracy	0.122116	0.005018	0.635938	0.122116	0.005018	0.635938	1	True	5
5	LightGBM/T4	0.330842	0.360	accuracy	0.284777	0.007922	0.705397	0.284777	0.007922	0.705397	1	True	4
6	NeuralNetTorch/4f53d2ae	0.326236	0.335	accuracy	0.086404	0.013762	4.604245	0.086404	0.013762	4.604245	1	True	9
7	NeuralNetTorch/f8c6b273	0.322141	0.315	accuracy	0.062810	0.010838	3.583930	0.062810	0.010838	3.583930	1	True	10
8	NeuralNetTorch/d8f2b2e5	0.320504	0.360	accuracy	0.046016	0.010383	3.336023	0.046016	0.010383	3.336023	1	True	6
9	LightGBM/T2	0.311393	0.355	accuracy	0.096478	0.004433	0.778324	0.096478	0.004433	0.778324	1	True	2
10	NeuralNetTorch/b74c1f69	0.309141	0.295	accuracy	0.059584	0.011150	3.641088	0.059584	0.011150	3.641088	1	True	8
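The leaderboard also illustrates the earlier caution about over-optimistic validation scores. The sketch below copies the model, score_test, and score_val columns from the table above into plain Python tuples and compares them. It is only arithmetic on the reported numbers, not an AutoGluon API call.

```python
# (model, score_test, score_val) copied from the leaderboard above.
leaderboard = [
    ('NeuralNetTorch/4023a8c3', 0.343945, 0.370),
    ('LightGBM/T3', 0.339748, 0.375),
    ('WeightedEnsemble_L2', 0.339748, 0.375),
    ('LightGBM/T1', 0.336165, 0.370),
    ('LightGBM/T5', 0.334425, 0.375),
    ('LightGBM/T4', 0.330842, 0.360),
    ('NeuralNetTorch/4f53d2ae', 0.326236, 0.335),
    ('NeuralNetTorch/f8c6b273', 0.322141, 0.315),
    ('NeuralNetTorch/d8f2b2e5', 0.320504, 0.360),
    ('LightGBM/T2', 0.311393, 0.355),
    ('NeuralNetTorch/b74c1f69', 0.309141, 0.295),
]

# Gap between validation and test accuracy for each model.
for model, score_test, score_val in leaderboard:
    gap = score_val - score_test
    print(f'{model:26s} val={score_val:.3f} test={score_test:.3f} gap={gap:+.3f}')

# How many models scored higher on validation data than on test data?
n_optimistic = sum(score_val > score_test for _, score_test, score_val in leaderboard)

# Best model by test score versus best by validation score
# (max() returns the first entry among ties).
best_by_test = max(leaderboard, key=lambda row: row[1])[0]
best_by_val = max(leaderboard, key=lambda row: row[2])[0]

print(n_optimistic, 'of', len(leaderboard), 'models have val > test')
print('best by test:', best_by_test)
print('best by val: ', best_by_val)
```

In this run, most models score higher on validation data than on test data, and the model ranked first by validation score is not the one ranked first by test score, which is why held-out test data gives the more trustworthy estimate.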

In the above example, predictive performance may be poor because we deliberately limited training to keep runtimes short. You can call fit() multiple times while modifying the above settings to better understand how these choices affect performance. For example, you can increase subsample_size to train on a larger dataset, increase the num_epochs and num_boost_round hyperparameters, and increase the time_limit (which you should do for all code in these tutorials). To see more detailed output during the execution of fit(), you can also pass the argument verbosity=3.