Getting started with Advanced HPO Algorithms¶
This tutorial provides a complete example of how to use AutoGluon’s state-of-the-art hyperparameter optimization (HPO) algorithms to tune a basic Multi-Layer Perceptron (MLP) model, one of the simplest types of neural network.
Loading libraries¶
# Basic utilities
import time
import multiprocessing # to count the number of CPUs available
# External tools to load and process data
import numpy as np
import pandas as pd
# MXNet (NeuralNets)
import mxnet as mx
from mxnet import gluon, autograd
from mxnet.gluon import nn
# AutoGluon and HPO tools
import autogluon.core as ag
from autogluon.mxnet.utils import load_and_split_openml_data
Check the version of MXNet; you should be fine with any version >= 1.5.
mx.__version__
'1.7.0'
You can also check the version of AutoGluon (and the specific commit) to make sure it matches what you expect.
import autogluon.core.version
ag.version.__version__
'0.1.1b20210310'
Hyperparameter Optimization of a 2-layer MLP¶
Setting up the context¶
Here we declare a few “environment variables” that set the context for what we are doing:
OPENML_TASK_ID = 6 # describes the problem we will tackle
RATIO_TRAIN_VALID = 0.33 # split of the training data used for validation
RESOURCE_ATTR_NAME = 'epoch' # how do we measure resources (will become clearer further)
REWARD_ATTR_NAME = 'objective' # how do we measure performance (will become clearer further)
NUM_CPUS = multiprocessing.cpu_count()
Preparing the data¶
We will use a multi-way classification task from OpenML. Data preparation includes:

Missing values are imputed, using the ‘mean’ strategy of sklearn.impute.SimpleImputer
The training set is split into training and validation
Inputs are standardized to mean 0, variance 1
X_train, X_valid, y_train, y_valid, n_classes = load_and_split_openml_data(
OPENML_TASK_ID, RATIO_TRAIN_VALID, download_from_openml=False)
n_classes
26
The problem has 26 classes.
Declaring a model specifying a hyperparameter space with AutoGluon¶
Two-layer MLP where we optimize over:

the number of units in the first layer
the number of units in the second layer
the dropout rate after each layer
the learning rate
the scale of the weight initialization for each layer
the batch size

The @ag.args decorator allows us to specify the search space we will optimize over; this matches the ConfigSpace syntax.

The body of the function run_mlp_openml is pretty simple:

it reads the hyperparameters given via the decorator
it defines a 2-layer MLP with dropout
it declares a trainer with the ‘adam’ optimizer and the provided learning rate
it trains the network for a number of epochs (most of that is boilerplate mxnet code)
the reporter at the end is used to keep track of training history in the hyperparameter optimization

Note: The number of epochs and the hyperparameter space are reduced to make for a shorter experiment.
@ag.args(n_units_1=ag.space.Int(lower=16, upper=128),
         n_units_2=ag.space.Int(lower=16, upper=128),
         dropout_1=ag.space.Real(lower=0, upper=.75),
         dropout_2=ag.space.Real(lower=0, upper=.75),
         learning_rate=ag.space.Real(lower=1e-6, upper=1, log=True),
         batch_size=ag.space.Int(lower=8, upper=128),
         scale_1=ag.space.Real(lower=0.001, upper=10, log=True),
         scale_2=ag.space.Real(lower=0.001, upper=10, log=True),
         epochs=9)
def run_mlp_openml(args, reporter, **kwargs):
    # Time stamp for elapsed_time
    ts_start = time.time()
    # Unwrap hyperparameters
    n_units_1 = args.n_units_1
    n_units_2 = args.n_units_2
    dropout_1 = args.dropout_1
    dropout_2 = args.dropout_2
    scale_1 = args.scale_1
    scale_2 = args.scale_2
    batch_size = args.batch_size
    learning_rate = args.learning_rate

    ctx = mx.cpu()
    net = nn.Sequential()
    with net.name_scope():
        # Layer 1
        net.add(nn.Dense(n_units_1, activation='relu',
                         weight_initializer=mx.initializer.Uniform(scale=scale_1)))
        # Dropout
        net.add(gluon.nn.Dropout(dropout_1))
        # Layer 2
        net.add(nn.Dense(n_units_2, activation='relu',
                         weight_initializer=mx.initializer.Uniform(scale=scale_2)))
        # Dropout
        net.add(gluon.nn.Dropout(dropout_2))
        # Output
        net.add(nn.Dense(n_classes))
    net.initialize(ctx=ctx)

    trainer = gluon.Trainer(net.collect_params(), 'adam',
                            {'learning_rate': learning_rate})

    for epoch in range(args.epochs):
        ts_epoch = time.time()

        train_iter = mx.io.NDArrayIter(
            data={'data': X_train},
            label={'label': y_train},
            batch_size=batch_size,
            shuffle=True)
        valid_iter = mx.io.NDArrayIter(
            data={'data': X_valid},
            label={'label': y_valid},
            batch_size=batch_size,
            shuffle=False)

        metric = mx.metric.Accuracy()
        loss = gluon.loss.SoftmaxCrossEntropyLoss()

        for batch in train_iter:
            data = batch.data[0].as_in_context(ctx)
            label = batch.label[0].as_in_context(ctx)
            with autograd.record():
                output = net(data)
                L = loss(output, label)
            L.backward()
            trainer.step(data.shape[0])
            metric.update([label], [output])

        name, train_acc = metric.get()

        metric = mx.metric.Accuracy()
        for batch in valid_iter:
            data = batch.data[0].as_in_context(ctx)
            label = batch.label[0].as_in_context(ctx)
            output = net(data)
            metric.update([label], [output])

        name, val_acc = metric.get()

        print('Epoch %d ; Time: %f ; Training: %s=%f ; Validation: %s=%f' % (
            epoch + 1, time.time() - ts_start, name, train_acc, name, val_acc))

        ts_now = time.time()
        eval_time = ts_now - ts_epoch
        elapsed_time = ts_now - ts_start

        # The resource reported back (as 'epoch') is the number of epochs
        # done, starting at 1
        reporter(
            epoch=epoch + 1,
            objective=float(val_acc),
            eval_time=eval_time,
            time_step=ts_now,
            elapsed_time=elapsed_time)
Note: The annotation epochs=9 specifies the maximum number of epochs for training. It becomes available as args.epochs. Importantly, it is also processed by HyperbandScheduler below in order to set its max_t attribute.
Recommendation: Whenever writing training code to be passed as train_fn to a scheduler, if this training code reports a resource (or time) attribute, the corresponding maximum resource value should be included in train_fn.args:

If the resource attribute (time_attr of the scheduler) in train_fn is epoch, make sure to include epochs=XYZ in the annotation. This allows the scheduler to read max_t from train_fn.args.epochs. This case corresponds to our example here.

If the resource attribute is something other than epoch, you can instead include the annotation max_t=XYZ, which allows the scheduler to read max_t from train_fn.args.max_t. (An illustrative sketch of this case follows below.)

Annotating the training function with the correct value for max_t simplifies scheduler creation (since max_t does not have to be passed), and avoids inconsistencies between train_fn and the scheduler.
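To make the second case concrete, here is a minimal sketch (not part of the original tutorial) of a training function whose resource attribute is called batch rather than epoch. The function name run_dummy_fn, the dummy objective, and the value max_t=81 are purely illustrative:

@ag.args(learning_rate=ag.space.Real(lower=1e-6, upper=1, log=True),
         max_t=81)
def run_dummy_fn(args, reporter, **kwargs):
    # The resource reported back is 'batch' instead of 'epoch', so the
    # maximum resource is annotated as max_t=81; a scheduler using this
    # function would be created with time_attr='batch'.
    for resource in range(1, args.max_t + 1):
        # Stand-in for training one more unit and measuring validation accuracy
        dummy_objective = 1.0 - 1.0 / resource
        reporter(batch=resource, objective=dummy_objective)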
Running the Hyperparameter Optimization¶
You can use the following schedulers:
FIFO (fifo)
Hyperband (either the stopping (hbs) or promotion (hbp) variant)

And the following searchers:

Random search (random)
Gaussian process based Bayesian optimization (bayesopt)
SkOpt Bayesian optimization (skopt; only with the FIFO scheduler)
Note that the method known as (asynchronous) Hyperband uses random search. Combining Hyperband scheduling with the bayesopt searcher uses a novel method called asynchronous BOHB.

Pick the combination you are interested in (the full experiment takes around 120 seconds, see the time_out parameter); running everything with multiple repetitions can take a fair bit of time. In real life, you will want to choose a larger time_out in order to obtain good performance.
SCHEDULER = "hbs"
SEARCHER = "bayesopt"
def compute_error(df):
    return 1.0 - df["objective"]

def compute_runtime(df, start_timestamp):
    return df["time_step"] - start_timestamp

def process_training_history(task_dicts, start_timestamp,
                             runtime_fn=compute_runtime,
                             error_fn=compute_error):
    task_dfs = []
    for task_id in task_dicts:
        task_df = pd.DataFrame(task_dicts[task_id])
        task_df = task_df.assign(task_id=task_id,
                                 runtime=runtime_fn(task_df, start_timestamp),
                                 error=error_fn(task_df),
                                 target_epoch=task_df["epoch"].iloc[-1])
        task_dfs.append(task_df)

    result = pd.concat(task_dfs, axis="index", ignore_index=True, sort=True)
    # re-order by runtime
    result = result.sort_values(by="runtime")
    # calculate incumbent best -- the cumulative minimum of the error.
    result = result.assign(best=result["error"].cummin())
    return result
resources = dict(num_cpus=NUM_CPUS, num_gpus=0)
search_options = {
    'num_init_random': 2,
    'debug_log': True}

if SCHEDULER == 'fifo':
    myscheduler = ag.scheduler.FIFOScheduler(
        run_mlp_openml,
        resource=resources,
        searcher=SEARCHER,
        search_options=search_options,
        time_out=120,
        time_attr=RESOURCE_ATTR_NAME,
        reward_attr=REWARD_ATTR_NAME)
else:
    # This setup uses rung levels at 1, 3, 9 epochs. We just use a single
    # bracket, so this is in fact successive halving (Hyperband would use
    # more than 1 bracket).
    # Also note that since we do not use the max_t argument of
    # HyperbandScheduler, this value is obtained from train_fn.args.epochs.
    sch_type = 'stopping' if SCHEDULER == 'hbs' else 'promotion'
    myscheduler = ag.scheduler.HyperbandScheduler(
        run_mlp_openml,
        resource=resources,
        searcher=SEARCHER,
        search_options=search_options,
        time_out=120,
        time_attr=RESOURCE_ATTR_NAME,
        reward_attr=REWARD_ATTR_NAME,
        type=sch_type,
        grace_period=1,
        reduction_factor=3,
        brackets=1)

# run tasks
myscheduler.run()
myscheduler.join_jobs()

results_df = process_training_history(
    myscheduler.training_history.copy(),
    start_timestamp=myscheduler._start_time)
Epoch 1 ; Time: 0.510176 ; Training: accuracy=0.260079 ; Validation: accuracy=0.531250
Epoch 2 ; Time: 0.967190 ; Training: accuracy=0.496365 ; Validation: accuracy=0.655247
Epoch 3 ; Time: 1.413584 ; Training: accuracy=0.559650 ; Validation: accuracy=0.694686
Epoch 4 ; Time: 1.974436 ; Training: accuracy=0.588896 ; Validation: accuracy=0.711063
Epoch 5 ; Time: 2.420311 ; Training: accuracy=0.609385 ; Validation: accuracy=0.726939
Epoch 6 ; Time: 2.894735 ; Training: accuracy=0.628139 ; Validation: accuracy=0.745321
Epoch 7 ; Time: 3.331643 ; Training: accuracy=0.641193 ; Validation: accuracy=0.750501
Epoch 8 ; Time: 3.781392 ; Training: accuracy=0.653751 ; Validation: accuracy=0.763202
Epoch 9 ; Time: 4.216795 ; Training: accuracy=0.665482 ; Validation: accuracy=0.766043
Epoch 1 ; Time: 2.021301 ; Training: accuracy=0.099503 ; Validation: accuracy=0.226723
Epoch 1 ; Time: 0.612786 ; Training: accuracy=0.038856 ; Validation: accuracy=0.043108
Epoch 1 ; Time: 0.314291 ; Training: accuracy=0.145411 ; Validation: accuracy=0.173445
Epoch 1 ; Time: 0.425618 ; Training: accuracy=0.437748 ; Validation: accuracy=0.658618
Epoch 2 ; Time: 0.789888 ; Training: accuracy=0.581928 ; Validation: accuracy=0.726190
Epoch 3 ; Time: 1.152802 ; Training: accuracy=0.623843 ; Validation: accuracy=0.765258
Epoch 4 ; Time: 1.510821 ; Training: accuracy=0.646577 ; Validation: accuracy=0.789738
Epoch 5 ; Time: 1.866613 ; Training: accuracy=0.656002 ; Validation: accuracy=0.789571
Epoch 6 ; Time: 2.229037 ; Training: accuracy=0.665261 ; Validation: accuracy=0.804997
Epoch 7 ; Time: 2.586695 ; Training: accuracy=0.673611 ; Validation: accuracy=0.800469
Epoch 8 ; Time: 2.951951 ; Training: accuracy=0.677001 ; Validation: accuracy=0.812374
Epoch 9 ; Time: 3.311681 ; Training: accuracy=0.683118 ; Validation: accuracy=0.807344
Epoch 1 ; Time: 2.122236 ; Training: accuracy=0.037645 ; Validation: accuracy=0.035017
Epoch 1 ; Time: 0.420255 ; Training: accuracy=0.280658 ; Validation: accuracy=0.540730
Epoch 2 ; Time: 0.780650 ; Training: accuracy=0.384189 ; Validation: accuracy=0.611694
Epoch 3 ; Time: 1.135138 ; Training: accuracy=0.401472 ; Validation: accuracy=0.612360
Epoch 1 ; Time: 0.460315 ; Training: accuracy=0.214741 ; Validation: accuracy=0.510490
Epoch 1 ; Time: 0.686158 ; Training: accuracy=0.276290 ; Validation: accuracy=0.390625
Epoch 1 ; Time: 0.433396 ; Training: accuracy=0.153948 ; Validation: accuracy=0.213213
Epoch 1 ; Time: 0.458453 ; Training: accuracy=0.307100 ; Validation: accuracy=0.506995
Epoch 1 ; Time: 0.489886 ; Training: accuracy=0.337184 ; Validation: accuracy=0.621951
Epoch 2 ; Time: 0.907708 ; Training: accuracy=0.471035 ; Validation: accuracy=0.662880
Epoch 3 ; Time: 1.323975 ; Training: accuracy=0.510233 ; Validation: accuracy=0.711828
Epoch 1 ; Time: 0.329290 ; Training: accuracy=0.307217 ; Validation: accuracy=0.567395
Epoch 2 ; Time: 0.603586 ; Training: accuracy=0.481793 ; Validation: accuracy=0.666555
Epoch 3 ; Time: 0.861894 ; Training: accuracy=0.541275 ; Validation: accuracy=0.708235
Epoch 1 ; Time: 0.319615 ; Training: accuracy=0.504373 ; Validation: accuracy=0.681833
Epoch 2 ; Time: 0.589601 ; Training: accuracy=0.640924 ; Validation: accuracy=0.729333
Epoch 3 ; Time: 0.849092 ; Training: accuracy=0.663201 ; Validation: accuracy=0.747667
Epoch 4 ; Time: 1.102395 ; Training: accuracy=0.671205 ; Validation: accuracy=0.748833
Epoch 5 ; Time: 1.357389 ; Training: accuracy=0.665182 ; Validation: accuracy=0.758333
Epoch 6 ; Time: 1.623203 ; Training: accuracy=0.678960 ; Validation: accuracy=0.731333
Epoch 7 ; Time: 1.887955 ; Training: accuracy=0.688119 ; Validation: accuracy=0.750833
Epoch 8 ; Time: 2.177532 ; Training: accuracy=0.699505 ; Validation: accuracy=0.771833
Epoch 9 ; Time: 2.442167 ; Training: accuracy=0.693564 ; Validation: accuracy=0.763500
Epoch 1 ; Time: 0.365680 ; Training: accuracy=0.430528 ; Validation: accuracy=0.664038
Epoch 2 ; Time: 0.674159 ; Training: accuracy=0.539604 ; Validation: accuracy=0.660010
Epoch 3 ; Time: 0.975037 ; Training: accuracy=0.553465 ; Validation: accuracy=0.704145
Epoch 1 ; Time: 2.566125 ; Training: accuracy=0.036401 ; Validation: accuracy=0.037205
Epoch 1 ; Time: 0.302127 ; Training: accuracy=0.313240 ; Validation: accuracy=0.648770
Epoch 2 ; Time: 0.545325 ; Training: accuracy=0.475822 ; Validation: accuracy=0.710106
Epoch 3 ; Time: 0.788822 ; Training: accuracy=0.519243 ; Validation: accuracy=0.757646
Epoch 4 ; Time: 1.040303 ; Training: accuracy=0.560773 ; Validation: accuracy=0.763298
Epoch 5 ; Time: 1.356490 ; Training: accuracy=0.567845 ; Validation: accuracy=0.748172
Epoch 6 ; Time: 1.596924 ; Training: accuracy=0.592845 ; Validation: accuracy=0.769947
Epoch 7 ; Time: 1.845251 ; Training: accuracy=0.596546 ; Validation: accuracy=0.777593
Epoch 8 ; Time: 2.082652 ; Training: accuracy=0.614145 ; Validation: accuracy=0.800532
Epoch 9 ; Time: 2.323472 ; Training: accuracy=0.612007 ; Validation: accuracy=0.804355
Epoch 1 ; Time: 0.419610 ; Training: accuracy=0.307109 ; Validation: accuracy=0.596512
Epoch 2 ; Time: 0.779184 ; Training: accuracy=0.555418 ; Validation: accuracy=0.705316
Epoch 3 ; Time: 1.139569 ; Training: accuracy=0.633680 ; Validation: accuracy=0.761794
Epoch 4 ; Time: 1.495117 ; Training: accuracy=0.682913 ; Validation: accuracy=0.792525
Epoch 5 ; Time: 1.864757 ; Training: accuracy=0.706004 ; Validation: accuracy=0.801329
Epoch 6 ; Time: 2.246425 ; Training: accuracy=0.733053 ; Validation: accuracy=0.831894
Epoch 7 ; Time: 2.609084 ; Training: accuracy=0.750041 ; Validation: accuracy=0.842525
Epoch 8 ; Time: 2.970836 ; Training: accuracy=0.758123 ; Validation: accuracy=0.843522
Epoch 9 ; Time: 3.350720 ; Training: accuracy=0.761587 ; Validation: accuracy=0.850498
Epoch 1 ; Time: 0.402423 ; Training: accuracy=0.404845 ; Validation: accuracy=0.662202
Epoch 2 ; Time: 0.682933 ; Training: accuracy=0.581597 ; Validation: accuracy=0.759259
Epoch 3 ; Time: 0.954933 ; Training: accuracy=0.634755 ; Validation: accuracy=0.764220
Epoch 4 ; Time: 1.249859 ; Training: accuracy=0.654266 ; Validation: accuracy=0.801587
Epoch 5 ; Time: 1.525737 ; Training: accuracy=0.675595 ; Validation: accuracy=0.806217
Epoch 6 ; Time: 1.806451 ; Training: accuracy=0.683284 ; Validation: accuracy=0.803571
Epoch 7 ; Time: 2.082918 ; Training: accuracy=0.690972 ; Validation: accuracy=0.818948
Epoch 8 ; Time: 2.362340 ; Training: accuracy=0.697090 ; Validation: accuracy=0.812996
Epoch 9 ; Time: 2.638778 ; Training: accuracy=0.698495 ; Validation: accuracy=0.830522
Epoch 1 ; Time: 0.304805 ; Training: accuracy=0.088076 ; Validation: accuracy=0.127161
Epoch 1 ; Time: 0.529478 ; Training: accuracy=0.564615 ; Validation: accuracy=0.781323
Epoch 2 ; Time: 0.983483 ; Training: accuracy=0.720663 ; Validation: accuracy=0.835783
Epoch 3 ; Time: 1.390548 ; Training: accuracy=0.747236 ; Validation: accuracy=0.846308
Epoch 4 ; Time: 1.802840 ; Training: accuracy=0.774798 ; Validation: accuracy=0.871534
Epoch 5 ; Time: 2.206345 ; Training: accuracy=0.790477 ; Validation: accuracy=0.873705
Epoch 6 ; Time: 2.604947 ; Training: accuracy=0.803763 ; Validation: accuracy=0.880722
Epoch 7 ; Time: 3.010663 ; Training: accuracy=0.812593 ; Validation: accuracy=0.892416
Epoch 8 ; Time: 3.414541 ; Training: accuracy=0.818039 ; Validation: accuracy=0.899098
Epoch 9 ; Time: 3.811605 ; Training: accuracy=0.818122 ; Validation: accuracy=0.902773
Epoch 1 ; Time: 2.439476 ; Training: accuracy=0.161253 ; Validation: accuracy=0.519160
Epoch 1 ; Time: 0.440162 ; Training: accuracy=0.539269 ; Validation: accuracy=0.764252
Epoch 2 ; Time: 0.799785 ; Training: accuracy=0.699239 ; Validation: accuracy=0.804829
Epoch 3 ; Time: 1.158598 ; Training: accuracy=0.743221 ; Validation: accuracy=0.838028
Epoch 4 ; Time: 1.510669 ; Training: accuracy=0.769759 ; Validation: accuracy=0.846915
Epoch 5 ; Time: 1.872108 ; Training: accuracy=0.781746 ; Validation: accuracy=0.867371
Epoch 6 ; Time: 2.228940 ; Training: accuracy=0.801422 ; Validation: accuracy=0.860329
Epoch 7 ; Time: 2.577767 ; Training: accuracy=0.808036 ; Validation: accuracy=0.881288
Epoch 8 ; Time: 2.926216 ; Training: accuracy=0.819775 ; Validation: accuracy=0.891013
Epoch 9 ; Time: 3.273591 ; Training: accuracy=0.820023 ; Validation: accuracy=0.896546
Epoch 1 ; Time: 0.361825 ; Training: accuracy=0.408364 ; Validation: accuracy=0.683375
Epoch 2 ; Time: 0.654440 ; Training: accuracy=0.578054 ; Validation: accuracy=0.749541
Epoch 3 ; Time: 0.993663 ; Training: accuracy=0.624182 ; Validation: accuracy=0.779282
Epoch 4 ; Time: 1.325764 ; Training: accuracy=0.649193 ; Validation: accuracy=0.793317
Epoch 5 ; Time: 1.624220 ; Training: accuracy=0.665176 ; Validation: accuracy=0.816040
Epoch 6 ; Time: 1.908085 ; Training: accuracy=0.678095 ; Validation: accuracy=0.818212
Epoch 7 ; Time: 2.214493 ; Training: accuracy=0.687039 ; Validation: accuracy=0.819716
Epoch 8 ; Time: 2.533839 ; Training: accuracy=0.695072 ; Validation: accuracy=0.829741
Epoch 9 ; Time: 2.839333 ; Training: accuracy=0.691511 ; Validation: accuracy=0.834921
Epoch 1 ; Time: 1.666238 ; Training: accuracy=0.577529 ; Validation: accuracy=0.719529
Epoch 2 ; Time: 3.284903 ; Training: accuracy=0.682753 ; Validation: accuracy=0.779798
Epoch 3 ; Time: 4.900074 ; Training: accuracy=0.695937 ; Validation: accuracy=0.798653
Epoch 4 ; Time: 6.471868 ; Training: accuracy=0.719818 ; Validation: accuracy=0.802862
Epoch 5 ; Time: 8.046851 ; Training: accuracy=0.728275 ; Validation: accuracy=0.801515
Epoch 6 ; Time: 9.636511 ; Training: accuracy=0.736733 ; Validation: accuracy=0.812626
Epoch 7 ; Time: 11.224380 ; Training: accuracy=0.745688 ; Validation: accuracy=0.823232
Epoch 8 ; Time: 12.813148 ; Training: accuracy=0.754229 ; Validation: accuracy=0.825084
Epoch 9 ; Time: 14.397673 ; Training: accuracy=0.750166 ; Validation: accuracy=0.823737
Epoch 1 ; Time: 0.364099 ; Training: accuracy=0.456146 ; Validation: accuracy=0.697690
Epoch 2 ; Time: 0.661381 ; Training: accuracy=0.667270 ; Validation: accuracy=0.793438
Epoch 3 ; Time: 0.953701 ; Training: accuracy=0.726921 ; Validation: accuracy=0.821727
Epoch 4 ; Time: 1.256377 ; Training: accuracy=0.753908 ; Validation: accuracy=0.844660
Epoch 5 ; Time: 1.551664 ; Training: accuracy=0.763535 ; Validation: accuracy=0.847004
Epoch 6 ; Time: 1.839864 ; Training: accuracy=0.786655 ; Validation: accuracy=0.861902
Epoch 7 ; Time: 2.143409 ; Training: accuracy=0.790439 ; Validation: accuracy=0.870104
Epoch 8 ; Time: 2.432761 ; Training: accuracy=0.799572 ; Validation: accuracy=0.876297
Epoch 9 ; Time: 2.731339 ; Training: accuracy=0.805908 ; Validation: accuracy=0.883830
Epoch 1 ; Time: 0.292349 ; Training: accuracy=0.524342 ; Validation: accuracy=0.761137
Epoch 2 ; Time: 0.536490 ; Training: accuracy=0.709293 ; Validation: accuracy=0.814661
Epoch 3 ; Time: 0.769076 ; Training: accuracy=0.754605 ; Validation: accuracy=0.849069
Epoch 4 ; Time: 1.040626 ; Training: accuracy=0.783964 ; Validation: accuracy=0.854222
Epoch 5 ; Time: 1.338139 ; Training: accuracy=0.798026 ; Validation: accuracy=0.873670
Epoch 6 ; Time: 1.582450 ; Training: accuracy=0.809539 ; Validation: accuracy=0.885971
Epoch 7 ; Time: 1.817418 ; Training: accuracy=0.824178 ; Validation: accuracy=0.899269
Epoch 8 ; Time: 2.053087 ; Training: accuracy=0.835197 ; Validation: accuracy=0.903092
Epoch 9 ; Time: 2.340726 ; Training: accuracy=0.844079 ; Validation: accuracy=0.897440
Epoch 1 ; Time: 0.955621 ; Training: accuracy=0.375393 ; Validation: accuracy=0.674579
Epoch 2 ; Time: 1.840874 ; Training: accuracy=0.496357 ; Validation: accuracy=0.728283
Epoch 3 ; Time: 2.722841 ; Training: accuracy=0.534112 ; Validation: accuracy=0.750673
Epoch 1 ; Time: 0.339429 ; Training: accuracy=0.623843 ; Validation: accuracy=0.804209
Epoch 2 ; Time: 0.624480 ; Training: accuracy=0.769593 ; Validation: accuracy=0.860606
Epoch 3 ; Time: 0.895745 ; Training: accuracy=0.810929 ; Validation: accuracy=0.882828
Epoch 4 ; Time: 1.178129 ; Training: accuracy=0.824157 ; Validation: accuracy=0.899495
Epoch 5 ; Time: 1.524030 ; Training: accuracy=0.845155 ; Validation: accuracy=0.902525
Epoch 6 ; Time: 1.811664 ; Training: accuracy=0.854249 ; Validation: accuracy=0.903704
Epoch 7 ; Time: 2.218408 ; Training: accuracy=0.867560 ; Validation: accuracy=0.920707
Epoch 8 ; Time: 2.493469 ; Training: accuracy=0.869213 ; Validation: accuracy=0.914478
Epoch 9 ; Time: 2.769367 ; Training: accuracy=0.868552 ; Validation: accuracy=0.913468
Epoch 1 ; Time: 0.340496 ; Training: accuracy=0.391035 ; Validation: accuracy=0.730140
Epoch 2 ; Time: 0.659365 ; Training: accuracy=0.573154 ; Validation: accuracy=0.769860
Epoch 3 ; Time: 0.935759 ; Training: accuracy=0.626003 ; Validation: accuracy=0.809413
Epoch 4 ; Time: 1.210309 ; Training: accuracy=0.657100 ; Validation: accuracy=0.812750
Epoch 5 ; Time: 1.488981 ; Training: accuracy=0.684890 ; Validation: accuracy=0.824099
Epoch 6 ; Time: 1.764681 ; Training: accuracy=0.707303 ; Validation: accuracy=0.836115
Epoch 7 ; Time: 2.039692 ; Training: accuracy=0.719874 ; Validation: accuracy=0.847797
Epoch 8 ; Time: 2.329217 ; Training: accuracy=0.733355 ; Validation: accuracy=0.857644
Epoch 9 ; Time: 2.613654 ; Training: accuracy=0.746009 ; Validation: accuracy=0.868658
Epoch 1 ; Time: 0.399023 ; Training: accuracy=0.621459 ; Validation: accuracy=0.795455
Epoch 2 ; Time: 0.741921 ; Training: accuracy=0.787961 ; Validation: accuracy=0.851939
Epoch 3 ; Time: 1.113843 ; Training: accuracy=0.835639 ; Validation: accuracy=0.887032
Epoch 4 ; Time: 1.467157 ; Training: accuracy=0.856225 ; Validation: accuracy=0.893549
Epoch 5 ; Time: 1.811728 ; Training: accuracy=0.870800 ; Validation: accuracy=0.913436
Epoch 6 ; Time: 2.210745 ; Training: accuracy=0.883482 ; Validation: accuracy=0.912934
Epoch 7 ; Time: 2.596753 ; Training: accuracy=0.887352 ; Validation: accuracy=0.912600
Epoch 8 ; Time: 2.980160 ; Training: accuracy=0.896410 ; Validation: accuracy=0.921624
Epoch 9 ; Time: 3.407195 ; Training: accuracy=0.905221 ; Validation: accuracy=0.916277
Epoch 1 ; Time: 0.339746 ; Training: accuracy=0.515177 ; Validation: accuracy=0.738521
Epoch 2 ; Time: 0.620760 ; Training: accuracy=0.652138 ; Validation: accuracy=0.785106
Epoch 3 ; Time: 0.914248 ; Training: accuracy=0.691010 ; Validation: accuracy=0.816330
Epoch 4 ; Time: 1.206738 ; Training: accuracy=0.691837 ; Validation: accuracy=0.811154
Epoch 5 ; Time: 1.494070 ; Training: accuracy=0.714085 ; Validation: accuracy=0.820838
Epoch 6 ; Time: 1.761333 ; Training: accuracy=0.726325 ; Validation: accuracy=0.825847
Epoch 7 ; Time: 2.031581 ; Training: accuracy=0.740964 ; Validation: accuracy=0.834530
Epoch 8 ; Time: 2.302364 ; Training: accuracy=0.740468 ; Validation: accuracy=0.844214
Epoch 9 ; Time: 2.582657 ; Training: accuracy=0.750393 ; Validation: accuracy=0.846051
Analysing the results¶
The training history is stored in results_df; the main fields are runtime and 'best' (the best error achieved so far).

Note: You will get slightly different curves for different pairs of scheduler/searcher; the time_out here is a bit too short to really see the differences in a significant way (it would be better to set it to >1000s). Generally speaking though, Hyperband stopping / promotion combined with a model-based searcher will tend to significantly outperform other combinations given enough time.
results_df.head()
bracket | elapsed_time | epoch | error | eval_time | objective | runtime | searcher_data_size | searcher_params_kernel_covariance_scale | searcher_params_kernel_inv_bw0 | ... | searcher_params_kernel_inv_bw7 | searcher_params_kernel_inv_bw8 | searcher_params_mean_mean_value | searcher_params_noise_variance | target_epoch | task_id | time_since_start | time_step | time_this_iter | best | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0.512684 | 1 | 0.468750 | 0.507896 | 0.531250 | 0.812414 | NaN | 1.0 | 1.0 | ... | 1.0 | 1.0 | 0.0 | 0.001 | 9 | 0 | 0.814238 | 1.615349e+09 | 0.541678 | 0.468750 |
1 | 0 | 0.968906 | 2 | 0.344753 | 0.451589 | 0.655247 | 1.268636 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 0.0 | 0.001 | 9 | 0 | 1.269436 | 1.615349e+09 | 0.456204 | 0.344753 |
2 | 0 | 1.415154 | 3 | 0.305314 | 0.444312 | 0.694686 | 1.714884 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 0.0 | 0.001 | 9 | 0 | 1.715839 | 1.615349e+09 | 0.446248 | 0.305314 |
3 | 0 | 1.976102 | 4 | 0.288937 | 0.558345 | 0.711063 | 2.275832 | 2.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 0.0 | 0.001 | 9 | 0 | 2.276751 | 1.615349e+09 | 0.560948 | 0.288937 |
4 | 0 | 2.422040 | 5 | 0.273061 | 0.443948 | 0.726939 | 2.721770 | 2.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 0.0 | 0.001 | 9 | 0 | 2.722503 | 1.615349e+09 | 0.445938 | 0.273061 |
5 rows × 26 columns
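Since results_df is sorted by runtime and 'best' is the cumulative minimum of the error, the final incumbent can be read off the last row; here is a small check using only the columns constructed above:

final = results_df.iloc[-1]
print("Best validation error after {:.1f}s of tuning: {:.4f}".format(
    final["runtime"], final["best"]))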
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 8))
runtime = results_df['runtime'].values
objective = results_df['best'].values
plt.plot(runtime, objective, lw=2)
plt.xticks(fontsize=12)
plt.xlim(0, 120)
plt.ylim(0, 0.5)
plt.yticks(fontsize=12)
plt.xlabel("Runtime [s]", fontsize=14)
plt.ylabel("Objective", fontsize=14)
Text(0, 0.5, 'Objective')
Diving Deeper¶
Now, you are ready to try HPO on your own machine learning models (if you use PyTorch, have a look at Tune PyTorch Model on MNIST). While AutoGluon comes with well-chosen defaults, it can pay off to tune it to your specific needs. Here are some tips which may come in useful.
Logging the Search Progress¶
First, it is a good idea in general to switch on debug_log, which outputs useful information about the search progress. This is already done in the example above.

The outputs show which configurations are chosen, stopped, or promoted. For BO and BOHB, a range of information is displayed for every get_config decision. This log output is very useful in order to figure out what is going on during the search.
Configuring HyperbandScheduler¶
The most important knobs to turn with HyperbandScheduler are max_t, grace_period, reduction_factor, brackets, and type. The first three determine the rung levels at which stopping or promotion decisions are being made. A short configuration sketch follows the list below.

The maximum resource level max_t (usually, resource equates to epochs, so max_t is the maximum number of training epochs) is typically hardcoded in the train_fn passed to the scheduler (this is run_mlp_openml in the example above). As already noted above, the value is best fixed in the ag.args decorator as epochs=XYZ; it can then be accessed as args.epochs in the train_fn code. If this is done, you do not have to pass max_t when creating the scheduler.

grace_period and reduction_factor determine the rung levels, which are grace_period, grace_period * reduction_factor, grace_period * (reduction_factor ** 2), etc. All rung levels must be less than or equal to max_t. It is recommended to make max_t equal to the largest rung level. For example, if grace_period = 1 and reduction_factor = 3, it is in general recommended to use max_t = 9, max_t = 27, or max_t = 81. Choosing a max_t value “off the grid” works against the successive halving principle that the total resources spent in a rung should be roughly equal between rungs. If, in the example above, you set max_t = 10, about a third of the configurations reaching 9 epochs are allowed to proceed, but only for one more epoch.

With reduction_factor, you tune the extent to which successive halving filtering is applied. The larger this integer, the fewer configurations make it to higher numbers of epochs. Values 2, 3, 4 are commonly used.

grace_period should be set to the smallest resource (number of epochs) for which you expect any meaningful differentiation between configurations. While grace_period = 1 should always be explored, it may be too low for any meaningful stopping decisions to be made at the first rung.

brackets sets the maximum number of brackets in Hyperband (make sure to study the Hyperband paper or follow-ups for details). For brackets = 1, you are running successive halving (single bracket). Higher brackets have larger effective grace_period values (so runs are not stopped until later), yet are also chosen with less probability. We recommend always considering successive halving (brackets = 1) in a comparison.

Finally, with type (values stopping, promotion) you are choosing different ways of extending successive halving scheduling to the asynchronous case. The method for the default stopping is simpler and seems to perform well, but promotion is more careful about promoting configurations to higher resource levels, which can work better in some cases.
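To put these knobs together, here is a sketch of a HyperbandScheduler configuration consistent with the recommendations above. The concrete values are only an example; in the tutorial above, max_t is not passed explicitly but picked up from train_fn.args.epochs instead.

myscheduler = ag.scheduler.HyperbandScheduler(
    run_mlp_openml,
    resource=dict(num_cpus=NUM_CPUS, num_gpus=0),
    searcher='bayesopt',
    time_out=120,
    time_attr='epoch',
    reward_attr='objective',
    # grace_period=1 and reduction_factor=3 give rung levels 1, 3, 9,
    # so max_t=9 sits exactly on the largest rung
    max_t=9,
    grace_period=1,
    reduction_factor=3,
    # a single bracket corresponds to successive halving
    brackets=1,
    # 'promotion' pauses configurations at rung levels and promotes the
    # best ones, instead of stopping the rest ('stopping')
    type='promotion')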
Asynchronous BOHB¶
Finally, here are some ideas for tuning asynchronous BOHB, apart from tuning its HyperbandScheduler component. You need to pass these options in search_options. A sketch of how they might be passed follows below.

We support a range of different surrogate models over the criterion functions across resource levels. All of them are jointly dependent Gaussian process models, meaning that data collected at all resource levels are modelled together. The surrogate model is selected by gp_resource_kernel; possible values are matern52, matern52-res-warp, exp-decay-sum, exp-decay-combined, exp-decay-delta1. These are variants of either a joint Matern 5/2 kernel over configuration and resource, or the exponential decay model. Details about the latter can be found here.

Fitting a Gaussian process surrogate model to data incurs a cost which scales cubically with the number of datapoints. When applied to expensive deep learning workloads, even multi-fidelity asynchronous BOHB rarely accumulates more than 100 observations or so (across all rung levels and brackets), and the GP computations are subdominant. However, if you apply it to a cheaper train_fn and find yourself beyond 2000 total evaluations, the cost of GP fitting can become painful. In such a situation, you can explore the options opt_skip_period and opt_skip_num_max_resource. The basic idea is as follows. By far the most expensive part of a get_config call (picking the next configuration) is the refitting of the GP model to past data (this entails re-optimizing hyperparameters of the surrogate model itself). These options allow you to skip this expensive step for most get_config calls, after some initial period. Check the docstrings for details about these options. If you find yourself in such a situation and gain experience with these skipping features, make sure to contact the AutoGluon developers – we would love to learn about your use case.
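As an illustrative sketch (the specific values are assumptions, not tuned recommendations), these options are simply added to the search_options dictionary that is passed to the scheduler, exactly as in the example above:

search_options = {
    'num_init_random': 2,
    'debug_log': True,
    # select the joint GP surrogate over configuration and resource
    'gp_resource_kernel': 'matern52-res-warp',
    # skip most GP refits after an initial period (value is illustrative)
    'opt_skip_period': 3}
# then passed as search_options=search_options when creating the
# HyperbandScheduler, as in the example above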