Getting started with Advanced HPO Algorithms

This tutorial provides a complete example of how to use AutoGluon’s state-of-the-art hyperparameter optimization (HPO) algorithms to tune a Multi-Layer Perceptron (MLP), one of the most basic types of neural network.

Loading libraries

# Standard library utilities (timing, counting CPUs)
import time
import multiprocessing # to count the number of CPUs available

# External tools to load and process data
import numpy as np
import pandas as pd

# MXNet (NeuralNets)
import mxnet as mx
from mxnet import gluon, autograd
from mxnet.gluon import nn

# AutoGluon and HPO tools
import autogluon.core as ag
from autogluon.mxnet.utils import load_and_split_openml_data

Check the version of MXNet; any version >= 1.5 should be fine.

mx.__version__
'1.7.0'

You can also check the version of AutoGluon (and the specific build) to make sure it matches what you expect.

import autogluon.core.version
ag.version.__version__
'0.1.0b20210301'
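
If you want your notebook to fail fast when the environment is too old, a minimal sanity check could look like this (a sketch, assuming the packaging package is installed; it is not required by the tutorial itself):

# Optional version check (assumes the `packaging` package is available)
from packaging import version

assert version.parse(mx.__version__) >= version.parse("1.5.0"), \
    "this tutorial expects MXNet >= 1.5"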

Hyperparameter Optimization of a 2-layer MLP

Setting up the context

Here we declare a few “environment variables” that set the context for what we are doing.

OPENML_TASK_ID = 6                # describes the problem we will tackle
RATIO_TRAIN_VALID = 0.33          # split of the training data used for validation
RESOURCE_ATTR_NAME = 'epoch'      # how we measure resources   (will become clearer below)
REWARD_ATTR_NAME = 'objective'    # how we measure performance (will become clearer below)

NUM_CPUS = multiprocessing.cpu_count()

Preparing the data

We will use a multi-way classification task from OpenML. Data preparation includes:

  • Missing values are imputed using the ‘mean’ strategy of sklearn.impute.SimpleImputer

  • The training set is split into training and validation parts

  • Inputs are standardized to mean 0, variance 1

X_train, X_valid, y_train, y_valid, n_classes = load_and_split_openml_data(
    OPENML_TASK_ID, RATIO_TRAIN_VALID, download_from_openml=False)
n_classes
26

The problem has 26 classes.
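
If you want to prepare a dataset of your own rather than rely on load_and_split_openml_data, the same preprocessing steps can be sketched with scikit-learn (an illustration only; the helper function names below are not part of AutoGluon):

# Illustrative sketch of the same preprocessing with scikit-learn
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def prepare_features(X, y, valid_ratio=RATIO_TRAIN_VALID, seed=0):
    # Impute missing values with the per-column mean
    X = SimpleImputer(strategy='mean').fit_transform(X)
    # Split into training and validation parts
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=valid_ratio, random_state=seed)
    # Standardize inputs to mean 0, variance 1 (statistics from the training part)
    scaler = StandardScaler().fit(X_tr)
    return scaler.transform(X_tr), scaler.transform(X_va), y_tr, y_va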

Declaring a model and specifying a hyperparameter space with AutoGluon

Two-layer MLP, where we optimize over:

  • the number of units on the first layer

  • the number of units on the second layer

  • the dropout rate after each layer

  • the learning rate

  • the batch size

  • the scale of the weight initialization for each layer

The @ag.args decorator allows us to specify the space we will optimize over; this matches the ConfigSpace syntax.

The body of the function run_mlp_openml is pretty simple:

  • it reads the hyperparameters given via the decorator

  • it defines a 2-layer MLP with dropout

  • it declares a trainer with the ‘adam’ optimizer and the provided learning rate

  • it trains the network for a number of epochs (most of this is boilerplate mxnet code)

  • the reporter call at the end of each epoch is used to keep track of the training history during the hyperparameter optimization

Note: The number of epochs and the hyperparameter space are reduced to make for a shorter experiment.

@ag.args(n_units_1=ag.space.Int(lower=16, upper=128),
         n_units_2=ag.space.Int(lower=16, upper=128),
         dropout_1=ag.space.Real(lower=0, upper=.75),
         dropout_2=ag.space.Real(lower=0, upper=.75),
         learning_rate=ag.space.Real(lower=1e-6, upper=1, log=True),
         batch_size=ag.space.Int(lower=8, upper=128),
         scale_1=ag.space.Real(lower=0.001, upper=10, log=True),
         scale_2=ag.space.Real(lower=0.001, upper=10, log=True),
         epochs=9)
def run_mlp_openml(args, reporter, **kwargs):
    # Time stamp for elapsed_time
    ts_start = time.time()
    # Unwrap hyperparameters
    n_units_1 = args.n_units_1
    n_units_2 = args.n_units_2
    dropout_1 = args.dropout_1
    dropout_2 = args.dropout_2
    scale_1 = args.scale_1
    scale_2 = args.scale_2
    batch_size = args.batch_size
    learning_rate = args.learning_rate

    ctx = mx.cpu()
    net = nn.Sequential()
    with net.name_scope():
        # Layer 1
        net.add(nn.Dense(n_units_1, activation='relu',
                         weight_initializer=mx.initializer.Uniform(scale=scale_1)))
        # Dropout
        net.add(gluon.nn.Dropout(dropout_1))
        # Layer 2
        net.add(nn.Dense(n_units_2, activation='relu',
                         weight_initializer=mx.initializer.Uniform(scale=scale_2)))
        # Dropout
        net.add(gluon.nn.Dropout(dropout_2))
        # Output
        net.add(nn.Dense(n_classes))
    net.initialize(ctx=ctx)

    trainer = gluon.Trainer(net.collect_params(), 'adam',
                            {'learning_rate': learning_rate})

    for epoch in range(args.epochs):
        ts_epoch = time.time()

        train_iter = mx.io.NDArrayIter(
                        data={'data': X_train},
                        label={'label': y_train},
                        batch_size=batch_size,
                        shuffle=True)
        valid_iter = mx.io.NDArrayIter(
                        data={'data': X_valid},
                        label={'label': y_valid},
                        batch_size=batch_size,
                        shuffle=False)

        metric = mx.metric.Accuracy()
        loss = gluon.loss.SoftmaxCrossEntropyLoss()

        for batch in train_iter:
            data = batch.data[0].as_in_context(ctx)
            label = batch.label[0].as_in_context(ctx)
            with autograd.record():
                output = net(data)
                L = loss(output, label)
            L.backward()
            trainer.step(data.shape[0])
            metric.update([label], [output])

        name, train_acc = metric.get()

        metric = mx.metric.Accuracy()
        for batch in valid_iter:
            data = batch.data[0].as_in_context(ctx)
            label = batch.label[0].as_in_context(ctx)
            output = net(data)
            metric.update([label], [output])

        name, val_acc = metric.get()

        print('Epoch %d ; Time: %f ; Training: %s=%f ; Validation: %s=%f' % (
            epoch + 1, time.time() - ts_start, name, train_acc, name, val_acc))

        ts_now = time.time()
        eval_time = ts_now - ts_epoch
        elapsed_time = ts_now - ts_start

        # The resource reported back (as 'epoch') is the number of epochs
        # done, starting at 1
        reporter(
            epoch=epoch + 1,
            objective=float(val_acc),
            eval_time=eval_time,
            time_step=ts_now,
            elapsed_time=elapsed_time)

Note: The annotation epochs=9 specifies the maximum number of epochs for training. It becomes available as args.epochs. Importantly, it is also processed by HyperbandScheduler below in order to set its max_t attribute.

Recommendation: Whenever writing training code to be passed as train_fn to a scheduler, if this training code reports a resource (or time) attribute, the corresponding maximum resource value should be included in train_fn.args:

  • If the resource attribute (time_attr of scheduler) in train_fn is epoch, make sure to include epochs=XYZ in the annotation. This allows the scheduler to read max_t from train_fn.args.epochs. This case corresponds to our example here.

  • If the resource attribute is something other than epoch, you can include the annotation max_t=XYZ instead, which allows the scheduler to read max_t from train_fn.args.max_t (a small sketch of this case follows below).

Annotating the training function with the correct value for max_t simplifies scheduler creation (since max_t does not have to be passed) and avoids inconsistencies between train_fn and the scheduler.
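
For illustration, here is a hedged sketch of the second case above, where the reported resource is a generic step counter rather than an epoch. The function name train_fn_steps, the 'step' attribute and all values are hypothetical; the scheduler would then be created with time_attr='step' and would read max_t from train_fn.args.max_t:

# Hypothetical sketch: the resource attribute is 'step' instead of 'epoch',
# so the maximum resource is annotated explicitly as max_t
@ag.args(learning_rate=ag.space.Real(lower=1e-6, upper=1, log=True),
         max_t=81)
def train_fn_steps(args, reporter, **kwargs):
    val_acc = 0.0
    for step in range(1, args.max_t + 1):
        # ... one unit of real training and evaluation would go here ...
        val_acc = min(1.0, val_acc + 0.01)  # placeholder metric for the sketch
        reporter(step=step, objective=float(val_acc))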

Running the Hyperparameter Optimization

You can use the following schedulers:

  • FIFO (fifo)

  • Hyperband (either the stopping (hbs) or promotion (hbp) variant)

And the following searchers:

  • Random search (random)

  • Gaussian process based Bayesian optimization (bayesopt)

  • SkOpt Bayesian optimization (skopt; only with FIFO scheduler)

Note that the method known as (asynchronous) Hyperband uses random search. Combining Hyperband scheduling with the bayesopt searcher results in a novel method called asynchronous BOHB.

Pick the combination you’re interested in. A single experiment takes around 120 seconds (see the time_out parameter), so running all combinations with multiple repetitions can take a fair bit of time. In real life, you will want to choose a larger time_out in order to obtain good performance.

SCHEDULER = "hbs"
SEARCHER = "bayesopt"
def compute_error(df):
    return 1.0 - df["objective"]

def compute_runtime(df, start_timestamp):
        return df["time_step"] - start_timestamp

def process_training_history(task_dicts, start_timestamp,
                             runtime_fn=compute_runtime,
                             error_fn=compute_error):
    task_dfs = []
    for task_id in task_dicts:
        task_df = pd.DataFrame(task_dicts[task_id])
        task_df = task_df.assign(task_id=task_id,
                                 runtime=runtime_fn(task_df, start_timestamp),
                                 error=error_fn(task_df),
                                 target_epoch=task_df["epoch"].iloc[-1])
        task_dfs.append(task_df)

    result = pd.concat(task_dfs, axis="index", ignore_index=True, sort=True)
    # re-order by runtime
    result = result.sort_values(by="runtime")
    # calculate incumbent best -- the cumulative minimum of the error.
    result = result.assign(best=result["error"].cummin())
    return result
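
# Illustration (not part of the original tutorial): process_training_history
# expects a dict mapping each task id to the list of dicts emitted by the
# reporter. A tiny synthetic example with made-up numbers:
_t0 = 1000.0                       # pretend start timestamp
_fake_history = {
    0: [dict(epoch=1, objective=0.60, time_step=_t0 + 1.0),
        dict(epoch=2, objective=0.70, time_step=_t0 + 2.0)],
    1: [dict(epoch=1, objective=0.65, time_step=_t0 + 1.5)],
}
# process_training_history(_fake_history, start_timestamp=_t0) would return a
# DataFrame sorted by runtime, with 'error' = 1 - objective and 'best' equal to
# the running minimum of the error.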

resources = dict(num_cpus=NUM_CPUS, num_gpus=0)
search_options = {
    'num_init_random': 2,
    'debug_log': True}
if SCHEDULER == 'fifo':
    myscheduler = ag.scheduler.FIFOScheduler(
        run_mlp_openml,
        resource=resources,
        searcher=SEARCHER,
        search_options=search_options,
        time_out=120,
        time_attr=RESOURCE_ATTR_NAME,
        reward_attr=REWARD_ATTR_NAME)

else:
    # This setup uses rung levels at 1, 3, 9 epochs. We just use a single
    # bracket, so this is in fact successive halving (Hyperband would use
    # more than 1 bracket).
    # Also note that since we do not use the max_t argument of
    # HyperbandScheduler, this value is obtained from train_fn.args.epochs.
    sch_type = 'stopping' if SCHEDULER == 'hbs' else 'promotion'
    myscheduler = ag.scheduler.HyperbandScheduler(
        run_mlp_openml,
        resource=resources,
        searcher=SEARCHER,
        search_options=search_options,
        time_out=120,
        time_attr=RESOURCE_ATTR_NAME,
        reward_attr=REWARD_ATTR_NAME,
        type=sch_type,
        grace_period=1,
        reduction_factor=3,
        brackets=1)

# run tasks
myscheduler.run()
myscheduler.join_jobs()

results_df = process_training_history(
                myscheduler.training_history.copy(),
                start_timestamp=myscheduler._start_time)
Epoch 1 ; Time: 0.485816 ; Training: accuracy=0.260079 ; Validation: accuracy=0.531250
Epoch 2 ; Time: 0.925723 ; Training: accuracy=0.496365 ; Validation: accuracy=0.655247
Epoch 3 ; Time: 1.350109 ; Training: accuracy=0.559650 ; Validation: accuracy=0.694686
Epoch 4 ; Time: 1.776294 ; Training: accuracy=0.588896 ; Validation: accuracy=0.711063
Epoch 5 ; Time: 2.202718 ; Training: accuracy=0.609385 ; Validation: accuracy=0.726939
Epoch 6 ; Time: 2.660790 ; Training: accuracy=0.628139 ; Validation: accuracy=0.745321
Epoch 7 ; Time: 3.109511 ; Training: accuracy=0.641193 ; Validation: accuracy=0.750501
Epoch 8 ; Time: 3.538582 ; Training: accuracy=0.653751 ; Validation: accuracy=0.763202
Epoch 9 ; Time: 3.966724 ; Training: accuracy=0.665482 ; Validation: accuracy=0.766043
Epoch 1 ; Time: 0.354436 ; Training: accuracy=0.581848 ; Validation: accuracy=0.771103
Epoch 2 ; Time: 0.704473 ; Training: accuracy=0.715099 ; Validation: accuracy=0.823125
Epoch 3 ; Time: 0.995774 ; Training: accuracy=0.758663 ; Validation: accuracy=0.852660
Epoch 4 ; Time: 1.353303 ; Training: accuracy=0.778548 ; Validation: accuracy=0.852828
Epoch 5 ; Time: 1.653463 ; Training: accuracy=0.782921 ; Validation: accuracy=0.873301
Epoch 6 ; Time: 1.945009 ; Training: accuracy=0.793977 ; Validation: accuracy=0.876489
Epoch 7 ; Time: 2.234698 ; Training: accuracy=0.806931 ; Validation: accuracy=0.876825
Epoch 8 ; Time: 2.547375 ; Training: accuracy=0.811056 ; Validation: accuracy=0.887229
Epoch 9 ; Time: 2.843662 ; Training: accuracy=0.821782 ; Validation: accuracy=0.893606
Epoch 1 ; Time: 0.413744 ; Training: accuracy=0.031333 ; Validation: accuracy=0.025862
Epoch 1 ; Time: 0.474860 ; Training: accuracy=0.039519 ; Validation: accuracy=0.036720
Epoch 1 ; Time: 0.493186 ; Training: accuracy=0.581720 ; Validation: accuracy=0.754079
Epoch 2 ; Time: 0.871644 ; Training: accuracy=0.704632 ; Validation: accuracy=0.791708
Epoch 3 ; Time: 1.246018 ; Training: accuracy=0.731844 ; Validation: accuracy=0.846653
Epoch 4 ; Time: 1.625319 ; Training: accuracy=0.763689 ; Validation: accuracy=0.855644
Epoch 5 ; Time: 2.177871 ; Training: accuracy=0.771133 ; Validation: accuracy=0.859640
Epoch 6 ; Time: 2.560357 ; Training: accuracy=0.786352 ; Validation: accuracy=0.871295
Epoch 7 ; Time: 2.949618 ; Training: accuracy=0.792556 ; Validation: accuracy=0.879287
Epoch 8 ; Time: 3.327809 ; Training: accuracy=0.799090 ; Validation: accuracy=0.884782
Epoch 9 ; Time: 3.777623 ; Training: accuracy=0.802730 ; Validation: accuracy=0.882950
Epoch 1 ; Time: 0.497789 ; Training: accuracy=0.064532 ; Validation: accuracy=0.121450
Epoch 1 ; Time: 0.578648 ; Training: accuracy=0.251240 ; Validation: accuracy=0.535881
Epoch 1 ; Time: 0.503301 ; Training: accuracy=0.413499 ; Validation: accuracy=0.654333
Epoch 2 ; Time: 0.929561 ; Training: accuracy=0.559917 ; Validation: accuracy=0.725667
Epoch 3 ; Time: 1.368943 ; Training: accuracy=0.611511 ; Validation: accuracy=0.762000
Epoch 1 ; Time: 0.767868 ; Training: accuracy=0.246300 ; Validation: accuracy=0.431960
Epoch 1 ; Time: 0.312346 ; Training: accuracy=0.247434 ; Validation: accuracy=0.537415
Epoch 1 ; Time: 0.501000 ; Training: accuracy=0.625744 ; Validation: accuracy=0.809308
Epoch 2 ; Time: 1.085015 ; Training: accuracy=0.757771 ; Validation: accuracy=0.836190
Epoch 3 ; Time: 1.532385 ; Training: accuracy=0.796048 ; Validation: accuracy=0.846270
Epoch 1 ; Time: 0.443196 ; Training: accuracy=0.410110 ; Validation: accuracy=0.619753
Epoch 2 ; Time: 0.825119 ; Training: accuracy=0.494746 ; Validation: accuracy=0.673051
Epoch 3 ; Time: 1.209963 ; Training: accuracy=0.517664 ; Validation: accuracy=0.674384
Epoch 1 ; Time: 0.455926 ; Training: accuracy=0.520890 ; Validation: accuracy=0.735177
Epoch 2 ; Time: 0.826737 ; Training: accuracy=0.694713 ; Validation: accuracy=0.771319
Epoch 3 ; Time: 1.195614 ; Training: accuracy=0.736576 ; Validation: accuracy=0.830779
Epoch 1 ; Time: 0.752027 ; Training: accuracy=0.044942 ; Validation: accuracy=0.176768
Epoch 1 ; Time: 0.383156 ; Training: accuracy=0.638724 ; Validation: accuracy=0.810774
Epoch 2 ; Time: 0.674688 ; Training: accuracy=0.777282 ; Validation: accuracy=0.842088
Epoch 3 ; Time: 0.946444 ; Training: accuracy=0.801091 ; Validation: accuracy=0.873064
Epoch 4 ; Time: 1.219203 ; Training: accuracy=0.815228 ; Validation: accuracy=0.879630
Epoch 5 ; Time: 1.490317 ; Training: accuracy=0.833003 ; Validation: accuracy=0.888215
Epoch 6 ; Time: 1.798936 ; Training: accuracy=0.839699 ; Validation: accuracy=0.888384
Epoch 7 ; Time: 2.072390 ; Training: accuracy=0.847636 ; Validation: accuracy=0.891077
Epoch 8 ; Time: 2.345393 ; Training: accuracy=0.850694 ; Validation: accuracy=0.903704
Epoch 9 ; Time: 2.616894 ; Training: accuracy=0.850777 ; Validation: accuracy=0.902525
Epoch 1 ; Time: 0.322179 ; Training: accuracy=0.270546 ; Validation: accuracy=0.473270
Epoch 1 ; Time: 0.458280 ; Training: accuracy=0.571879 ; Validation: accuracy=0.768435
Epoch 2 ; Time: 0.855828 ; Training: accuracy=0.775576 ; Validation: accuracy=0.840174
Epoch 3 ; Time: 1.236585 ; Training: accuracy=0.823081 ; Validation: accuracy=0.850017
Epoch 4 ; Time: 1.628630 ; Training: accuracy=0.848284 ; Validation: accuracy=0.888055
Epoch 5 ; Time: 2.045398 ; Training: accuracy=0.869176 ; Validation: accuracy=0.899733
Epoch 6 ; Time: 2.436024 ; Training: accuracy=0.879042 ; Validation: accuracy=0.897064
Epoch 7 ; Time: 2.827001 ; Training: accuracy=0.886089 ; Validation: accuracy=0.908075
Epoch 8 ; Time: 3.220183 ; Training: accuracy=0.897778 ; Validation: accuracy=0.920754
Epoch 9 ; Time: 3.611714 ; Training: accuracy=0.903250 ; Validation: accuracy=0.917251
Epoch 1 ; Time: 0.577712 ; Training: accuracy=0.575012 ; Validation: accuracy=0.773837
Epoch 2 ; Time: 1.045443 ; Training: accuracy=0.744908 ; Validation: accuracy=0.825192
Epoch 3 ; Time: 1.512438 ; Training: accuracy=0.794006 ; Validation: accuracy=0.861659
Epoch 4 ; Time: 1.989012 ; Training: accuracy=0.817685 ; Validation: accuracy=0.871863
Epoch 5 ; Time: 2.456224 ; Training: accuracy=0.833582 ; Validation: accuracy=0.887755
Epoch 6 ; Time: 2.924245 ; Training: accuracy=0.846829 ; Validation: accuracy=0.893610
Epoch 7 ; Time: 3.430988 ; Training: accuracy=0.851714 ; Validation: accuracy=0.905320
Epoch 8 ; Time: 3.918977 ; Training: accuracy=0.863719 ; Validation: accuracy=0.906992
Epoch 9 ; Time: 4.414324 ; Training: accuracy=0.868356 ; Validation: accuracy=0.911676
Epoch 1 ; Time: 0.944247 ; Training: accuracy=0.433936 ; Validation: accuracy=0.641801
Epoch 1 ; Time: 0.363867 ; Training: accuracy=0.642586 ; Validation: accuracy=0.789894
Epoch 2 ; Time: 0.672631 ; Training: accuracy=0.778822 ; Validation: accuracy=0.832114
Epoch 3 ; Time: 1.031989 ; Training: accuracy=0.800346 ; Validation: accuracy=0.862367
Epoch 4 ; Time: 1.362332 ; Training: accuracy=0.826159 ; Validation: accuracy=0.875332
Epoch 5 ; Time: 1.667801 ; Training: accuracy=0.836302 ; Validation: accuracy=0.875332
Epoch 6 ; Time: 1.976715 ; Training: accuracy=0.844054 ; Validation: accuracy=0.880319
Epoch 7 ; Time: 2.283057 ; Training: accuracy=0.852713 ; Validation: accuracy=0.877660
Epoch 8 ; Time: 2.584614 ; Training: accuracy=0.863434 ; Validation: accuracy=0.896443
Epoch 9 ; Time: 2.890280 ; Training: accuracy=0.865413 ; Validation: accuracy=0.885140
Epoch 1 ; Time: 0.443991 ; Training: accuracy=0.388039 ; Validation: accuracy=0.691309
Epoch 1 ; Time: 0.426861 ; Training: accuracy=0.672291 ; Validation: accuracy=0.813686
Epoch 2 ; Time: 0.809158 ; Training: accuracy=0.850455 ; Validation: accuracy=0.868465
Epoch 3 ; Time: 1.211733 ; Training: accuracy=0.881390 ; Validation: accuracy=0.906593
Epoch 4 ; Time: 1.588458 ; Training: accuracy=0.910008 ; Validation: accuracy=0.912088
Epoch 5 ; Time: 1.955405 ; Training: accuracy=0.922663 ; Validation: accuracy=0.915751
Epoch 6 ; Time: 2.318612 ; Training: accuracy=0.934243 ; Validation: accuracy=0.930569
Epoch 7 ; Time: 2.721848 ; Training: accuracy=0.940612 ; Validation: accuracy=0.927739
Epoch 8 ; Time: 3.162516 ; Training: accuracy=0.946402 ; Validation: accuracy=0.934565
Epoch 9 ; Time: 3.550003 ; Training: accuracy=0.951613 ; Validation: accuracy=0.929071
Epoch 1 ; Time: 0.368529 ; Training: accuracy=0.406714 ; Validation: accuracy=0.661320
Epoch 1 ; Time: 0.398520 ; Training: accuracy=0.464457 ; Validation: accuracy=0.682689
Epoch 1 ; Time: 0.384942 ; Training: accuracy=0.457259 ; Validation: accuracy=0.646043
Epoch 1 ; Time: 0.409404 ; Training: accuracy=0.383181 ; Validation: accuracy=0.619832
Epoch 1 ; Time: 0.538824 ; Training: accuracy=0.470978 ; Validation: accuracy=0.714141
Epoch 1 ; Time: 0.396265 ; Training: accuracy=0.530034 ; Validation: accuracy=0.764236
Epoch 2 ; Time: 0.722034 ; Training: accuracy=0.728001 ; Validation: accuracy=0.828505
Epoch 3 ; Time: 1.087246 ; Training: accuracy=0.786003 ; Validation: accuracy=0.851315
Epoch 1 ; Time: 0.471924 ; Training: accuracy=0.548577 ; Validation: accuracy=0.744837
Epoch 2 ; Time: 0.856599 ; Training: accuracy=0.753062 ; Validation: accuracy=0.805796
Epoch 3 ; Time: 1.233011 ; Training: accuracy=0.806687 ; Validation: accuracy=0.845103
Epoch 1 ; Time: 0.403902 ; Training: accuracy=0.573500 ; Validation: accuracy=0.757267
Epoch 2 ; Time: 0.741710 ; Training: accuracy=0.800923 ; Validation: accuracy=0.834447
Epoch 3 ; Time: 1.113848 ; Training: accuracy=0.858355 ; Validation: accuracy=0.868527
Epoch 4 ; Time: 1.449182 ; Training: accuracy=0.887195 ; Validation: accuracy=0.884564
Epoch 5 ; Time: 1.848668 ; Training: accuracy=0.911503 ; Validation: accuracy=0.897093
Epoch 6 ; Time: 2.207992 ; Training: accuracy=0.919084 ; Validation: accuracy=0.909121
Epoch 7 ; Time: 2.547297 ; Training: accuracy=0.934410 ; Validation: accuracy=0.924658
Epoch 8 ; Time: 2.920436 ; Training: accuracy=0.943309 ; Validation: accuracy=0.918476
Epoch 9 ; Time: 3.282759 ; Training: accuracy=0.947017 ; Validation: accuracy=0.929669
Epoch 1 ; Time: 0.391132 ; Training: accuracy=0.611194 ; Validation: accuracy=0.761953
Epoch 2 ; Time: 0.782814 ; Training: accuracy=0.803980 ; Validation: accuracy=0.840236
Epoch 3 ; Time: 1.159332 ; Training: accuracy=0.854892 ; Validation: accuracy=0.875926
Epoch 4 ; Time: 1.490009 ; Training: accuracy=0.882338 ; Validation: accuracy=0.902694
Epoch 5 ; Time: 1.801840 ; Training: accuracy=0.906302 ; Validation: accuracy=0.920370
Epoch 6 ; Time: 2.220920 ; Training: accuracy=0.915837 ; Validation: accuracy=0.924916
Epoch 7 ; Time: 2.595611 ; Training: accuracy=0.928275 ; Validation: accuracy=0.926936
Epoch 8 ; Time: 2.921697 ; Training: accuracy=0.932172 ; Validation: accuracy=0.926936
Epoch 9 ; Time: 3.257192 ; Training: accuracy=0.937977 ; Validation: accuracy=0.937374
Epoch 1 ; Time: 0.406384 ; Training: accuracy=0.675000 ; Validation: accuracy=0.811000
Epoch 2 ; Time: 0.753011 ; Training: accuracy=0.850414 ; Validation: accuracy=0.872000
Epoch 3 ; Time: 1.104501 ; Training: accuracy=0.890646 ; Validation: accuracy=0.904833
Epoch 4 ; Time: 1.468058 ; Training: accuracy=0.915480 ; Validation: accuracy=0.909667
Epoch 5 ; Time: 1.811414 ; Training: accuracy=0.923013 ; Validation: accuracy=0.906833
Epoch 6 ; Time: 2.160280 ; Training: accuracy=0.939735 ; Validation: accuracy=0.913333
Epoch 7 ; Time: 2.512255 ; Training: accuracy=0.949917 ; Validation: accuracy=0.931333
Epoch 8 ; Time: 2.872170 ; Training: accuracy=0.953974 ; Validation: accuracy=0.933500
Epoch 9 ; Time: 3.218282 ; Training: accuracy=0.957533 ; Validation: accuracy=0.925833
Epoch 1 ; Time: 0.319053 ; Training: accuracy=0.709950 ; Validation: accuracy=0.844292
Epoch 2 ; Time: 0.589575 ; Training: accuracy=0.870813 ; Validation: accuracy=0.875334
Epoch 3 ; Time: 0.987917 ; Training: accuracy=0.902324 ; Validation: accuracy=0.904039
Epoch 4 ; Time: 1.249979 ; Training: accuracy=0.924241 ; Validation: accuracy=0.905040
Epoch 5 ; Time: 1.531832 ; Training: accuracy=0.926226 ; Validation: accuracy=0.908378
Epoch 6 ; Time: 1.788824 ; Training: accuracy=0.940121 ; Validation: accuracy=0.922230
Epoch 7 ; Time: 2.067610 ; Training: accuracy=0.948639 ; Validation: accuracy=0.923231
Epoch 8 ; Time: 2.327184 ; Training: accuracy=0.950955 ; Validation: accuracy=0.926569
Epoch 9 ; Time: 2.611477 ; Training: accuracy=0.957820 ; Validation: accuracy=0.933745
Epoch 1 ; Time: 0.419348 ; Training: accuracy=0.633882 ; Validation: accuracy=0.791375
Epoch 2 ; Time: 0.785120 ; Training: accuracy=0.822649 ; Validation: accuracy=0.857642
Epoch 3 ; Time: 1.140644 ; Training: accuracy=0.874018 ; Validation: accuracy=0.890276
Epoch 4 ; Time: 1.496016 ; Training: accuracy=0.904459 ; Validation: accuracy=0.895604
Epoch 5 ; Time: 1.856606 ; Training: accuracy=0.918107 ; Validation: accuracy=0.901265
Epoch 6 ; Time: 2.219018 ; Training: accuracy=0.932252 ; Validation: accuracy=0.921911
Epoch 7 ; Time: 2.587913 ; Training: accuracy=0.936637 ; Validation: accuracy=0.926906
Epoch 8 ; Time: 2.995049 ; Training: accuracy=0.948135 ; Validation: accuracy=0.930736
Epoch 9 ; Time: 3.366530 ; Training: accuracy=0.951195 ; Validation: accuracy=0.932900
Epoch 1 ; Time: 0.362235 ; Training: accuracy=0.717879 ; Validation: accuracy=0.818816
Epoch 2 ; Time: 0.717216 ; Training: accuracy=0.838941 ; Validation: accuracy=0.847407
Epoch 3 ; Time: 1.011797 ; Training: accuracy=0.877288 ; Validation: accuracy=0.877992
Epoch 4 ; Time: 1.307802 ; Training: accuracy=0.891638 ; Validation: accuracy=0.883477
Epoch 5 ; Time: 1.607926 ; Training: accuracy=0.901204 ; Validation: accuracy=0.879322
Epoch 6 ; Time: 1.917984 ; Training: accuracy=0.905162 ; Validation: accuracy=0.870013
Epoch 7 ; Time: 2.220767 ; Training: accuracy=0.917038 ; Validation: accuracy=0.909408
Epoch 8 ; Time: 2.541039 ; Training: accuracy=0.925697 ; Validation: accuracy=0.898770
Epoch 9 ; Time: 2.893164 ; Training: accuracy=0.922233 ; Validation: accuracy=0.911735
Epoch 1 ; Time: 0.661581 ; Training: accuracy=0.565527 ; Validation: accuracy=0.718211
Epoch 1 ; Time: 0.540891 ; Training: accuracy=0.635220 ; Validation: accuracy=0.782456
Epoch 2 ; Time: 1.021876 ; Training: accuracy=0.824065 ; Validation: accuracy=0.852632
Epoch 3 ; Time: 1.497699 ; Training: accuracy=0.874710 ; Validation: accuracy=0.872180
Epoch 4 ; Time: 1.987899 ; Training: accuracy=0.899123 ; Validation: accuracy=0.894069
Epoch 5 ; Time: 2.488709 ; Training: accuracy=0.916005 ; Validation: accuracy=0.910443
Epoch 6 ; Time: 2.983786 ; Training: accuracy=0.927756 ; Validation: accuracy=0.905263
Epoch 7 ; Time: 3.506227 ; Training: accuracy=0.937272 ; Validation: accuracy=0.917794
Epoch 8 ; Time: 4.000621 ; Training: accuracy=0.945382 ; Validation: accuracy=0.928488
Epoch 9 ; Time: 4.512321 ; Training: accuracy=0.950844 ; Validation: accuracy=0.929156
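
Once all jobs have finished, you can also query the scheduler directly for the best configuration and reward it found. A short sketch using the schedulers’ get_best_config and get_best_reward methods:

# Query the scheduler for the best configuration found during the search
print('Best config: {}'.format(myscheduler.get_best_config()))
print('Best reward (validation accuracy): {}'.format(myscheduler.get_best_reward()))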

Analysing the results

The training history is stored in results_df; the main fields are 'runtime' and 'best' (the lowest error, i.e. 1 - objective, attained so far).

Note: You will get slightly different curves for different scheduler/searcher pairs; the time_out here is a bit too short to see the differences in a significant way (it would be better to set it to >1000s). Generally speaking though, Hyperband stopping/promotion combined with model-based search (bayesopt) tends to significantly outperform the other combinations given enough time.

results_df.head()
bracket elapsed_time epoch error eval_time objective runtime searcher_data_size searcher_params_kernel_covariance_scale searcher_params_kernel_inv_bw0 ... searcher_params_kernel_inv_bw7 searcher_params_kernel_inv_bw8 searcher_params_mean_mean_value searcher_params_noise_variance target_epoch task_id time_since_start time_step time_this_iter best
0 0 0.488541 1 0.468750 0.483711 0.531250 1.571379 NaN 1.0 1.0 ... 1.0 1.0 0.0 0.001 9 0 1.573284 1.614630e+09 0.517057 0.468750
1 0 0.927435 2 0.344753 0.434316 0.655247 2.010273 1.0 1.0 1.0 ... 1.0 1.0 0.0 0.001 9 0 2.011093 1.614630e+09 0.438870 0.344753
2 0 1.351925 3 0.305314 0.422602 0.694686 2.434762 1.0 1.0 1.0 ... 1.0 1.0 0.0 0.001 9 0 2.435731 1.614630e+09 0.424491 0.305314
3 0 1.777915 4 0.288937 0.423248 0.711063 2.860753 2.0 1.0 1.0 ... 1.0 1.0 0.0 0.001 9 0 2.861669 1.614630e+09 0.425990 0.288937
4 0 2.204442 5 0.273061 0.424562 0.726939 3.287279 2.0 1.0 1.0 ... 1.0 1.0 0.0 0.001 9 0 3.288359 1.614630e+09 0.426525 0.273061

5 rows × 26 columns
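
For a quick look at the best rows found during the search, you can also sort the dataframe by error (plain pandas, just for inspection):

# Rows with the lowest validation error observed so far
results_df.sort_values(by="error")[["task_id", "epoch", "error", "runtime"]].head()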

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))

runtime = results_df['runtime'].values
objective = results_df['best'].values

plt.plot(runtime, objective, lw=2)
plt.xticks(fontsize=12)
plt.xlim(0, 120)
plt.ylim(0, 0.5)
plt.yticks(fontsize=12)
plt.xlabel("Runtime [s]", fontsize=14)
plt.ylabel("Objective", fontsize=14)
Text(0, 0.5, 'Objective')
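
To see how individual configurations evolve, and where the scheduler stopped them early, you can also plot one curve per task by grouping results_df on task_id (an illustrative sketch):

plt.figure(figsize=(12, 8))
# One validation-error curve per configuration (task); short curves were stopped early
for task_id, task_df in results_df.groupby("task_id"):
    plt.plot(task_df["runtime"], task_df["error"], alpha=0.5)
plt.xlabel("Runtime [s]", fontsize=14)
plt.ylabel("Validation error", fontsize=14)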

Diving Deeper

Now, you are ready to try HPO on your own machine learning models (if you use PyTorch, have a look at Tune PyTorch Model on MNIST). While AutoGluon comes with well-chosen defaults, it can pay off to tune it to your specific needs. Here are some tips which may come in useful.

Logging the Search Progress

First, it is a good idea in general to switch on debug_log, which outputs useful information about the search progress. This is already done in the example above.

The outputs show which configurations are chosen, stopped, or promoted. For BO and BOHB, a range of information is displayed for every get_config decision. This log output is very useful in order to figure out what is going on during the search.
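
If these messages do not show up in your environment, it may help to raise the Python log level (this is standard Python logging, not an AutoGluon-specific API; whether you need it depends on your setup):

import logging
# Show INFO-level messages from the schedulers and searchers
logging.basicConfig(level=logging.INFO)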

Configuring HyperbandScheduler

The most important knobs to turn with HyperbandScheduler are max_t, grace_period, reduction_factor, brackets, and type. The first three determine the rung levels at which stopping or promotion decisions are made.

  • The maximum resource level max_t (usually, resource equates to epochs, so max_t is the maximum number of training epochs) is typically hardcoded in the train_fn passed to the scheduler (this is run_mlp_openml in the example above). As already noted above, the value is best fixed in the ag.args decorator as epochs=XYZ; it can then be accessed as args.epochs in the train_fn code. If this is done, you do not have to pass max_t when creating the scheduler.

  • grace_period and reduction_factor determine the rung levels, which are grace_period, grace_period * reduction_factor, grace_period * (reduction_factor ** 2), etc. All rung levels must be less than or equal to max_t, and it is recommended to make max_t equal to the largest rung level. For example, if grace_period = 1 and reduction_factor = 3, it is in general recommended to use max_t = 9, max_t = 27, or max_t = 81. Choosing a max_t value “off the grid” works against the successive halving principle that the total resources spent in a rung should be roughly equal between rungs: if, in the example above, you set max_t = 10, about a third of the configurations reaching 9 epochs are allowed to proceed, but only for one more epoch (see the small sketch after this list).

  • With reduction_factor, you tune the extent to which successive halving filtering is applied. The larger this integer, the fewer configurations make it to higher numbers of epochs. Values 2, 3, 4 are commonly used.

  • Finally, grace_period should be set to the smallest resource (number of epochs) for which you expect any meaningful differentiation between configurations. While grace_period = 1 should always be explored, it may be too low for any meaningful stopping decisions to be made at the first rung.

  • brackets sets the maximum number of brackets in Hyperband (make sure to study the Hyperband paper or follow-ups for details). For brackets = 1, you are running successive halving (single bracket). Higher brackets have larger effective grace_period values (so runs are not stopped until later), yet are also chosen with less probability. We recommend always considering successive halving (brackets = 1) in a comparison.

  • Finally, with type (values stopping, promotion) you choose between different ways of extending successive halving scheduling to the asynchronous case. The default stopping method is simpler and seems to perform well, but promotion is more careful about promoting configurations to higher resource levels, which can work better in some cases.
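
As a small illustration of how grace_period, reduction_factor and max_t interact, here is a simplified computation of the implied rung levels (an illustration only, not AutoGluon’s internal code):

def rung_levels(grace_period, reduction_factor, max_t):
    # Rung levels are grace_period * reduction_factor**k, capped at max_t
    levels, r = [], grace_period
    while r <= max_t:
        levels.append(r)
        r *= reduction_factor
    return levels

print(rung_levels(grace_period=1, reduction_factor=3, max_t=9))   # [1, 3, 9]
print(rung_levels(grace_period=1, reduction_factor=3, max_t=10))  # [1, 3, 9]: max_t = 10 is "off the grid"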

Asynchronous BOHB

Finally, here are some ideas for tuning asynchronous BOHB, apart from tuning its HyperbandScheduler component. You need to pass these options in search_options.

  • We support a range of different surrogate models over the criterion functions across resource levels. All of them are jointly dependent Gaussian process models, meaning that data collected at all resource levels are modelled together. The surrogate model is selected by gp_resource_kernel; possible values are matern52, matern52-res-warp, exp-decay-sum, exp-decay-combined, exp-decay-delta1. These are variants of either a joint Matern 5/2 kernel over configuration and resource, or the exponential decay model. Details about the latter can be found here.

  • Fitting a Gaussian process surrogate model to data incurs a cost which scales cubically with the number of datapoints. When applied to expensive deep learning workloads, even multi-fidelity asynchronous BOHB rarely accumulates more than 100 observations or so (across all rung levels and brackets), and the GP computations are subdominant. However, if you apply it to a cheaper train_fn and find yourself beyond 2000 total evaluations, the cost of GP fitting can become painful. In such a situation, you can explore the options opt_skip_period and opt_skip_num_max_resource. The basic idea is as follows. By far the most expensive part of a get_config call (picking the next configuration) is the refitting of the GP model to past data (this entails re-optimizing hyperparameters of the surrogate model itself). The options allow you to skip this expensive step for most get_config calls, after some initial period; a small sketch of such a search_options dict is given below. Check the docstrings for details about these options. If you find yourself in such a situation and gain experience with these skipping features, make sure to contact the AutoGluon developers – we would love to learn about your use case.
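
Putting these options together, a search_options dictionary for asynchronous BOHB could look roughly as follows. The particular values are illustrative only; gp_resource_kernel and opt_skip_period are the option names discussed above:

# Illustrative search_options for asynchronous BOHB (values are examples only)
search_options = {
    'debug_log': True,
    'num_init_random': 2,
    'gp_resource_kernel': 'exp-decay-sum',  # one of the surrogate model choices listed above
    'opt_skip_period': 3,                   # skip GP hyperparameter refitting on most get_config calls
}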