.. _sec_custom_advancedhpo:

Getting started with Advanced HPO Algorithms
============================================

This tutorial provides a complete example of how to use AutoGluon's
state-of-the-art hyperparameter optimization (HPO) algorithms to tune a
Multi-Layer Perceptron (MLP), the most basic type of neural network.

Loading libraries
-----------------

.. code:: python

    # Basic utils for folder manipulations etc
    import time
    import multiprocessing  # to count the number of CPUs available

    # External tools to load and process data
    import numpy as np
    import pandas as pd

    # MXNet (NeuralNets)
    import mxnet as mx
    from mxnet import gluon, autograd
    from mxnet.gluon import nn

    # AutoGluon and HPO tools
    import autogluon.core as ag
    from autogluon.mxnet.utils import load_and_split_openml_data

Check the version of MXNet; you should be fine with any version >= 1.5:

.. code:: python

    mx.__version__

.. parsed-literal::
    :class: output

    '1.7.0'

You can also check the version of AutoGluon (and the specific commit) to make
sure it matches what you want.

.. code:: python

    import autogluon.core.version
    ag.version.__version__

.. parsed-literal::
    :class: output

    '0.1.0b20210301'

Hyperparameter Optimization of a 2-layer MLP
--------------------------------------------

Setting up the context
~~~~~~~~~~~~~~~~~~~~~~

Here we declare a few "environment variables" setting the context for what we
are doing:

.. code:: python

    OPENML_TASK_ID = 6              # describes the problem we will tackle
    RATIO_TRAIN_VALID = 0.33        # split of the training data used for validation
    RESOURCE_ATTR_NAME = 'epoch'    # how do we measure resources (will become clearer further below)
    REWARD_ATTR_NAME = 'objective'  # how do we measure performance (will become clearer further below)

    NUM_CPUS = multiprocessing.cpu_count()

Preparing the data
~~~~~~~~~~~~~~~~~~

We will use a multi-way classification task from OpenML. Data preparation
includes:

- Missing values are imputed, using the 'mean' strategy of
  ``sklearn.impute.SimpleImputer``
- Split training set into training and validation
- Standardize inputs to mean 0, variance 1

.. code:: python

    X_train, X_valid, y_train, y_valid, n_classes = load_and_split_openml_data(
        OPENML_TASK_ID, RATIO_TRAIN_VALID, download_from_openml=False)
    n_classes

.. parsed-literal::
    :class: output

    100%|██████████| 704/704 [00:00<00:00, 41137.24KB/s]
    100%|██████████| 2521/2521 [00:00<00:00, 30052.13KB/s]
    3KB [00:00, 3548.48KB/s]
    8KB [00:00, 10233.13KB/s]
    15KB [00:00, 15553.66KB/s]
    2998KB [00:00, 31984.93KB/s]
    881KB [00:00, 38288.07KB/s]
    3KB [00:00, 3992.04KB/s]

.. parsed-literal::
    :class: output

    26

The problem has 26 classes.
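
The bullet points above are handled for us by ``load_and_split_openml_data``.
If you want to reproduce the same preparation on your own data, it roughly
corresponds to the following ``scikit-learn`` calls. This is only a sketch
under the stated assumptions (the helper name ``prepare_features`` and the
``seed`` argument are made up for illustration), not the actual implementation
of the AutoGluon utility:

.. code:: python

    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    def prepare_features(X, y, ratio_valid=0.33, seed=0):
        # Impute missing values with the column mean
        X = SimpleImputer(strategy='mean').fit_transform(X)
        # Split the training set into training and validation parts
        X_train, X_valid, y_train, y_valid = train_test_split(
            X, y, test_size=ratio_valid, random_state=seed)
        # Standardize inputs to mean 0, variance 1 (statistics from the training part)
        scaler = StandardScaler().fit(X_train)
        return scaler.transform(X_train), scaler.transform(X_valid), y_train, y_valid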

Declaring a model specifying a hyperparameter space with AutoGluon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Two-layer MLP where we optimize over:

- the number of units on the first layer
- the number of units on the second layer
- the dropout rate after each layer
- the learning rate
- the scale of the weight initialization

The ``@ag.args`` decorator allows us to specify the space we will optimize
over; this matches the ConfigSpace syntax.

The body of the function ``run_mlp_openml`` is pretty simple:

- it reads the hyperparameters given via the decorator
- it defines a 2-layer MLP with dropout
- it declares a trainer with the 'adam' optimizer and a provided learning rate
- it trains the NN for a number of epochs (most of that is boilerplate code
  from ``mxnet``)
- the ``reporter`` at the end is used to keep track of training history in the
  hyperparameter optimization

**Note**: The number of epochs and the hyperparameter space are reduced to
make for a shorter experiment.

.. code:: python

    @ag.args(n_units_1=ag.space.Int(lower=16, upper=128),
             n_units_2=ag.space.Int(lower=16, upper=128),
             dropout_1=ag.space.Real(lower=0, upper=.75),
             dropout_2=ag.space.Real(lower=0, upper=.75),
             learning_rate=ag.space.Real(lower=1e-6, upper=1, log=True),
             batch_size=ag.space.Int(lower=8, upper=128),
             scale_1=ag.space.Real(lower=0.001, upper=10, log=True),
             scale_2=ag.space.Real(lower=0.001, upper=10, log=True),
             epochs=9)
    def run_mlp_openml(args, reporter, **kwargs):
        # Time stamp for elapsed_time
        ts_start = time.time()
        # Unwrap hyperparameters
        n_units_1 = args.n_units_1
        n_units_2 = args.n_units_2
        dropout_1 = args.dropout_1
        dropout_2 = args.dropout_2
        scale_1 = args.scale_1
        scale_2 = args.scale_2
        batch_size = args.batch_size
        learning_rate = args.learning_rate

        ctx = mx.cpu()
        net = nn.Sequential()
        with net.name_scope():
            # Layer 1
            net.add(nn.Dense(n_units_1, activation='relu',
                             weight_initializer=mx.initializer.Uniform(scale=scale_1)))
            # Dropout
            net.add(gluon.nn.Dropout(dropout_1))
            # Layer 2
            net.add(nn.Dense(n_units_2, activation='relu',
                             weight_initializer=mx.initializer.Uniform(scale=scale_2)))
            # Dropout
            net.add(gluon.nn.Dropout(dropout_2))
            # Output
            net.add(nn.Dense(n_classes))
        net.initialize(ctx=ctx)

        trainer = gluon.Trainer(net.collect_params(), 'adam',
                                {'learning_rate': learning_rate})

        for epoch in range(args.epochs):
            ts_epoch = time.time()

            train_iter = mx.io.NDArrayIter(
                data={'data': X_train},
                label={'label': y_train},
                batch_size=batch_size,
                shuffle=True)
            valid_iter = mx.io.NDArrayIter(
                data={'data': X_valid},
                label={'label': y_valid},
                batch_size=batch_size,
                shuffle=False)

            metric = mx.metric.Accuracy()
            loss = gluon.loss.SoftmaxCrossEntropyLoss()

            for batch in train_iter:
                data = batch.data[0].as_in_context(ctx)
                label = batch.label[0].as_in_context(ctx)
                with autograd.record():
                    output = net(data)
                    L = loss(output, label)
                L.backward()
                trainer.step(data.shape[0])
                metric.update([label], [output])

            name, train_acc = metric.get()

            metric = mx.metric.Accuracy()
            for batch in valid_iter:
                data = batch.data[0].as_in_context(ctx)
                label = batch.label[0].as_in_context(ctx)
                output = net(data)
                metric.update([label], [output])

            name, val_acc = metric.get()

            print('Epoch %d ; Time: %f ; Training: %s=%f ; Validation: %s=%f' % (
                epoch + 1, time.time() - ts_start, name, train_acc, name, val_acc))

            ts_now = time.time()
            eval_time = ts_now - ts_epoch
            elapsed_time = ts_now - ts_start

            # The resource reported back (as 'epoch') is the number of epochs
            # done, starting at 1
            reporter(
                epoch=epoch + 1,
                objective=float(val_acc),
                eval_time=eval_time,
                time_step=ts_now,
                elapsed_time=elapsed_time)

**Note**: The annotation ``epochs=9`` specifies the maximum number of epochs
for training. It becomes available as ``args.epochs``. Importantly, it is also
processed by ``HyperbandScheduler`` below in order to set its ``max_t``
attribute.

**Recommendation**: Whenever writing training code to be passed as
``train_fn`` to a scheduler, if this training code reports a resource (or
time) attribute, the corresponding maximum resource value should be included
in ``train_fn.args``:

- If the resource attribute (``time_attr`` of the scheduler) in ``train_fn``
  is ``epoch``, make sure to include ``epochs=XYZ`` in the annotation. This
  allows the scheduler to read ``max_t`` from ``train_fn.args.epochs``. This
  case corresponds to our example here.
- If the resource attribute is something other than ``epoch``, you can also
  include the annotation ``max_t=XYZ``, which allows the scheduler to read
  ``max_t`` from ``train_fn.args.max_t``.

Annotating the training function with the correct value for ``max_t``
simplifies scheduler creation (since ``max_t`` does not have to be passed),
and avoids inconsistencies between ``train_fn`` and the scheduler.
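
For the second case, here is a hedged sketch of what such an annotation could
look like. Everything in it (the function ``train_fn``, the ``batches``
resource attribute, and the value 900) is made up for illustration and is not
part of the example above:

.. code:: python

    # Hypothetical training function whose resource is the number of batches
    # processed ('batches'), not 'epoch'. We therefore annotate max_t, which
    # the scheduler can read from train_fn.args.max_t.
    @ag.args(learning_rate=ag.space.Real(lower=1e-6, upper=1, log=True),
             max_t=900)
    def train_fn(args, reporter, **kwargs):
        validation_accuracy = 0.0
        for batch in range(1, args.max_t + 1):
            # ... a real training step would go here; we fake a score instead ...
            validation_accuracy = min(1.0, validation_accuracy + args.learning_rate)
            reporter(batches=batch, objective=validation_accuracy)

A scheduler driving this function would then be created with
``time_attr='batches'``, and ``max_t`` would not have to be passed explicitly.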

Running the Hyperparameter Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use the following schedulers:

- FIFO (``fifo``)
- Hyperband (either the stopping (``hbs``) or promotion (``hbp``) variant)

And the following searchers:

- Random search (``random``)
- Gaussian process based Bayesian optimization (``bayesopt``)
- SkOpt Bayesian optimization (``skopt``; only with FIFO scheduler)

Note that the method known as (asynchronous) Hyperband uses random search.
Combining Hyperband scheduling with the ``bayesopt`` searcher uses a novel
method called asynchronous BOHB.

Pick the combination you are interested in (a full experiment takes around
120 seconds, see the ``time_out`` parameter); running every combination,
possibly with multiple runs each, can take a fair bit of time. In real life,
you will want to choose a larger ``time_out`` in order to obtain good
performance.

.. code:: python

    SCHEDULER = "hbs"
    SEARCHER = "bayesopt"

.. code:: python

    def compute_error(df):
        return 1.0 - df["objective"]

    def compute_runtime(df, start_timestamp):
        return df["time_step"] - start_timestamp

    def process_training_history(task_dicts, start_timestamp,
                                 runtime_fn=compute_runtime,
                                 error_fn=compute_error):
        task_dfs = []
        for task_id in task_dicts:
            task_df = pd.DataFrame(task_dicts[task_id])
            task_df = task_df.assign(task_id=task_id,
                                     runtime=runtime_fn(task_df, start_timestamp),
                                     error=error_fn(task_df),
                                     target_epoch=task_df["epoch"].iloc[-1])
            task_dfs.append(task_df)

        result = pd.concat(task_dfs, axis="index", ignore_index=True, sort=True)

        # re-order by runtime
        result = result.sort_values(by="runtime")

        # calculate incumbent best -- the cumulative minimum of the error
        result = result.assign(best=result["error"].cummin())

        return result

    resources = dict(num_cpus=NUM_CPUS, num_gpus=0)

.. code:: python

    search_options = {
        'num_init_random': 2,
        'debug_log': True}

    if SCHEDULER == 'fifo':
        myscheduler = ag.scheduler.FIFOScheduler(
            run_mlp_openml,
            resource=resources,
            searcher=SEARCHER,
            search_options=search_options,
            time_out=120,
            time_attr=RESOURCE_ATTR_NAME,
            reward_attr=REWARD_ATTR_NAME)

    else:
        # This setup uses rung levels at 1, 3, 9 epochs. We just use a single
        # bracket, so this is in fact successive halving (Hyperband would use
        # more than 1 bracket).
        # Also note that since we do not use the max_t argument of
        # HyperbandScheduler, this value is obtained from train_fn.args.epochs.
        sch_type = 'stopping' if SCHEDULER == 'hbs' else 'promotion'
        myscheduler = ag.scheduler.HyperbandScheduler(
            run_mlp_openml,
            resource=resources,
            searcher=SEARCHER,
            search_options=search_options,
            time_out=120,
            time_attr=RESOURCE_ATTR_NAME,
            reward_attr=REWARD_ATTR_NAME,
            type=sch_type,
            grace_period=1,
            reduction_factor=3,
            brackets=1)

    # run tasks
    myscheduler.run()
    myscheduler.join_jobs()

    results_df = process_training_history(
        myscheduler.training_history.copy(),
        start_timestamp=myscheduler._start_time)

.. parsed-literal::
    :class: output

    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-course-v3/venv/lib/python3.7/site-packages/distributed/worker.py:3460: UserWarning: Large object of size 1.30 MB detected in task graph:
      (0, , { ... sReporter}, [])
    Consider scattering large objects ahead of time
    with client.scatter to reduce scheduler burden and
    keep data on workers

        future = client.submit(func, big_data)    # bad

        big_future = client.scatter(big_data)     # good
        future = client.submit(func, big_future)  # good

      % (format_bytes(len(b)), s)

.. parsed-literal::
    :class: output

    Epoch 1 ; Time: 0.485816 ; Training: accuracy=0.260079 ; Validation: accuracy=0.531250
    Epoch 2 ; Time: 0.925723 ; Training: accuracy=0.496365 ; Validation: accuracy=0.655247
    Epoch 3 ; Time: 1.350109 ; Training: accuracy=0.559650 ; Validation: accuracy=0.694686
    Epoch 4 ; Time: 1.776294 ; Training: accuracy=0.588896 ; Validation: accuracy=0.711063
    Epoch 5 ; Time: 2.202718 ; Training: accuracy=0.609385 ; Validation: accuracy=0.726939
    Epoch 6 ; Time: 2.660790 ; Training: accuracy=0.628139 ; Validation: accuracy=0.745321
    Epoch 7 ; Time: 3.109511 ; Training: accuracy=0.641193 ; Validation: accuracy=0.750501
    Epoch 8 ; Time: 3.538582 ; Training: accuracy=0.653751 ; Validation: accuracy=0.763202
    Epoch 9 ; Time: 3.966724 ; Training: accuracy=0.665482 ; Validation: accuracy=0.766043
    Epoch 1 ; Time: 0.354436 ; Training: accuracy=0.581848 ; Validation: accuracy=0.771103
    Epoch 2 ; Time: 0.704473 ; Training: accuracy=0.715099 ; Validation: accuracy=0.823125
    Epoch 3 ; Time: 0.995774 ; Training: accuracy=0.758663 ; Validation: accuracy=0.852660
    Epoch 4 ; Time: 1.353303 ; Training: accuracy=0.778548 ; Validation: accuracy=0.852828
    Epoch 5 ; Time: 1.653463 ; Training: accuracy=0.782921 ; Validation: accuracy=0.873301
    Epoch 6 ; Time: 1.945009 ; Training: accuracy=0.793977 ; Validation: accuracy=0.876489
    Epoch 7 ; Time: 2.234698 ; Training: accuracy=0.806931 ; Validation: accuracy=0.876825
    Epoch 8 ; Time: 2.547375 ; Training: accuracy=0.811056 ; Validation: accuracy=0.887229
    Epoch 9 ; Time: 2.843662 ; Training: accuracy=0.821782 ; Validation: accuracy=0.893606
    Epoch 1 ; Time: 0.413744 ; Training: accuracy=0.031333 ; Validation: accuracy=0.025862
    Epoch 1 ; Time: 0.474860 ; Training: accuracy=0.039519 ; Validation: accuracy=0.036720
    Epoch 1 ; Time: 0.493186 ; Training: accuracy=0.581720 ; Validation: accuracy=0.754079
    Epoch 2 ; Time: 0.871644 ; Training: accuracy=0.704632 ; Validation: accuracy=0.791708
    Epoch 3 ; Time: 1.246018 ; Training: accuracy=0.731844 ; Validation: accuracy=0.846653
    Epoch 4 ; Time: 1.625319 ; Training: accuracy=0.763689 ; Validation: accuracy=0.855644
    Epoch 5 ; Time: 2.177871 ; Training: accuracy=0.771133 ; Validation: accuracy=0.859640
    Epoch 6 ; Time: 2.560357 ; Training: accuracy=0.786352 ; Validation: accuracy=0.871295
    Epoch 7 ; Time: 2.949618 ; Training: accuracy=0.792556 ; Validation: accuracy=0.879287
    Epoch 8 ; Time: 3.327809 ; Training: accuracy=0.799090 ; Validation: accuracy=0.884782
    Epoch 9 ; Time: 3.777623 ; Training: accuracy=0.802730 ; Validation: accuracy=0.882950
    Epoch 1 ; Time: 0.497789 ; Training: accuracy=0.064532 ; Validation: accuracy=0.121450
    Epoch 1 ; Time: 0.578648 ; Training: accuracy=0.251240 ; Validation: accuracy=0.535881
    Epoch 1 ; Time: 0.503301 ; Training: accuracy=0.413499 ; Validation: accuracy=0.654333
    Epoch 2 ; Time: 0.929561 ; Training: accuracy=0.559917 ; Validation: accuracy=0.725667
    Epoch 3 ; Time: 1.368943 ; Training: accuracy=0.611511 ; Validation: accuracy=0.762000
    Epoch 1 ; Time: 0.767868 ; Training: accuracy=0.246300 ; Validation: accuracy=0.431960
    Epoch 1 ; Time: 0.312346 ; Training: accuracy=0.247434 ; Validation: accuracy=0.537415
    Epoch 1 ; Time: 0.501000 ; Training: accuracy=0.625744 ; Validation: accuracy=0.809308
    Epoch 2 ; Time: 1.085015 ; Training: accuracy=0.757771 ; Validation: accuracy=0.836190
    Epoch 3 ; Time: 1.532385 ; Training: accuracy=0.796048 ; Validation: accuracy=0.846270
    Epoch 1 ; Time: 0.443196 ; Training: accuracy=0.410110 ; Validation: accuracy=0.619753
    Epoch 2 ; Time: 0.825119 ; Training: accuracy=0.494746 ; Validation: accuracy=0.673051
    Epoch 3 ; Time: 1.209963 ; Training: accuracy=0.517664 ; Validation: accuracy=0.674384
    Epoch 1 ; Time: 0.455926 ; Training: accuracy=0.520890 ; Validation: accuracy=0.735177
    Epoch 2 ; Time: 0.826737 ; Training: accuracy=0.694713 ; Validation: accuracy=0.771319
    Epoch 3 ; Time: 1.195614 ; Training: accuracy=0.736576 ; Validation: accuracy=0.830779
    Epoch 1 ; Time: 0.752027 ; Training: accuracy=0.044942 ; Validation: accuracy=0.176768
    Epoch 1 ; Time: 0.383156 ; Training: accuracy=0.638724 ; Validation: accuracy=0.810774
    Epoch 2 ; Time: 0.674688 ; Training: accuracy=0.777282 ; Validation: accuracy=0.842088
    Epoch 3 ; Time: 0.946444 ; Training: accuracy=0.801091 ; Validation: accuracy=0.873064
    Epoch 4 ; Time: 1.219203 ; Training: accuracy=0.815228 ; Validation: accuracy=0.879630
    Epoch 5 ; Time: 1.490317 ; Training: accuracy=0.833003 ; Validation: accuracy=0.888215
    Epoch 6 ; Time: 1.798936 ; Training: accuracy=0.839699 ; Validation: accuracy=0.888384
    Epoch 7 ; Time: 2.072390 ; Training: accuracy=0.847636 ; Validation: accuracy=0.891077
    Epoch 8 ; Time: 2.345393 ; Training: accuracy=0.850694 ; Validation: accuracy=0.903704
    Epoch 9 ; Time: 2.616894 ; Training: accuracy=0.850777 ; Validation: accuracy=0.902525
    Epoch 1 ; Time: 0.322179 ; Training: accuracy=0.270546 ; Validation: accuracy=0.473270
    Epoch 1 ; Time: 0.458280 ; Training: accuracy=0.571879 ; Validation: accuracy=0.768435
    Epoch 2 ; Time: 0.855828 ; Training: accuracy=0.775576 ; Validation: accuracy=0.840174
    Epoch 3 ; Time: 1.236585 ; Training: accuracy=0.823081 ; Validation: accuracy=0.850017
    Epoch 4 ; Time: 1.628630 ; Training: accuracy=0.848284 ; Validation: accuracy=0.888055
    Epoch 5 ; Time: 2.045398 ; Training: accuracy=0.869176 ; Validation: accuracy=0.899733
    Epoch 6 ; Time: 2.436024 ; Training: accuracy=0.879042 ; Validation: accuracy=0.897064
    Epoch 7 ; Time: 2.827001 ; Training: accuracy=0.886089 ; Validation: accuracy=0.908075
    Epoch 8 ; Time: 3.220183 ; Training: accuracy=0.897778 ; Validation: accuracy=0.920754
    Epoch 9 ; Time: 3.611714 ; Training: accuracy=0.903250 ; Validation: accuracy=0.917251
    Epoch 1 ; Time: 0.577712 ; Training: accuracy=0.575012 ; Validation: accuracy=0.773837
    Epoch 2 ; Time: 1.045443 ; Training: accuracy=0.744908 ; Validation: accuracy=0.825192
    Epoch 3 ; Time: 1.512438 ; Training: accuracy=0.794006 ; Validation: accuracy=0.861659
    Epoch 4 ; Time: 1.989012 ; Training: accuracy=0.817685 ; Validation: accuracy=0.871863
    Epoch 5 ; Time: 2.456224 ; Training: accuracy=0.833582 ; Validation: accuracy=0.887755
    Epoch 6 ; Time: 2.924245 ; Training: accuracy=0.846829 ; Validation: accuracy=0.893610
    Epoch 7 ; Time: 3.430988 ; Training: accuracy=0.851714 ; Validation: accuracy=0.905320
    Epoch 8 ; Time: 3.918977 ; Training: accuracy=0.863719 ; Validation: accuracy=0.906992
    Epoch 9 ; Time: 4.414324 ; Training: accuracy=0.868356 ; Validation: accuracy=0.911676
    Epoch 1 ; Time: 0.944247 ; Training: accuracy=0.433936 ; Validation: accuracy=0.641801
    Epoch 1 ; Time: 0.363867 ; Training: accuracy=0.642586 ; Validation: accuracy=0.789894
    Epoch 2 ; Time: 0.672631 ; Training: accuracy=0.778822 ; Validation: accuracy=0.832114
    Epoch 3 ; Time: 1.031989 ; Training: accuracy=0.800346 ; Validation: accuracy=0.862367
    Epoch 4 ; Time: 1.362332 ; Training: accuracy=0.826159 ; Validation: accuracy=0.875332
    Epoch 5 ; Time: 1.667801 ; Training: accuracy=0.836302 ; Validation: accuracy=0.875332
    Epoch 6 ; Time: 1.976715 ; Training: accuracy=0.844054 ; Validation: accuracy=0.880319
    Epoch 7 ; Time: 2.283057 ; Training: accuracy=0.852713 ; Validation: accuracy=0.877660
    Epoch 8 ; Time: 2.584614 ; Training: accuracy=0.863434 ; Validation: accuracy=0.896443
    Epoch 9 ; Time: 2.890280 ; Training: accuracy=0.865413 ; Validation: accuracy=0.885140
    Epoch 1 ; Time: 0.443991 ; Training: accuracy=0.388039 ; Validation: accuracy=0.691309
    Epoch 1 ; Time: 0.426861 ; Training: accuracy=0.672291 ; Validation: accuracy=0.813686
    Epoch 2 ; Time: 0.809158 ; Training: accuracy=0.850455 ; Validation: accuracy=0.868465
    Epoch 3 ; Time: 1.211733 ; Training: accuracy=0.881390 ; Validation: accuracy=0.906593
    Epoch 4 ; Time: 1.588458 ; Training: accuracy=0.910008 ; Validation: accuracy=0.912088
    Epoch 5 ; Time: 1.955405 ; Training: accuracy=0.922663 ; Validation: accuracy=0.915751
    Epoch 6 ; Time: 2.318612 ; Training: accuracy=0.934243 ; Validation: accuracy=0.930569
    Epoch 7 ; Time: 2.721848 ; Training: accuracy=0.940612 ; Validation: accuracy=0.927739
    Epoch 8 ; Time: 3.162516 ; Training: accuracy=0.946402 ; Validation: accuracy=0.934565
    Epoch 9 ; Time: 3.550003 ; Training: accuracy=0.951613 ; Validation: accuracy=0.929071
    Epoch 1 ; Time: 0.368529 ; Training: accuracy=0.406714 ; Validation: accuracy=0.661320
    Epoch 1 ; Time: 0.398520 ; Training: accuracy=0.464457 ; Validation: accuracy=0.682689
    Epoch 1 ; Time: 0.384942 ; Training: accuracy=0.457259 ; Validation: accuracy=0.646043
    Epoch 1 ; Time: 0.409404 ; Training: accuracy=0.383181 ; Validation: accuracy=0.619832
    Epoch 1 ; Time: 0.538824 ; Training: accuracy=0.470978 ; Validation: accuracy=0.714141
    Epoch 1 ; Time: 0.396265 ; Training: accuracy=0.530034 ; Validation: accuracy=0.764236
    Epoch 2 ; Time: 0.722034 ; Training: accuracy=0.728001 ; Validation: accuracy=0.828505
    Epoch 3 ; Time: 1.087246 ; Training: accuracy=0.786003 ; Validation: accuracy=0.851315
    Epoch 1 ; Time: 0.471924 ; Training: accuracy=0.548577 ; Validation: accuracy=0.744837
    Epoch 2 ; Time: 0.856599 ; Training: accuracy=0.753062 ; Validation: accuracy=0.805796
    Epoch 3 ; Time: 1.233011 ; Training: accuracy=0.806687 ; Validation: accuracy=0.845103
    Epoch 1 ; Time: 0.403902 ; Training: accuracy=0.573500 ; Validation: accuracy=0.757267
    Epoch 2 ; Time: 0.741710 ; Training: accuracy=0.800923 ; Validation: accuracy=0.834447
    Epoch 3 ; Time: 1.113848 ; Training: accuracy=0.858355 ; Validation: accuracy=0.868527
    Epoch 4 ; Time: 1.449182 ; Training: accuracy=0.887195 ; Validation: accuracy=0.884564
    Epoch 5 ; Time: 1.848668 ; Training: accuracy=0.911503 ; Validation: accuracy=0.897093
    Epoch 6 ; Time: 2.207992 ; Training: accuracy=0.919084 ; Validation: accuracy=0.909121
    Epoch 7 ; Time: 2.547297 ; Training: accuracy=0.934410 ; Validation: accuracy=0.924658
    Epoch 8 ; Time: 2.920436 ; Training: accuracy=0.943309 ; Validation: accuracy=0.918476
    Epoch 9 ; Time: 3.282759 ; Training: accuracy=0.947017 ; Validation: accuracy=0.929669
    Epoch 1 ; Time: 0.391132 ; Training: accuracy=0.611194 ; Validation: accuracy=0.761953
    Epoch 2 ; Time: 0.782814 ; Training: accuracy=0.803980 ; Validation: accuracy=0.840236
    Epoch 3 ; Time: 1.159332 ; Training: accuracy=0.854892 ; Validation: accuracy=0.875926
    Epoch 4 ; Time: 1.490009 ; Training: accuracy=0.882338 ; Validation: accuracy=0.902694
    Epoch 5 ; Time: 1.801840 ; Training: accuracy=0.906302 ; Validation: accuracy=0.920370
    Epoch 6 ; Time: 2.220920 ; Training: accuracy=0.915837 ; Validation: accuracy=0.924916
    Epoch 7 ; Time: 2.595611 ; Training: accuracy=0.928275 ; Validation: accuracy=0.926936
    Epoch 8 ; Time: 2.921697 ; Training: accuracy=0.932172 ; Validation: accuracy=0.926936
    Epoch 9 ; Time: 3.257192 ; Training: accuracy=0.937977 ; Validation: accuracy=0.937374
    Epoch 1 ; Time: 0.406384 ; Training: accuracy=0.675000 ; Validation: accuracy=0.811000
    Epoch 2 ; Time: 0.753011 ; Training: accuracy=0.850414 ; Validation: accuracy=0.872000
    Epoch 3 ; Time: 1.104501 ; Training: accuracy=0.890646 ; Validation: accuracy=0.904833
    Epoch 4 ; Time: 1.468058 ; Training: accuracy=0.915480 ; Validation: accuracy=0.909667
    Epoch 5 ; Time: 1.811414 ; Training: accuracy=0.923013 ; Validation: accuracy=0.906833
    Epoch 6 ; Time: 2.160280 ; Training: accuracy=0.939735 ; Validation: accuracy=0.913333
    Epoch 7 ; Time: 2.512255 ; Training: accuracy=0.949917 ; Validation: accuracy=0.931333
    Epoch 8 ; Time: 2.872170 ; Training: accuracy=0.953974 ; Validation: accuracy=0.933500
    Epoch 9 ; Time: 3.218282 ; Training: accuracy=0.957533 ; Validation: accuracy=0.925833
    Epoch 1 ; Time: 0.319053 ; Training: accuracy=0.709950 ; Validation: accuracy=0.844292
    Epoch 2 ; Time: 0.589575 ; Training: accuracy=0.870813 ; Validation: accuracy=0.875334
    Epoch 3 ; Time: 0.987917 ; Training: accuracy=0.902324 ; Validation: accuracy=0.904039
    Epoch 4 ; Time: 1.249979 ; Training: accuracy=0.924241 ; Validation: accuracy=0.905040
    Epoch 5 ; Time: 1.531832 ; Training: accuracy=0.926226 ; Validation: accuracy=0.908378
    Epoch 6 ; Time: 1.788824 ; Training: accuracy=0.940121 ; Validation: accuracy=0.922230
    Epoch 7 ; Time: 2.067610 ; Training: accuracy=0.948639 ; Validation: accuracy=0.923231
    Epoch 8 ; Time: 2.327184 ; Training: accuracy=0.950955 ; Validation: accuracy=0.926569
    Epoch 9 ; Time: 2.611477 ; Training: accuracy=0.957820 ; Validation: accuracy=0.933745
    Epoch 1 ; Time: 0.419348 ; Training: accuracy=0.633882 ; Validation: accuracy=0.791375
    Epoch 2 ; Time: 0.785120 ; Training: accuracy=0.822649 ; Validation: accuracy=0.857642
    Epoch 3 ; Time: 1.140644 ; Training: accuracy=0.874018 ; Validation: accuracy=0.890276
    Epoch 4 ; Time: 1.496016 ; Training: accuracy=0.904459 ; Validation: accuracy=0.895604
    Epoch 5 ; Time: 1.856606 ; Training: accuracy=0.918107 ; Validation: accuracy=0.901265
    Epoch 6 ; Time: 2.219018 ; Training: accuracy=0.932252 ; Validation: accuracy=0.921911
    Epoch 7 ; Time: 2.587913 ; Training: accuracy=0.936637 ; Validation: accuracy=0.926906
    Epoch 8 ; Time: 2.995049 ; Training: accuracy=0.948135 ; Validation: accuracy=0.930736
    Epoch 9 ; Time: 3.366530 ; Training: accuracy=0.951195 ; Validation: accuracy=0.932900
    Epoch 1 ; Time: 0.362235 ; Training: accuracy=0.717879 ; Validation: accuracy=0.818816
    Epoch 2 ; Time: 0.717216 ; Training: accuracy=0.838941 ; Validation: accuracy=0.847407
    Epoch 3 ; Time: 1.011797 ; Training: accuracy=0.877288 ; Validation: accuracy=0.877992
    Epoch 4 ; Time: 1.307802 ; Training: accuracy=0.891638 ; Validation: accuracy=0.883477
    Epoch 5 ; Time: 1.607926 ; Training: accuracy=0.901204 ; Validation: accuracy=0.879322
    Epoch 6 ; Time: 1.917984 ; Training: accuracy=0.905162 ; Validation: accuracy=0.870013
    Epoch 7 ; Time: 2.220767 ; Training: accuracy=0.917038 ; Validation: accuracy=0.909408
    Epoch 8 ; Time: 2.541039 ; Training: accuracy=0.925697 ; Validation: accuracy=0.898770
    Epoch 9 ; Time: 2.893164 ; Training: accuracy=0.922233 ; Validation: accuracy=0.911735
    Epoch 1 ; Time: 0.661581 ; Training: accuracy=0.565527 ; Validation: accuracy=0.718211
    Epoch 1 ; Time: 0.540891 ; Training: accuracy=0.635220 ; Validation: accuracy=0.782456
    Epoch 2 ; Time: 1.021876 ; Training: accuracy=0.824065 ; Validation: accuracy=0.852632
    Epoch 3 ; Time: 1.497699 ; Training: accuracy=0.874710 ; Validation: accuracy=0.872180
    Epoch 4 ; Time: 1.987899 ; Training: accuracy=0.899123 ; Validation: accuracy=0.894069
    Epoch 5 ; Time: 2.488709 ; Training: accuracy=0.916005 ; Validation: accuracy=0.910443
    Epoch 6 ; Time: 2.983786 ; Training: accuracy=0.927756 ; Validation: accuracy=0.905263
    Epoch 7 ; Time: 3.506227 ; Training: accuracy=0.937272 ; Validation: accuracy=0.917794
    Epoch 8 ; Time: 4.000621 ; Training: accuracy=0.945382 ; Validation: accuracy=0.928488
    Epoch 9 ; Time: 4.512321 ; Training: accuracy=0.950844 ; Validation: accuracy=0.929156

Analysing the results
~~~~~~~~~~~~~~~~~~~~~

The training history is stored in ``results_df``; the main fields are the
runtime and ``'best'`` (the cumulative best error attained so far).

**Note**: You will get slightly different curves for different pairs of
scheduler/searcher, and the ``time_out`` here is a bit too short to really see
the differences in a significant way (it would be better to set it to
>1000s). Generally speaking though, Hyperband stopping / promotion combined
with a model-based searcher will tend to significantly outperform the other
combinations given enough time.
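
Besides the full history processed into ``results_df``, the scheduler object
itself keeps track of the best configuration it has encountered. A minimal
sketch, assuming your AutoGluon version exposes ``get_best_config`` and
``get_best_reward`` on the scheduler (check the scheduler docstrings if the
method names differ):

.. code:: python

    # Hedged sketch: query the scheduler for the best configuration and reward
    # it has observed (method names assumed, not taken from this tutorial).
    print('Best config: {}'.format(myscheduler.get_best_config()))
    print('Best reward (validation accuracy): {}'.format(myscheduler.get_best_reward()))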

.. code:: python

    results_df.head()

.. parsed-literal::
    :class: output

       bracket  elapsed_time  epoch     error  eval_time  objective   runtime  \
    0        0      0.488541      1  0.468750   0.483711   0.531250  1.571379
    1        0      0.927435      2  0.344753   0.434316   0.655247  2.010273
    2        0      1.351925      3  0.305314   0.422602   0.694686  2.434762
    3        0      1.777915      4  0.288937   0.423248   0.711063  2.860753
    4        0      2.204442      5  0.273061   0.424562   0.726939  3.287279

       searcher_data_size  searcher_params_kernel_covariance_scale  \
    0                 NaN                                       1.0
    1                 1.0                                       1.0
    2                 1.0                                       1.0
    3                 2.0                                       1.0
    4                 2.0                                       1.0

       searcher_params_kernel_inv_bw0  ...  searcher_params_kernel_inv_bw7  \
    0                             1.0  ...                             1.0
    1                             1.0  ...                             1.0
    2                             1.0  ...                             1.0
    3                             1.0  ...                             1.0
    4                             1.0  ...                             1.0

       searcher_params_kernel_inv_bw8  searcher_params_mean_mean_value  \
    0                             1.0                              0.0
    1                             1.0                              0.0
    2                             1.0                              0.0
    3                             1.0                              0.0
    4                             1.0                              0.0

       searcher_params_noise_variance  target_epoch  task_id  time_since_start  \
    0                           0.001             9        0          1.573284
    1                           0.001             9        0          2.011093
    2                           0.001             9        0          2.435731
    3                           0.001             9        0          2.861669
    4                           0.001             9        0          3.288359

          time_step  time_this_iter      best
    0  1.614630e+09        0.517057  0.468750
    1  1.614630e+09        0.438870  0.344753
    2  1.614630e+09        0.424491  0.305314
    3  1.614630e+09        0.425990  0.288937
    4  1.614630e+09        0.426525  0.273061

    [5 rows x 26 columns]

.. code:: python

    import matplotlib.pyplot as plt

    plt.figure(figsize=(12, 8))

    runtime = results_df['runtime'].values
    objective = results_df['best'].values

    plt.plot(runtime, objective, lw=2)
    plt.xticks(fontsize=12)
    plt.xlim(0, 120)
    plt.ylim(0, 0.5)
    plt.yticks(fontsize=12)
    plt.xlabel("Runtime [s]", fontsize=14)
    plt.ylabel("Objective", fontsize=14)

.. parsed-literal::
    :class: output

    Text(0, 0.5, 'Objective')

Diving Deeper
-------------

Now, you are ready to try HPO on your own machine learning models (if you use
PyTorch, have a look at :ref:`sec_customstorch`). While AutoGluon comes with
well-chosen defaults, it can pay off to tune it to your specific needs. Here
are some tips which may come in useful.

Logging the Search Progress
~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, it is a good idea in general to switch on ``debug_log``, which outputs
useful information about the search progress. This is already done in the
example above. The outputs show which configurations are chosen, stopped, or
promoted. For BO and BOHB, a range of information is displayed for every
``get_config`` decision. This log output is very useful in order to figure out
what is going on during the search.

Configuring ``HyperbandScheduler``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most important knobs to turn with ``HyperbandScheduler`` are ``max_t``,
``grace_period``, ``reduction_factor``, ``brackets``, and ``type``. The first
three determine the rung levels at which stopping or promotion decisions are
being made.

- The maximum resource level ``max_t`` (usually, resource equates to epochs,
  so ``max_t`` is the maximum number of training epochs) is typically
  hardcoded in the ``train_fn`` passed to the scheduler (this is
  ``run_mlp_openml`` in the example above). As already noted above, the value
  is best fixed in the ``ag.args`` decorator as ``epochs=XYZ``; it can then be
  accessed as ``args.epochs`` in the ``train_fn`` code. If this is done, you
  do not have to pass ``max_t`` when creating the scheduler.
- ``grace_period`` and ``reduction_factor`` determine the rung levels, which
  are ``grace_period``, ``grace_period * reduction_factor``,
  ``grace_period * (reduction_factor ** 2)``, etc. (a short sketch computing
  these rung levels is given after this list). All rung levels must be less
  than or equal to ``max_t``. It is recommended to make ``max_t`` equal to the
  largest rung level. For example, if ``grace_period = 1``,
  ``reduction_factor = 3``, it is in general recommended to use ``max_t = 9``,
  ``max_t = 27``, or ``max_t = 81``. Choosing a ``max_t`` value "off the grid"
  works against the successive halving principle that the total resources
  spent in a rung should be roughly equal between rungs. If, in the example
  above, you set ``max_t = 10``, about a third of the configurations reaching
  9 epochs are allowed to proceed, but only for one more epoch.
- With ``reduction_factor``, you tune the extent to which successive halving
  filtering is applied. The larger this integer, the fewer configurations make
  it to higher numbers of epochs. Values 2, 3, 4 are commonly used.
- ``grace_period`` should be set to the smallest resource (number of epochs)
  for which you expect any meaningful differentiation between configurations.
  While ``grace_period = 1`` should always be explored, it may be too low for
  any meaningful stopping decisions to be made at the first rung.
- ``brackets`` sets the maximum number of brackets in Hyperband (make sure to
  study the Hyperband paper or follow-ups for details). For ``brackets = 1``,
  you are running successive halving (single bracket). Higher brackets have
  larger effective ``grace_period`` values (so runs are not stopped until
  later), yet are also chosen with less probability. We recommend always
  considering successive halving (``brackets = 1``) in a comparison.
- Finally, with ``type`` (values ``stopping``, ``promotion``) you are choosing
  different ways of extending successive halving scheduling to the
  asynchronous case. The default ``stopping`` is simpler and seems to perform
  well, but ``promotion`` is more careful about promoting configurations to
  higher resource levels, which can work better in some cases.
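
To see how ``grace_period``, ``reduction_factor``, and ``max_t`` interact,
here is a minimal sketch (the helper name ``rung_levels`` is made up for
illustration) that computes the rung levels implied by these parameters:

.. code:: python

    def rung_levels(grace_period, reduction_factor, max_t):
        """Rung levels grace_period * reduction_factor**k that do not exceed max_t."""
        levels = []
        r = grace_period
        while r <= max_t:
            levels.append(r)
            r *= reduction_factor
        return levels

    # Settings used in the example above: rungs at 1, 3 and 9 epochs
    print(rung_levels(grace_period=1, reduction_factor=3, max_t=9))   # [1, 3, 9]
    # An "off the grid" max_t: the last rung (9) is followed by only one extra epoch
    print(rung_levels(grace_period=1, reduction_factor=3, max_t=10))  # [1, 3, 9]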

Asynchronous BOHB
~~~~~~~~~~~~~~~~~

Finally, here are some ideas for tuning asynchronous BOHB, apart from tuning
its ``HyperbandScheduler`` component. You need to pass these options in
``search_options``.

- We support a range of different surrogate models over the criterion
  functions across resource levels. All of them are jointly dependent Gaussian
  process models, meaning that data collected at all resource levels are
  modelled together. The surrogate model is selected by
  ``gp_resource_kernel``; values are ``matern52``, ``matern52-res-warp``,
  ``exp-decay-sum``, ``exp-decay-combined``, ``exp-decay-delta1``. These are
  variants of either a joint Matern 5/2 kernel over configuration and
  resource, or the exponential decay model. Details about the latter can be
  found in the corresponding paper.
- Fitting a Gaussian process surrogate model to data incurs a cost which
  scales cubically with the number of datapoints. When applied to expensive
  deep learning workloads, even multi-fidelity asynchronous BOHB rarely runs
  up more than 100 observations or so (across all rung levels and brackets),
  and the GP computations are subdominant. However, if you apply it to a
  cheaper ``train_fn`` and find yourself beyond 2000 total evaluations, the
  cost of GP fitting can become painful. In such a situation, you can explore
  the options ``opt_skip_period`` and ``opt_skip_num_max_resource``. The basic
  idea is as follows. By far the most expensive part of a ``get_config`` call
  (picking the next configuration) is the refitting of the GP model to past
  data (this entails re-optimizing hyperparameters of the surrogate model
  itself). These options allow you to skip this expensive step for most
  ``get_config`` calls, after some initial period. Check the docstrings for
  details about these options. If you find yourself in such a situation and
  gain experience with these skipping features, make sure to contact the
  AutoGluon developers -- we would love to learn about your use case.
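
Putting these pieces together, here is a hedged sketch of what such a
``search_options`` dictionary could look like for the scheduler used above.
The particular values chosen for ``gp_resource_kernel``, ``opt_skip_period``,
and ``opt_skip_num_max_resource`` are made up for illustration; check the
searcher docstrings for the exact semantics and sensible values:

.. code:: python

    # Illustrative values only: surrogate model choice and GP-refit skipping
    # options are passed to the searcher via search_options.
    search_options = {
        'num_init_random': 2,
        'debug_log': True,
        'gp_resource_kernel': 'exp-decay-sum',  # one of the surrogate models listed above
        'opt_skip_period': 3,                   # illustrative: skip most GP refits
        'opt_skip_num_max_resource': True,      # illustrative: see docstring for semantics
    }

    myscheduler = ag.scheduler.HyperbandScheduler(
        run_mlp_openml,
        resource=resources,
        searcher='bayesopt',
        search_options=search_options,
        time_out=120,
        time_attr=RESOURCE_ATTR_NAME,
        reward_attr=REWARD_ATTR_NAME,
        type='stopping',
        grace_period=1,
        reduction_factor=3,
        brackets=1)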