.. _sec_custom_advancedhpo:

Getting started with Advanced HPO Algorithms
============================================

This tutorial provides a complete example of how to use AutoGluon's
state-of-the-art hyperparameter optimization (HPO) algorithms to tune a
Multi-Layer Perceptron (MLP), the most basic type of neural network.

Loading libraries
-----------------

.. code:: python

    # Basic utils for folder manipulations etc
    import time
    import multiprocessing  # to count the number of CPUs available

    # External tools to load and process data
    import numpy as np
    import pandas as pd

    # MXNet (NeuralNets)
    import mxnet as mx
    from mxnet import gluon, autograd
    from mxnet.gluon import nn

    # AutoGluon and HPO tools
    import autogluon.core as ag
    from autogluon.mxnet.utils import load_and_split_openml_data

Check the version of MXNet; you should be fine with any version >= 1.5:

.. code:: python

    mx.__version__

.. parsed-literal::
    :class: output

    '1.7.0'

You can also check the version of AutoGluon (and the specific commit) to make
sure it matches what you want.

.. code:: python

    import autogluon.core.version
    ag.version.__version__

.. parsed-literal::
    :class: output

    '0.1.0b20210301'

Hyperparameter Optimization of a 2-layer MLP
--------------------------------------------

Setting up the context
~~~~~~~~~~~~~~~~~~~~~~

Here we declare a few "environment variables" setting the context for what we
are doing:

.. code:: python

    OPENML_TASK_ID = 6              # describes the problem we will tackle
    RATIO_TRAIN_VALID = 0.33        # split of the training data used for validation
    RESOURCE_ATTR_NAME = 'epoch'    # how do we measure resources (will become clearer further below)
    REWARD_ATTR_NAME = 'objective'  # how do we measure performance (will become clearer further below)

    NUM_CPUS = multiprocessing.cpu_count()

Preparing the data
~~~~~~~~~~~~~~~~~~

We will use a multi-way classification task from OpenML. Data preparation
includes:

- Missing values are imputed, using the 'mean' strategy of
  ``sklearn.impute.SimpleImputer``
- Split training set into training and validation
- Standardize inputs to mean 0, variance 1

.. code:: python

    X_train, X_valid, y_train, y_valid, n_classes = load_and_split_openml_data(
        OPENML_TASK_ID, RATIO_TRAIN_VALID, download_from_openml=False)
    n_classes

.. parsed-literal::
    :class: output

    100%|██████████| 704/704 [00:00<00:00, 41137.24KB/s]
    100%|██████████| 2521/2521 [00:00<00:00, 30052.13KB/s]
    3KB [00:00, 3548.48KB/s]
    8KB [00:00, 10233.13KB/s]
    15KB [00:00, 15553.66KB/s]
    2998KB [00:00, 31984.93KB/s]
    881KB [00:00, 38288.07KB/s]
    3KB [00:00, 3992.04KB/s]

.. parsed-literal::
    :class: output

    26

The problem has 26 classes.
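
The bullet points above are handled for us by ``load_and_split_openml_data``.
If you want to reproduce the same preparation on your own data, it roughly
corresponds to the following ``scikit-learn`` calls. This is only a sketch
under the stated assumptions (the helper name ``prepare_features`` and the
``seed`` argument are made up for illustration), not the actual implementation
of the AutoGluon utility:

.. code:: python

    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    def prepare_features(X, y, ratio_valid=0.33, seed=0):
        # Impute missing values with the column mean
        X = SimpleImputer(strategy='mean').fit_transform(X)
        # Split the training set into training and validation parts
        X_train, X_valid, y_train, y_valid = train_test_split(
            X, y, test_size=ratio_valid, random_state=seed)
        # Standardize inputs to mean 0, variance 1 (statistics from the training part)
        scaler = StandardScaler().fit(X_train)
        return scaler.transform(X_train), scaler.transform(X_valid), y_train, y_valid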

Declaring a model specifying a hyperparameter space with AutoGluon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Two-layer MLP where we optimize over:

- the number of units on the first layer
- the number of units on the second layer
- the dropout rate after each layer
- the learning rate
- the scale of the weight initialization

The ``@ag.args`` decorator allows us to specify the space we will optimize
over; this matches the ConfigSpace syntax.

The body of the function ``run_mlp_openml`` is pretty simple:

- it reads the hyperparameters given via the decorator
- it defines a 2-layer MLP with dropout
- it declares a trainer with the 'adam' optimizer and a provided learning rate
- it trains the NN for a number of epochs (most of that is boilerplate code
  from ``mxnet``)
- the ``reporter`` at the end is used to keep track of training history in the
  hyperparameter optimization

**Note**: The number of epochs and the hyperparameter space are reduced to
make for a shorter experiment.

.. code:: python

    @ag.args(n_units_1=ag.space.Int(lower=16, upper=128),
             n_units_2=ag.space.Int(lower=16, upper=128),
             dropout_1=ag.space.Real(lower=0, upper=.75),
             dropout_2=ag.space.Real(lower=0, upper=.75),
             learning_rate=ag.space.Real(lower=1e-6, upper=1, log=True),
             batch_size=ag.space.Int(lower=8, upper=128),
             scale_1=ag.space.Real(lower=0.001, upper=10, log=True),
             scale_2=ag.space.Real(lower=0.001, upper=10, log=True),
             epochs=9)
    def run_mlp_openml(args, reporter, **kwargs):
        # Time stamp for elapsed_time
        ts_start = time.time()
        # Unwrap hyperparameters
        n_units_1 = args.n_units_1
        n_units_2 = args.n_units_2
        dropout_1 = args.dropout_1
        dropout_2 = args.dropout_2
        scale_1 = args.scale_1
        scale_2 = args.scale_2
        batch_size = args.batch_size
        learning_rate = args.learning_rate

        ctx = mx.cpu()
        net = nn.Sequential()
        with net.name_scope():
            # Layer 1
            net.add(nn.Dense(n_units_1, activation='relu',
                             weight_initializer=mx.initializer.Uniform(scale=scale_1)))
            # Dropout
            net.add(gluon.nn.Dropout(dropout_1))
            # Layer 2
            net.add(nn.Dense(n_units_2, activation='relu',
                             weight_initializer=mx.initializer.Uniform(scale=scale_2)))
            # Dropout
            net.add(gluon.nn.Dropout(dropout_2))
            # Output
            net.add(nn.Dense(n_classes))
        net.initialize(ctx=ctx)

        trainer = gluon.Trainer(net.collect_params(), 'adam',
                                {'learning_rate': learning_rate})

        for epoch in range(args.epochs):
            ts_epoch = time.time()

            train_iter = mx.io.NDArrayIter(
                data={'data': X_train},
                label={'label': y_train},
                batch_size=batch_size,
                shuffle=True)
            valid_iter = mx.io.NDArrayIter(
                data={'data': X_valid},
                label={'label': y_valid},
                batch_size=batch_size,
                shuffle=False)

            metric = mx.metric.Accuracy()
            loss = gluon.loss.SoftmaxCrossEntropyLoss()

            for batch in train_iter:
                data = batch.data[0].as_in_context(ctx)
                label = batch.label[0].as_in_context(ctx)
                with autograd.record():
                    output = net(data)
                    L = loss(output, label)
                L.backward()
                trainer.step(data.shape[0])
                metric.update([label], [output])

            name, train_acc = metric.get()

            metric = mx.metric.Accuracy()
            for batch in valid_iter:
                data = batch.data[0].as_in_context(ctx)
                label = batch.label[0].as_in_context(ctx)
                output = net(data)
                metric.update([label], [output])

            name, val_acc = metric.get()

            print('Epoch %d ; Time: %f ; Training: %s=%f ; Validation: %s=%f' % (
                epoch + 1, time.time() - ts_start, name, train_acc, name, val_acc))

            ts_now = time.time()
            eval_time = ts_now - ts_epoch
            elapsed_time = ts_now - ts_start

            # The resource reported back (as 'epoch') is the number of epochs
            # done, starting at 1
            reporter(
                epoch=epoch + 1,
                objective=float(val_acc),
                eval_time=eval_time,
                time_step=ts_now,
                elapsed_time=elapsed_time)

**Note**: The annotation ``epochs=9`` specifies the maximum number of epochs
for training. It becomes available as ``args.epochs``. Importantly, it is also
processed by ``HyperbandScheduler`` below in order to set its ``max_t``
attribute.

**Recommendation**: Whenever writing training code to be passed as
``train_fn`` to a scheduler, if this training code reports a resource (or
time) attribute, the corresponding maximum resource value should be included
in ``train_fn.args``:

- If the resource attribute (``time_attr`` of the scheduler) in ``train_fn``
  is ``epoch``, make sure to include ``epochs=XYZ`` in the annotation. This
  allows the scheduler to read ``max_t`` from ``train_fn.args.epochs``. This
  case corresponds to our example here.
- If the resource attribute is something other than ``epoch``, you can also
  include the annotation ``max_t=XYZ``, which allows the scheduler to read
  ``max_t`` from ``train_fn.args.max_t``.

Annotating the training function with the correct value for ``max_t``
simplifies scheduler creation (since ``max_t`` does not have to be passed),
and avoids inconsistencies between ``train_fn`` and the scheduler.
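
For the second case, here is a hedged sketch of what such an annotation could
look like. Everything in it (the function ``train_fn``, the ``batches``
resource attribute, and the value 900) is made up for illustration and is not
part of the example above:

.. code:: python

    # Hypothetical training function whose resource is the number of batches
    # processed ('batches'), not 'epoch'. We therefore annotate max_t, which
    # the scheduler can read from train_fn.args.max_t.
    @ag.args(learning_rate=ag.space.Real(lower=1e-6, upper=1, log=True),
             max_t=900)
    def train_fn(args, reporter, **kwargs):
        validation_accuracy = 0.0
        for batch in range(1, args.max_t + 1):
            # ... a real training step would go here; we fake a score instead ...
            validation_accuracy = min(1.0, validation_accuracy + args.learning_rate)
            reporter(batches=batch, objective=validation_accuracy)

A scheduler driving this function would then be created with
``time_attr='batches'``, and ``max_t`` would not have to be passed explicitly.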

Running the Hyperparameter Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use the following schedulers:

- FIFO (``fifo``)
- Hyperband (either the stopping (``hbs``) or promotion (``hbp``) variant)

And the following searchers:

- Random search (``random``)
- Gaussian process based Bayesian optimization (``bayesopt``)
- SkOpt Bayesian optimization (``skopt``; only with FIFO scheduler)

Note that the method known as (asynchronous) Hyperband uses random search.
Combining Hyperband scheduling with the ``bayesopt`` searcher uses a novel
method called asynchronous BOHB.

Pick the combination you are interested in (a full experiment takes around
120 seconds, see the ``time_out`` parameter); running every combination,
possibly with multiple runs each, can take a fair bit of time. In real life,
you will want to choose a larger ``time_out`` in order to obtain good
performance.

.. code:: python

    SCHEDULER = "hbs"
    SEARCHER = "bayesopt"

.. code:: python

    def compute_error(df):
        return 1.0 - df["objective"]

    def compute_runtime(df, start_timestamp):
        return df["time_step"] - start_timestamp

    def process_training_history(task_dicts, start_timestamp,
                                 runtime_fn=compute_runtime,
                                 error_fn=compute_error):
        task_dfs = []
        for task_id in task_dicts:
            task_df = pd.DataFrame(task_dicts[task_id])
            task_df = task_df.assign(task_id=task_id,
                                     runtime=runtime_fn(task_df, start_timestamp),
                                     error=error_fn(task_df),
                                     target_epoch=task_df["epoch"].iloc[-1])
            task_dfs.append(task_df)

        result = pd.concat(task_dfs, axis="index", ignore_index=True, sort=True)

        # re-order by runtime
        result = result.sort_values(by="runtime")

        # calculate incumbent best -- the cumulative minimum of the error
        result = result.assign(best=result["error"].cummin())

        return result

    resources = dict(num_cpus=NUM_CPUS, num_gpus=0)

.. code:: python

    search_options = {
        'num_init_random': 2,
        'debug_log': True}

    if SCHEDULER == 'fifo':
        myscheduler = ag.scheduler.FIFOScheduler(
            run_mlp_openml,
            resource=resources,
            searcher=SEARCHER,
            search_options=search_options,
            time_out=120,
            time_attr=RESOURCE_ATTR_NAME,
            reward_attr=REWARD_ATTR_NAME)

    else:
        # This setup uses rung levels at 1, 3, 9 epochs. We just use a single
        # bracket, so this is in fact successive halving (Hyperband would use
        # more than 1 bracket).
        # Also note that since we do not use the max_t argument of
        # HyperbandScheduler, this value is obtained from train_fn.args.epochs.
        sch_type = 'stopping' if SCHEDULER == 'hbs' else 'promotion'
        myscheduler = ag.scheduler.HyperbandScheduler(
            run_mlp_openml,
            resource=resources,
            searcher=SEARCHER,
            search_options=search_options,
            time_out=120,
            time_attr=RESOURCE_ATTR_NAME,
            reward_attr=REWARD_ATTR_NAME,
            type=sch_type,
            grace_period=1,
            reduction_factor=3,
            brackets=1)

    # run tasks
    myscheduler.run()
    myscheduler.join_jobs()

    results_df = process_training_history(
        myscheduler.training_history.copy(),
        start_timestamp=myscheduler._start_time)

.. parsed-literal::
    :class: output

    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-course-v3/venv/lib/python3.7/site-packages/distributed/worker.py:3460: UserWarning: Large object of size 1.30 MB detected in task graph:
      (0, , { ... sReporter}, [])
    Consider scattering large objects ahead of time
    with client.scatter to reduce scheduler burden and
    keep data on workers

        future = client.submit(func, big_data)    # bad

        big_future = client.scatter(big_data)     # good
        future = client.submit(func, big_future)  # good

      % (format_bytes(len(b)), s)

.. parsed-literal::
    :class: output

    Epoch 1 ; Time: 0.485816 ; Training: accuracy=0.260079 ; Validation: accuracy=0.531250
    Epoch 2 ; Time: 0.925723 ; Training: accuracy=0.496365 ; Validation: accuracy=0.655247
    Epoch 3 ; Time: 1.350109 ; Training: accuracy=0.559650 ; Validation: accuracy=0.694686
    Epoch 4 ; Time: 1.776294 ; Training: accuracy=0.588896 ; Validation: accuracy=0.711063
    Epoch 5 ; Time: 2.202718 ; Training: accuracy=0.609385 ; Validation: accuracy=0.726939
    Epoch 6 ; Time: 2.660790 ; Training: accuracy=0.628139 ; Validation: accuracy=0.745321
    Epoch 7 ; Time: 3.109511 ; Training: accuracy=0.641193 ; Validation: accuracy=0.750501
    Epoch 8 ; Time: 3.538582 ; Training: accuracy=0.653751 ; Validation: accuracy=0.763202
    Epoch 9 ; Time: 3.966724 ; Training: accuracy=0.665482 ; Validation: accuracy=0.766043
    Epoch 1 ; Time: 0.354436 ; Training: accuracy=0.581848 ; Validation: accuracy=0.771103
    Epoch 2 ; Time: 0.704473 ; Training: accuracy=0.715099 ; Validation: accuracy=0.823125
    Epoch 3 ; Time: 0.995774 ; Training: accuracy=0.758663 ; Validation: accuracy=0.852660
    Epoch 4 ; Time: 1.353303 ; Training: accuracy=0.778548 ; Validation: accuracy=0.852828
    Epoch 5 ; Time: 1.653463 ; Training: accuracy=0.782921 ; Validation: accuracy=0.873301
    Epoch 6 ; Time: 1.945009 ; Training: accuracy=0.793977 ; Validation: accuracy=0.876489
    Epoch 7 ; Time: 2.234698 ; Training: accuracy=0.806931 ; Validation: accuracy=0.876825
    Epoch 8 ; Time: 2.547375 ; Training: accuracy=0.811056 ; Validation: accuracy=0.887229
    Epoch 9 ; Time: 2.843662 ; Training: accuracy=0.821782 ; Validation: accuracy=0.893606
    Epoch 1 ; Time: 0.413744 ; Training: accuracy=0.031333 ; Validation: accuracy=0.025862
    Epoch 1 ; Time: 0.474860 ; Training: accuracy=0.039519 ; Validation: accuracy=0.036720
    Epoch 1 ; Time: 0.493186 ; Training: accuracy=0.581720 ; Validation: accuracy=0.754079
    Epoch 2 ; Time: 0.871644 ; Training: accuracy=0.704632 ; Validation: accuracy=0.791708
    Epoch 3 ; Time: 1.246018 ; Training: accuracy=0.731844 ; Validation: accuracy=0.846653
    Epoch 4 ; Time: 1.625319 ; Training: accuracy=0.763689 ; Validation: accuracy=0.855644
    Epoch 5 ; Time: 2.177871 ; Training: accuracy=0.771133 ; Validation: accuracy=0.859640
    Epoch 6 ; Time: 2.560357 ; Training: accuracy=0.786352 ; Validation: accuracy=0.871295
    Epoch 7 ; Time: 2.949618 ; Training: accuracy=0.792556 ; Validation: accuracy=0.879287
    Epoch 8 ; Time: 3.327809 ; Training: accuracy=0.799090 ; Validation: accuracy=0.884782
    Epoch 9 ; Time: 3.777623 ; Training: accuracy=0.802730 ; Validation: accuracy=0.882950
    Epoch 1 ; Time: 0.497789 ; Training: accuracy=0.064532 ; Validation: accuracy=0.121450
    Epoch 1 ; Time: 0.578648 ; Training: accuracy=0.251240 ; Validation: accuracy=0.535881
    Epoch 1 ; Time: 0.503301 ; Training: accuracy=0.413499 ; Validation: accuracy=0.654333
    Epoch 2 ; Time: 0.929561 ; Training: accuracy=0.559917 ; Validation: accuracy=0.725667
    Epoch 3 ; Time: 1.368943 ; Training: accuracy=0.611511 ; Validation: accuracy=0.762000
    Epoch 1 ; Time: 0.767868 ; Training: accuracy=0.246300 ; Validation: accuracy=0.431960
    Epoch 1 ; Time: 0.312346 ; Training: accuracy=0.247434 ; Validation: accuracy=0.537415
    Epoch 1 ; Time: 0.501000 ; Training: accuracy=0.625744 ; Validation: accuracy=0.809308
    Epoch 2 ; Time: 1.085015 ; Training: accuracy=0.757771 ; Validation: accuracy=0.836190
    Epoch 3 ; Time: 1.532385 ; Training: accuracy=0.796048 ; Validation: accuracy=0.846270
    Epoch 1 ; Time: 0.443196 ; Training: accuracy=0.410110 ; Validation: accuracy=0.619753
    Epoch 2 ; Time: 0.825119 ; Training: accuracy=0.494746 ; Validation: accuracy=0.673051
    Epoch 3 ; Time: 1.209963 ; Training: accuracy=0.517664 ; Validation: accuracy=0.674384
    Epoch 1 ; Time: 0.455926 ; Training: accuracy=0.520890 ; Validation: accuracy=0.735177
    Epoch 2 ; Time: 0.826737 ; Training: accuracy=0.694713 ; Validation: accuracy=0.771319
    Epoch 3 ; Time: 1.195614 ; Training: accuracy=0.736576 ; Validation: accuracy=0.830779
    Epoch 1 ; Time: 0.752027 ; Training: accuracy=0.044942 ; Validation: accuracy=0.176768
    Epoch 1 ; Time: 0.383156 ; Training: accuracy=0.638724 ; Validation: accuracy=0.810774
    Epoch 2 ; Time: 0.674688 ; Training: accuracy=0.777282 ; Validation: accuracy=0.842088
    Epoch 3 ; Time: 0.946444 ; Training: accuracy=0.801091 ; Validation: accuracy=0.873064
    Epoch 4 ; Time: 1.219203 ; Training: accuracy=0.815228 ; Validation: accuracy=0.879630
    Epoch 5 ; Time: 1.490317 ; Training: accuracy=0.833003 ; Validation: accuracy=0.888215
    Epoch 6 ; Time: 1.798936 ; Training: accuracy=0.839699 ; Validation: accuracy=0.888384
    Epoch 7 ; Time: 2.072390 ; Training: accuracy=0.847636 ; Validation: accuracy=0.891077
    Epoch 8 ; Time: 2.345393 ; Training: accuracy=0.850694 ; Validation: accuracy=0.903704
    Epoch 9 ; Time: 2.616894 ; Training: accuracy=0.850777 ; Validation: accuracy=0.902525
    Epoch 1 ; Time: 0.322179 ; Training: accuracy=0.270546 ; Validation: accuracy=0.473270
    Epoch 1 ; Time: 0.458280 ; Training: accuracy=0.571879 ; Validation: accuracy=0.768435
    Epoch 2 ; Time: 0.855828 ; Training: accuracy=0.775576 ; Validation: accuracy=0.840174
    Epoch 3 ; Time: 1.236585 ; Training: accuracy=0.823081 ; Validation: accuracy=0.850017
    Epoch 4 ; Time: 1.628630 ; Training: accuracy=0.848284 ; Validation: accuracy=0.888055
    Epoch 5 ; Time: 2.045398 ; Training: accuracy=0.869176 ; Validation: accuracy=0.899733
    Epoch 6 ; Time: 2.436024 ; Training: accuracy=0.879042 ; Validation: accuracy=0.897064
    Epoch 7 ; Time: 2.827001 ; Training: accuracy=0.886089 ; Validation: accuracy=0.908075
    Epoch 8 ; Time: 3.220183 ; Training: accuracy=0.897778 ; Validation: accuracy=0.920754
    Epoch 9 ; Time: 3.611714 ; Training: accuracy=0.903250 ; Validation: accuracy=0.917251
    Epoch 1 ; Time: 0.577712 ; Training: accuracy=0.575012 ; Validation: accuracy=0.773837
    Epoch 2 ; Time: 1.045443 ; Training: accuracy=0.744908 ; Validation: accuracy=0.825192
    Epoch 3 ; Time: 1.512438 ; Training: accuracy=0.794006 ; Validation: accuracy=0.861659
    Epoch 4 ; Time: 1.989012 ; Training: accuracy=0.817685 ; Validation: accuracy=0.871863
    Epoch 5 ; Time: 2.456224 ; Training: accuracy=0.833582 ; Validation: accuracy=0.887755
    Epoch 6 ; Time: 2.924245 ; Training: accuracy=0.846829 ; Validation: accuracy=0.893610
    Epoch 7 ; Time: 3.430988 ; Training: accuracy=0.851714 ; Validation: accuracy=0.905320
    Epoch 8 ; Time: 3.918977 ; Training: accuracy=0.863719 ; Validation: accuracy=0.906992
    Epoch 9 ; Time: 4.414324 ; Training: accuracy=0.868356 ; Validation: accuracy=0.911676
    Epoch 1 ; Time: 0.944247 ; Training: accuracy=0.433936 ; Validation: accuracy=0.641801
    Epoch 1 ; Time: 0.363867 ; Training: accuracy=0.642586 ; Validation: accuracy=0.789894
    Epoch 2 ; Time: 0.672631 ; Training: accuracy=0.778822 ; Validation: accuracy=0.832114
    Epoch 3 ; Time: 1.031989 ; Training: accuracy=0.800346 ; Validation: accuracy=0.862367
    Epoch 4 ; Time: 1.362332 ; Training: accuracy=0.826159 ; Validation: accuracy=0.875332
    Epoch 5 ; Time: 1.667801 ; Training: accuracy=0.836302 ; Validation: accuracy=0.875332
    Epoch 6 ; Time: 1.976715 ; Training: accuracy=0.844054 ; Validation: accuracy=0.880319
    Epoch 7 ; Time: 2.283057 ; Training: accuracy=0.852713 ; Validation: accuracy=0.877660
    Epoch 8 ; Time: 2.584614 ; Training: accuracy=0.863434 ; Validation: accuracy=0.896443
    Epoch 9 ; Time: 2.890280 ; Training: accuracy=0.865413 ; Validation: accuracy=0.885140
    Epoch 1 ; Time: 0.443991 ; Training: accuracy=0.388039 ; Validation: accuracy=0.691309
    Epoch 1 ; Time: 0.426861 ; Training: accuracy=0.672291 ; Validation: accuracy=0.813686
    Epoch 2 ; Time: 0.809158 ; Training: accuracy=0.850455 ; Validation: accuracy=0.868465
    Epoch 3 ; Time: 1.211733 ; Training: accuracy=0.881390 ; Validation: accuracy=0.906593
    Epoch 4 ; Time: 1.588458 ; Training: accuracy=0.910008 ; Validation: accuracy=0.912088
    Epoch 5 ; Time: 1.955405 ; Training: accuracy=0.922663 ; Validation: accuracy=0.915751
    Epoch 6 ; Time: 2.318612 ; Training: accuracy=0.934243 ; Validation: accuracy=0.930569
    Epoch 7 ; Time: 2.721848 ; Training: accuracy=0.940612 ; Validation: accuracy=0.927739
    Epoch 8 ; Time: 3.162516 ; Training: accuracy=0.946402 ; Validation: accuracy=0.934565
    Epoch 9 ; Time: 3.550003 ; Training: accuracy=0.951613 ; Validation: accuracy=0.929071
    Epoch 1 ; Time: 0.368529 ; Training: accuracy=0.406714 ; Validation: accuracy=0.661320
    Epoch 1 ; Time: 0.398520 ; Training: accuracy=0.464457 ; Validation: accuracy=0.682689
    Epoch 1 ; Time: 0.384942 ; Training: accuracy=0.457259 ; Validation: accuracy=0.646043
    Epoch 1 ; Time: 0.409404 ; Training: accuracy=0.383181 ; Validation: accuracy=0.619832
    Epoch 1 ; Time: 0.538824 ; Training: accuracy=0.470978 ; Validation: accuracy=0.714141
    Epoch 1 ; Time: 0.396265 ; Training: accuracy=0.530034 ; Validation: accuracy=0.764236
    Epoch 2 ; Time: 0.722034 ; Training: accuracy=0.728001 ; Validation: accuracy=0.828505
    Epoch 3 ; Time: 1.087246 ; Training: accuracy=0.786003 ; Validation: accuracy=0.851315
    Epoch 1 ; Time: 0.471924 ; Training: accuracy=0.548577 ; Validation: accuracy=0.744837
    Epoch 2 ; Time: 0.856599 ; Training: accuracy=0.753062 ; Validation: accuracy=0.805796
    Epoch 3 ; Time: 1.233011 ; Training: accuracy=0.806687 ; Validation: accuracy=0.845103
    Epoch 1 ; Time: 0.403902 ; Training: accuracy=0.573500 ; Validation: accuracy=0.757267
    Epoch 2 ; Time: 0.741710 ; Training: accuracy=0.800923 ; Validation: accuracy=0.834447
    Epoch 3 ; Time: 1.113848 ; Training: accuracy=0.858355 ; Validation: accuracy=0.868527
    Epoch 4 ; Time: 1.449182 ; Training: accuracy=0.887195 ; Validation: accuracy=0.884564
    Epoch 5 ; Time: 1.848668 ; Training: accuracy=0.911503 ; Validation: accuracy=0.897093
    Epoch 6 ; Time: 2.207992 ; Training: accuracy=0.919084 ; Validation: accuracy=0.909121
    Epoch 7 ; Time: 2.547297 ; Training: accuracy=0.934410 ; Validation: accuracy=0.924658
    Epoch 8 ; Time: 2.920436 ; Training: accuracy=0.943309 ; Validation: accuracy=0.918476
    Epoch 9 ; Time: 3.282759 ; Training: accuracy=0.947017 ; Validation: accuracy=0.929669
    Epoch 1 ; Time: 0.391132 ; Training: accuracy=0.611194 ; Validation: accuracy=0.761953
    Epoch 2 ; Time: 0.782814 ; Training: accuracy=0.803980 ; Validation: accuracy=0.840236
    Epoch 3 ; Time: 1.159332 ; Training: accuracy=0.854892 ; Validation: accuracy=0.875926
    Epoch 4 ; Time: 1.490009 ; Training: accuracy=0.882338 ; Validation: accuracy=0.902694
    Epoch 5 ; Time: 1.801840 ; Training: accuracy=0.906302 ; Validation: accuracy=0.920370
    Epoch 6 ; Time: 2.220920 ; Training: accuracy=0.915837 ; Validation: accuracy=0.924916
    Epoch 7 ; Time: 2.595611 ; Training: accuracy=0.928275 ; Validation: accuracy=0.926936
    Epoch 8 ; Time: 2.921697 ; Training: accuracy=0.932172 ; Validation: accuracy=0.926936
    Epoch 9 ; Time: 3.257192 ; Training: accuracy=0.937977 ; Validation: accuracy=0.937374
    Epoch 1 ; Time: 0.406384 ; Training: accuracy=0.675000 ; Validation: accuracy=0.811000
    Epoch 2 ; Time: 0.753011 ; Training: accuracy=0.850414 ; Validation: accuracy=0.872000
    Epoch 3 ; Time: 1.104501 ; Training: accuracy=0.890646 ; Validation: accuracy=0.904833
    Epoch 4 ; Time: 1.468058 ; Training: accuracy=0.915480 ; Validation: accuracy=0.909667
    Epoch 5 ; Time: 1.811414 ; Training: accuracy=0.923013 ; Validation: accuracy=0.906833
    Epoch 6 ; Time: 2.160280 ; Training: accuracy=0.939735 ; Validation: accuracy=0.913333
    Epoch 7 ; Time: 2.512255 ; Training: accuracy=0.949917 ; Validation: accuracy=0.931333
    Epoch 8 ; Time: 2.872170 ; Training: accuracy=0.953974 ; Validation: accuracy=0.933500
    Epoch 9 ; Time: 3.218282 ; Training: accuracy=0.957533 ; Validation: accuracy=0.925833
    Epoch 1 ; Time: 0.319053 ; Training: accuracy=0.709950 ; Validation: accuracy=0.844292
    Epoch 2 ; Time: 0.589575 ; Training: accuracy=0.870813 ; Validation: accuracy=0.875334
    Epoch 3 ; Time: 0.987917 ; Training: accuracy=0.902324 ; Validation: accuracy=0.904039
    Epoch 4 ; Time: 1.249979 ; Training: accuracy=0.924241 ; Validation: accuracy=0.905040
    Epoch 5 ; Time: 1.531832 ; Training: accuracy=0.926226 ; Validation: accuracy=0.908378
    Epoch 6 ; Time: 1.788824 ; Training: accuracy=0.940121 ; Validation: accuracy=0.922230
    Epoch 7 ; Time: 2.067610 ; Training: accuracy=0.948639 ; Validation: accuracy=0.923231
    Epoch 8 ; Time: 2.327184 ; Training: accuracy=0.950955 ; Validation: accuracy=0.926569
    Epoch 9 ; Time: 2.611477 ; Training: accuracy=0.957820 ; Validation: accuracy=0.933745
    Epoch 1 ; Time: 0.419348 ; Training: accuracy=0.633882 ; Validation: accuracy=0.791375
    Epoch 2 ; Time: 0.785120 ; Training: accuracy=0.822649 ; Validation: accuracy=0.857642
    Epoch 3 ; Time: 1.140644 ; Training: accuracy=0.874018 ; Validation: accuracy=0.890276
    Epoch 4 ; Time: 1.496016 ; Training: accuracy=0.904459 ; Validation: accuracy=0.895604
    Epoch 5 ; Time: 1.856606 ; Training: accuracy=0.918107 ; Validation: accuracy=0.901265
    Epoch 6 ; Time: 2.219018 ; Training: accuracy=0.932252 ; Validation: accuracy=0.921911
    Epoch 7 ; Time: 2.587913 ; Training: accuracy=0.936637 ; Validation: accuracy=0.926906
    Epoch 8 ; Time: 2.995049 ; Training: accuracy=0.948135 ; Validation: accuracy=0.930736
    Epoch 9 ; Time: 3.366530 ; Training: accuracy=0.951195 ; Validation: accuracy=0.932900
    Epoch 1 ; Time: 0.362235 ; Training: accuracy=0.717879 ; Validation: accuracy=0.818816
    Epoch 2 ; Time: 0.717216 ; Training: accuracy=0.838941 ; Validation: accuracy=0.847407
    Epoch 3 ; Time: 1.011797 ; Training: accuracy=0.877288 ; Validation: accuracy=0.877992
    Epoch 4 ; Time: 1.307802 ; Training: accuracy=0.891638 ; Validation: accuracy=0.883477
    Epoch 5 ; Time: 1.607926 ; Training: accuracy=0.901204 ; Validation: accuracy=0.879322
    Epoch 6 ; Time: 1.917984 ; Training: accuracy=0.905162 ; Validation: accuracy=0.870013
    Epoch 7 ; Time: 2.220767 ; Training: accuracy=0.917038 ; Validation: accuracy=0.909408
    Epoch 8 ; Time: 2.541039 ; Training: accuracy=0.925697 ; Validation: accuracy=0.898770
    Epoch 9 ; Time: 2.893164 ; Training: accuracy=0.922233 ; Validation: accuracy=0.911735
    Epoch 1 ; Time: 0.661581 ; Training: accuracy=0.565527 ; Validation: accuracy=0.718211
    Epoch 1 ; Time: 0.540891 ; Training: accuracy=0.635220 ; Validation: accuracy=0.782456
    Epoch 2 ; Time: 1.021876 ; Training: accuracy=0.824065 ; Validation: accuracy=0.852632
    Epoch 3 ; Time: 1.497699 ; Training: accuracy=0.874710 ; Validation: accuracy=0.872180
    Epoch 4 ; Time: 1.987899 ; Training: accuracy=0.899123 ; Validation: accuracy=0.894069
    Epoch 5 ; Time: 2.488709 ; Training: accuracy=0.916005 ; Validation: accuracy=0.910443
    Epoch 6 ; Time: 2.983786 ; Training: accuracy=0.927756 ; Validation: accuracy=0.905263
    Epoch 7 ; Time: 3.506227 ; Training: accuracy=0.937272 ; Validation: accuracy=0.917794
    Epoch 8 ; Time: 4.000621 ; Training: accuracy=0.945382 ; Validation: accuracy=0.928488
    Epoch 9 ; Time: 4.512321 ; Training: accuracy=0.950844 ; Validation: accuracy=0.929156

Analysing the results
~~~~~~~~~~~~~~~~~~~~~

The training history is stored in ``results_df``; the main fields are the
runtime and ``'best'`` (the cumulative best error attained so far).

**Note**: You will get slightly different curves for different pairs of
scheduler/searcher, and the ``time_out`` here is a bit too short to really see
the differences in a significant way (it would be better to set it to
>1000s). Generally speaking though, Hyperband stopping / promotion combined
with a model-based searcher will tend to significantly outperform the other
combinations given enough time.
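
Besides the full history processed into ``results_df``, the scheduler object
itself keeps track of the best configuration it has encountered. A minimal
sketch, assuming your AutoGluon version exposes ``get_best_config`` and
``get_best_reward`` on the scheduler (check the scheduler docstrings if the
method names differ):

.. code:: python

    # Hedged sketch: query the scheduler for the best configuration and reward
    # it has observed (method names assumed, not taken from this tutorial).
    print('Best config: {}'.format(myscheduler.get_best_config()))
    print('Best reward (validation accuracy): {}'.format(myscheduler.get_best_reward()))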

.. code:: python

    results_df.head()

.. parsed-literal::
    :class: output

       bracket  elapsed_time  epoch     error  eval_time  objective   runtime  \
    0        0      0.488541      1  0.468750   0.483711   0.531250  1.571379
    1        0      0.927435      2  0.344753   0.434316   0.655247  2.010273
    2        0      1.351925      3  0.305314   0.422602   0.694686  2.434762
    3        0      1.777915      4  0.288937   0.423248   0.711063  2.860753
    4        0      2.204442      5  0.273061   0.424562   0.726939  3.287279

       searcher_data_size  searcher_params_kernel_covariance_scale  \
    0                 NaN                                       1.0
    1                 1.0                                       1.0
    2                 1.0                                       1.0
    3                 2.0                                       1.0
    4                 2.0                                       1.0

       searcher_params_kernel_inv_bw0  ...  searcher_params_kernel_inv_bw7  \
    0                             1.0  ...                             1.0
    1                             1.0  ...                             1.0
    2                             1.0  ...                             1.0
    3                             1.0  ...                             1.0
    4                             1.0  ...                             1.0

       searcher_params_kernel_inv_bw8  searcher_params_mean_mean_value  \
    0                             1.0                              0.0
    1                             1.0                              0.0
    2                             1.0                              0.0
    3                             1.0                              0.0
    4                             1.0                              0.0

       searcher_params_noise_variance  target_epoch  task_id  time_since_start  \
    0                           0.001             9        0          1.573284
    1                           0.001             9        0          2.011093
    2                           0.001             9        0          2.435731
    3                           0.001             9        0          2.861669
    4                           0.001             9        0          3.288359

          time_step  time_this_iter      best
    0  1.614630e+09        0.517057  0.468750
    1  1.614630e+09        0.438870  0.344753
    2  1.614630e+09        0.424491  0.305314
    3  1.614630e+09        0.425990  0.288937
    4  1.614630e+09        0.426525  0.273061

    [5 rows x 26 columns]

.. code:: python

    import matplotlib.pyplot as plt

    plt.figure(figsize=(12, 8))

    runtime = results_df['runtime'].values
    objective = results_df['best'].values

    plt.plot(runtime, objective, lw=2)
    plt.xticks(fontsize=12)
    plt.xlim(0, 120)
    plt.ylim(0, 0.5)
    plt.yticks(fontsize=12)
    plt.xlabel("Runtime [s]", fontsize=14)
    plt.ylabel("Objective", fontsize=14)

.. parsed-literal::
    :class: output

    Text(0, 0.5, 'Objective')

Diving Deeper
-------------

Now, you are ready to try HPO on your own machine learning models (if you use
PyTorch, have a look at :ref:`sec_customstorch`). While AutoGluon comes with
well-chosen defaults, it can pay off to tune it to your specific needs. Here
are some tips which may come in useful.

Logging the Search Progress
~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, it is a good idea in general to switch on ``debug_log``, which outputs
useful information about the search progress. This is already done in the
example above. The outputs show which configurations are chosen, stopped, or
promoted. For BO and BOHB, a range of information is displayed for every
``get_config`` decision. This log output is very useful in order to figure out
what is going on during the search.

Configuring ``HyperbandScheduler``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most important knobs to turn with ``HyperbandScheduler`` are ``max_t``,
``grace_period``, ``reduction_factor``, ``brackets``, and ``type``. The first
three determine the rung levels at which stopping or promotion decisions are
being made.

- The maximum resource level ``max_t`` (usually, resource equates to epochs,
  so ``max_t`` is the maximum number of training epochs) is typically
  hardcoded in the ``train_fn`` passed to the scheduler (this is
  ``run_mlp_openml`` in the example above). As already noted above, the value
  is best fixed in the ``ag.args`` decorator as ``epochs=XYZ``; it can then be
  accessed as ``args.epochs`` in the ``train_fn`` code. If this is done, you
  do not have to pass ``max_t`` when creating the scheduler.
- ``grace_period`` and ``reduction_factor`` determine the rung levels, which
  are ``grace_period``, ``grace_period * reduction_factor``,
  ``grace_period * (reduction_factor ** 2)``, etc. (a short sketch computing
  these rung levels is given after this list). All rung levels must be less
  than or equal to ``max_t``. It is recommended to make ``max_t`` equal to the
  largest rung level. For example, if ``grace_period = 1``,
  ``reduction_factor = 3``, it is in general recommended to use ``max_t = 9``,
  ``max_t = 27``, or ``max_t = 81``. Choosing a ``max_t`` value "off the grid"
  works against the successive halving principle that the total resources
  spent in a rung should be roughly equal between rungs. If, in the example
  above, you set ``max_t = 10``, about a third of the configurations reaching
  9 epochs are allowed to proceed, but only for one more epoch.
- With ``reduction_factor``, you tune the extent to which successive halving
  filtering is applied. The larger this integer, the fewer configurations make
  it to higher numbers of epochs. Values 2, 3, 4 are commonly used.
- ``grace_period`` should be set to the smallest resource (number of epochs)
  for which you expect any meaningful differentiation between configurations.
  While ``grace_period = 1`` should always be explored, it may be too low for
  any meaningful stopping decisions to be made at the first rung.
- ``brackets`` sets the maximum number of brackets in Hyperband (make sure to
  study the Hyperband paper or follow-ups for details). For ``brackets = 1``,
  you are running successive halving (single bracket). Higher brackets have
  larger effective ``grace_period`` values (so runs are not stopped until
  later), yet are also chosen with less probability. We recommend always
  considering successive halving (``brackets = 1``) in a comparison.
- Finally, with ``type`` (values ``stopping``, ``promotion``) you are choosing
  different ways of extending successive halving scheduling to the
  asynchronous case. The default ``stopping`` is simpler and seems to perform
  well, but ``promotion`` is more careful about promoting configurations to
  higher resource levels, which can work better in some cases.
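
To see how ``grace_period``, ``reduction_factor``, and ``max_t`` interact,
here is a minimal sketch (the helper name ``rung_levels`` is made up for
illustration) that computes the rung levels implied by these parameters:

.. code:: python

    def rung_levels(grace_period, reduction_factor, max_t):
        """Rung levels grace_period * reduction_factor**k that do not exceed max_t."""
        levels = []
        r = grace_period
        while r <= max_t:
            levels.append(r)
            r *= reduction_factor
        return levels

    # Settings used in the example above: rungs at 1, 3 and 9 epochs
    print(rung_levels(grace_period=1, reduction_factor=3, max_t=9))   # [1, 3, 9]
    # An "off the grid" max_t: the last rung (9) is followed by only one extra epoch
    print(rung_levels(grace_period=1, reduction_factor=3, max_t=10))  # [1, 3, 9]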

Asynchronous BOHB
~~~~~~~~~~~~~~~~~

Finally, here are some ideas for tuning asynchronous BOHB, apart from tuning
its ``HyperbandScheduler`` component. You need to pass these options in
``search_options``.

- We support a range of different surrogate models over the criterion
  functions across resource levels. All of them are jointly dependent Gaussian
  process models, meaning that data collected at all resource levels are
  modelled together. The surrogate model is selected by
  ``gp_resource_kernel``; values are ``matern52``, ``matern52-res-warp``,
  ``exp-decay-sum``, ``exp-decay-combined``, ``exp-decay-delta1``. These are
  variants of either a joint Matern 5/2 kernel over configuration and
  resource, or the exponential decay model. Details about the latter can be
  found in the corresponding paper.
- Fitting a Gaussian process surrogate model to data incurs a cost which
  scales cubically with the number of datapoints. When applied to expensive
  deep learning workloads, even multi-fidelity asynchronous BOHB rarely runs
  up more than 100 observations or so (across all rung levels and brackets),
  and the GP computations are subdominant. However, if you apply it to a
  cheaper ``train_fn`` and find yourself beyond 2000 total evaluations, the
  cost of GP fitting can become painful. In such a situation, you can explore
  the options ``opt_skip_period`` and ``opt_skip_num_max_resource``. The basic
  idea is as follows. By far the most expensive part of a ``get_config`` call
  (picking the next configuration) is the refitting of the GP model to past
  data (this entails re-optimizing hyperparameters of the surrogate model
  itself). These options allow you to skip this expensive step for most
  ``get_config`` calls, after some initial period. Check the docstrings for
  details about these options. If you find yourself in such a situation and
  gain experience with these skipping features, make sure to contact the
  AutoGluon developers -- we would love to learn about your use case.
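
Putting these pieces together, here is a hedged sketch of what such a
``search_options`` dictionary could look like for the scheduler used above.
The particular values chosen for ``gp_resource_kernel``, ``opt_skip_period``,
and ``opt_skip_num_max_resource`` are made up for illustration; check the
searcher docstrings for the exact semantics and sensible values:

.. code:: python

    # Illustrative values only: surrogate model choice and GP-refit skipping
    # options are passed to the searcher via search_options.
    search_options = {
        'num_init_random': 2,
        'debug_log': True,
        'gp_resource_kernel': 'exp-decay-sum',  # one of the surrogate models listed above
        'opt_skip_period': 3,                   # illustrative: skip most GP refits
        'opt_skip_num_max_resource': True,      # illustrative: see docstring for semantics
    }

    myscheduler = ag.scheduler.HyperbandScheduler(
        run_mlp_openml,
        resource=resources,
        searcher='bayesopt',
        search_options=search_options,
        time_out=120,
        time_attr=RESOURCE_ATTR_NAME,
        reward_attr=REWARD_ATTR_NAME,
        type='stopping',
        grace_period=1,
        reduction_factor=3,
        brackets=1)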