# Demo RL Searcher¶

In this tutorial, we are going to compare RL searcher with random search in a simulation environment.

## A Toy Reward Space¶

Input Space x = [0: 99], y = [0: 99]. The rewards are a combination of 2 gaussians as shown in the following figure:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


Generate the simulation rewards as a mixture of 2 gaussians:

def gaussian(x, y, x0, y0, xalpha, yalpha, A):
return A * np.exp( -((x-x0)/xalpha)**2 -((y-y0)/yalpha)**2)

x, y = np.linspace(0, 99, 100), np.linspace(0, 99, 100)
X, Y = np.meshgrid(x, y)

Z = np.zeros(X.shape)
ps = [(20, 70, 35, 40, 1),
(80, 40, 20, 20, 0.7)]
for p in ps:
Z += gaussian(X, Y, *p)


Visualize the reward space:

fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, Z, cmap='plasma')
ax.set_zlim(0,np.max(Z)+2)
plt.show()

/var/lib/jenkins/workspace/workspace/autogluon-tutorial-nas-v3/venv/lib/python3.7/site-packages/ipykernel_launcher.py:2: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().


## Simulation Experiment¶

### Customize Train Function¶

We can define any function with a decorator @ag.args, which converts the function to AutoGluon searchable. The reporter is used to communicate with AutoGluon search algorithms.

import autogluon.core as ag

@ag.args(
x=ag.space.Categorical(*list(range(100))),
y=ag.space.Categorical(*list(range(100))),
)
def rl_simulation(args, reporter):
x, y = args.x, args.y
reporter(accuracy=Z[y][x])


### Random Search Baseline¶

random_scheduler = ag.scheduler.FIFOScheduler(rl_simulation,
resource={'num_cpus': 1, 'num_gpus': 0},
num_trials=300,
reward_attr='accuracy')
random_scheduler.run()
random_scheduler.join_jobs()
print('Best config: {}, best reward: {}'.format(random_scheduler.get_best_config(), random_scheduler.get_best_reward()))

  0%|          | 0/300 [00:00<?, ?it/s]

Best config: {'x▁choice': 22, 'y▁choice': 69}, best reward: 0.9961362873852471


### Reinforcement Learning¶

rl_scheduler = ag.scheduler.RLScheduler(rl_simulation,
resource={'num_cpus': 1, 'num_gpus': 0},
num_trials=300,
reward_attr='accuracy',
controller_batch_size=4,
controller_lr=5e-3,
checkpoint='./rl_exp/checkerpoint.ag')
rl_scheduler.run()
rl_scheduler.join_jobs()
print('Best config: {}, best reward: {}'.format(rl_scheduler.get_best_config(), rl_scheduler.get_best_reward()))

100%|██████████| 76/76 [03:20<00:00,  2.64s/it]

Best config: {'x▁choice': 21, 'y▁choice': 74}, best reward: 0.9892484241569526


### Compare the Performance¶

Get the result history:

results_rl = [v[0]['accuracy'] for v in rl_scheduler.training_history.values()]
results_random = [v[0]['accuracy'] for v in random_scheduler.training_history.values()]


Average result every 10 trials:

import statistics
results1 = [statistics.mean(results_random[i:i+10]) for i in range(0, len(results_random), 10)]
results2 = [statistics.mean(results_rl[i:i+10]) for i in range(0, len(results_rl), 10)]


Plot the results:

plt.plot(range(len(results1)), results1, range(len(results2)), results2)

[<mpl_toolkits.mplot3d.art3d.Line3D at 0x7f16f6892690>,
<mpl_toolkits.mplot3d.art3d.Line3D at 0x7f17e5d33d50>]