How to Use ENAS/ProxylessNAS in Ten Minutes

What is the Key Idea of ENAS and ProxylessNAS?

Traditional reinforcement-learning-based neural architecture search (NAS) learns an architecture controller by repeatedly sampling an architecture, training the corresponding model, and using its final reward to update the controller. This process is extremely expensive because every sampled CNN has to be trained.

https://raw.githubusercontent.com/zhanghang1989/AutoGluonWebdata/master/docs/tutorial/proxyless.png

Fig. 3 ProxylessNAS

Recent work such as ENAS and ProxylessNAS constructs an over-parameterized network (a supernet) and shares weights across the different architectures to speed up the search. The reward is calculated every few iterations instead of after each full training period.
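
The controller update described above can be illustrated with a small, framework-free sketch. It is not the ENAS or ProxylessNAS implementation: the function names (`softmax`, `sample`, `reinforce_step`) and the reward table `TOY_REWARD` are hypothetical, and the "reward" is a made-up constant per candidate rather than the accuracy of a trained sub-network. The sketch only shows how a REINFORCE-style update shifts probability toward candidates that earn a higher reward.

```python
import math
import random

def softmax(logits):
    """Convert controller logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    """Sample the index of one candidate operation."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def reinforce_step(logits, choice, reward, baseline, lr):
    """One REINFORCE update: move the log-probability of `choice`
    in proportion to the advantage (reward - baseline)."""
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        l + lr * advantage * ((1.0 if i == choice else 0.0) - p)
        for i, (l, p) in enumerate(zip(logits, probs))
    ]

# Hypothetical stand-in for "accuracy of the sampled sub-network".
TOY_REWARD = [0.2, 0.9, 0.5]

rng = random.Random(0)
logits = [0.0, 0.0, 0.0]  # controller parameters for one layer choice
baseline = 0.0            # moving average of past rewards
for step in range(500):
    choice = sample(softmax(logits), rng)
    reward = TOY_REWARD[choice]
    logits = reinforce_step(logits, choice, reward, baseline, lr=0.5)
    baseline = 0.9 * baseline + 0.1 * reward

best = max(range(len(logits)), key=lambda i: logits[i])
print("controller probabilities:", [round(p, 2) for p in softmax(logits)])
print("preferred candidate:", best)
```

In a weight-sharing search, the expensive step hidden behind `TOY_REWARD[choice]` is replaced by evaluating the sampled sub-network with the supernet's current shared weights, which is why the reward can be computed every few iterations.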

Import MXNet and AutoGluon:

import autogluon.core as ag
import mxnet as mx
import mxnet.gluon.nn as nn

How to Construct a SuperNet

Define the basic building blocks of the CNN:

class Identity(mx.gluon.HybridBlock):
    def hybrid_forward(self, F, x):
        return x

class ConvBNReLU(mx.gluon.HybridBlock):
    def __init__(self, in_channels, channels, kernel, stride):
        super().__init__()
        # "Same" padding for odd kernel sizes: spatial size is preserved at stride 1
        padding = (kernel - 1) // 2
        self.conv = nn.Conv2D(channels, kernel, stride, padding, in_channels=in_channels)
        self.bn = nn.BatchNorm(in_channels=channels)
        self.relu = nn.Activation('relu')
    def hybrid_forward(self, F, x):
        return self.relu(self.bn(self.conv(x)))
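
The padding rule in `ConvBNReLU` matters for the search: candidate kernel sizes can only be swapped for one another if they produce the same output shape. The sketch below checks this with the standard convolution output-size formula (no dilation). The helper names `conv_out_size` and `same_padding` are illustrative and not part of MXNet or AutoGluon.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output height/width of a 2D convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1

def same_padding(kernel):
    """Padding rule used by ConvBNReLU: (kernel - 1) // 2."""
    return (kernel - 1) // 2

size = 28  # height/width of an MNIST-sized input
for kernel in (3, 5):
    for stride in (1, 2):
        out = conv_out_size(size, kernel, stride, same_padding(kernel))
        print("kernel={} stride={} -> output size {}".format(kernel, stride, out))
```

For a given stride, kernel sizes 3 and 5 yield the same output size, so either choice can be followed by the same downstream layers.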

AutoGluon ENAS Unit

from autogluon.extra.contrib.enas import *

@enas_unit()
class ResUnit(mx.gluon.HybridBlock):
    def __init__(self, in_channels, channels, hidden_channels, kernel, stride):
        super().__init__()
        self.conv1 = ConvBNReLU(in_channels, hidden_channels, kernel, stride)
        self.conv2 = ConvBNReLU(hidden_channels, channels, kernel, 1)
        if in_channels == channels and stride == 1:
            self.shortcut = Identity()
        else:
            self.shortcut = nn.Conv2D(channels, 1, stride, in_channels=in_channels)
    def hybrid_forward(self, F, x):
        return self.conv2(self.conv1(x)) + self.shortcut(x)

AutoGluon Sequential

Create an ENAS network using the Sequential block:

mynet = ENAS_Sequential(
    ResUnit(1, 8, hidden_channels=ag.space.Categorical(4, 8), kernel=ag.space.Categorical(3, 5), stride=2),
    ResUnit(8, 8, hidden_channels=8, kernel=ag.space.Categorical(3, 5), stride=2),
    ResUnit(8, 16, hidden_channels=8, kernel=ag.space.Categorical(3, 5), stride=2),
    ResUnit(16, 16, hidden_channels=8, kernel=ag.space.Categorical(3, 5), stride=1, with_zero=True),
    ResUnit(16, 16, hidden_channels=8, kernel=ag.space.Categorical(3, 5), stride=1, with_zero=True),
    nn.GlobalAvgPool2D(),
    nn.Flatten(),
    nn.Activation('relu'),
    nn.Dense(10, in_units=16),
)

mynet.initialize()

#mynet.graph
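
To get a feel for the size of this search space, the sketch below enumerates the choices per unit with plain Python. It mirrors the `ENAS_Sequential` definition above but does not call AutoGluon. It assumes that `with_zero=True` adds exactly one extra option that skips the unit, represented here by `None`. The `units` list and the `unit_choices` helper are hypothetical names introduced for illustration.

```python
from itertools import product

# One entry per ResUnit in the network definition above.
units = [
    {"hidden_channels": [4, 8], "kernel": [3, 5], "with_zero": False},
    {"hidden_channels": [8], "kernel": [3, 5], "with_zero": False},
    {"hidden_channels": [8], "kernel": [3, 5], "with_zero": False},
    {"hidden_channels": [8], "kernel": [3, 5], "with_zero": True},
    {"hidden_channels": [8], "kernel": [3, 5], "with_zero": True},
]

def unit_choices(unit):
    """List every configuration one unit can take."""
    configs = [
        {"hidden_channels": h, "kernel": k}
        for h, k in product(unit["hidden_channels"], unit["kernel"])
    ]
    if unit["with_zero"]:
        configs.append(None)  # assumed "skip this unit" option
    return configs

per_unit = [len(unit_choices(u)) for u in units]
total = 1
for n in per_unit:
    total *= n

print("choices per unit:", per_unit)
print("number of candidate architectures:", total)
```

Under these assumptions the supernet covers 4 x 2 x 2 x 3 x 3 = 144 candidate architectures, all sharing one set of weights.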

Evaluate Network Latency and Define Reward Function

x = mx.nd.random.uniform(shape=(1, 1, 28, 28))
y = mynet.evaluate_latency(x)

Show the latencies:

print('Average latency is {:.2f} ms, latency of the current architecture is {:.2f} ms'.format(mynet.avg_latency, mynet.latency))
Average latency is 4.45 ms, latency of the current architecture is 5.02 ms

The network also reports its number of parameters:

mynet.nparams
8714

Define the reward function:

reward_fn = lambda metric, net: metric * ((net.avg_latency / net.latency) ** 0.1)
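
The effect of the latency term is easier to see with numbers. The sketch below is a standalone variant of the reward function that takes the latencies directly instead of a network object. The accuracy value and the candidate latencies are hypothetical; only the average latency of 4.45 ms and the latency of 5.02 ms come from the output shown earlier.

```python
def latency_aware_reward(metric, avg_latency, latency, exponent=0.1):
    """Scale the metric by (avg_latency / latency) ** exponent."""
    return metric * ((avg_latency / latency) ** exponent)

accuracy = 0.90     # hypothetical validation accuracy
avg_latency = 4.45  # ms, average latency reported above

for latency in (3.00, 4.45, 5.02):
    reward = latency_aware_reward(accuracy, avg_latency, latency)
    print("latency {:.2f} ms -> reward {:.4f}".format(latency, reward))
```

An architecture that is faster than average receives a reward slightly above its accuracy, and a slower one slightly below. Because the exponent is small (0.1), latency acts as a mild tie-breaker rather than dominating accuracy.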

Start the Training

Construct the experiment scheduler, which automatically creates an RL controller based on the user-defined search space.

scheduler = ENAS_Scheduler(mynet, train_set='mnist',
                           reward_fn=reward_fn, batch_size=128, num_gpus=1,
                           warmup_epochs=0, epochs=1, controller_lr=3e-3,
                           plot_frequency=10, update_arch_frequency=5)
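
A quick back-of-the-envelope calculation shows what these settings imply for one epoch. The sketch assumes that an incomplete final batch is dropped and that the controller is updated whenever the batch index is a multiple of `update_arch_frequency`. That is one plausible reading of the parameter, not a statement about the scheduler's internals.

```python
num_train = 60000          # size of the MNIST training set
batch_size = 128
update_arch_frequency = 5

# Iterations per epoch, assuming the incomplete last batch is dropped.
iters_per_epoch = num_train // batch_size

# Batch indices at which the controller would be updated (assumption).
update_iters = [
    i for i in range(iters_per_epoch) if i % update_arch_frequency == 0
]

print("iterations per epoch:", iters_per_epoch)
print("controller updates per epoch:", len(update_iters))
```

With a batch size of 128 this gives 468 iterations per epoch, which matches the total shown by the progress bar during training.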

Start the training:

scheduler.run()
  0%|          | 0/1 [00:00<?, ?it/s]
avg reward: 0.10:   0%|          | 0/468 [00:00<?, ?it/s]
avg reward: 0.60:  14%|█▍        | 65/468 [00:04<00:19, 20.77it/s]
avg reward: 0.89:  28%|██▊       | 130/468 [00:07<00:17, 19.53it/s]
avg reward: 0.96:  46%|████▌     | 215/468 [00:12<00:12, 20.51it/s]
avg reward: 0.98:  64%|██████▍   | 300/468 [00:17<00:08, 20.52it/s]
avg reward: 0.98:  64%|██████▍   | 300/468 [00:17<00:08, 20.52it/s]
avg reward: 0.98:  64%|██████▍   | 300/468 [00:17<00:08, 20.52it/s]
avg reward: 0.98:  64%|██████▍   | 300/468 [00:17<00:08, 20.52it/s]
avg reward: 0.98:  65%|██████▍   | 303/468 [00:17<00:08, 18.39it/s]
avg reward: 0.98:  65%|██████▍   | 303/468 [00:17<00:08, 18.39it/s]
avg reward: 0.98:  65%|██████▍   | 303/468 [00:17<00:08, 18.39it/s]
avg reward: 0.99:  65%|██████▍   | 303/468 [00:17<00:08, 18.39it/s]
avg reward: 0.99:  65%|██████▌   | 306/468 [00:17<00:09, 16.72it/s]
avg reward: 0.99:  65%|██████▌   | 306/468 [00:17<00:09, 16.72it/s]
avg reward: 0.99:  65%|██████▌   | 306/468 [00:17<00:09, 16.72it/s]
avg reward: 0.99:  65%|██████▌   | 306/468 [00:17<00:09, 16.72it/s]
avg reward: 0.99:  65%|██████▌   | 306/468 [00:17<00:09, 16.72it/s]
avg reward: 0.99:  66%|██████▌   | 310/468 [00:17<00:07, 20.74it/s]
avg reward: 0.99:  66%|██████▌   | 310/468 [00:17<00:07, 20.74it/s]
avg reward: 0.99:  66%|██████▌   | 310/468 [00:18<00:07, 20.74it/s]
avg reward: 0.99:  66%|██████▌   | 310/468 [00:18<00:07, 20.74it/s]
avg reward: 0.99:  67%|██████▋   | 313/468 [00:18<00:08, 17.68it/s]
avg reward: 0.99:  67%|██████▋   | 313/468 [00:18<00:08, 17.68it/s]
avg reward: 0.99:  67%|██████▋   | 313/468 [00:18<00:08, 17.68it/s]
avg reward: 0.99:  67%|██████▋   | 313/468 [00:18<00:08, 17.68it/s]
avg reward: 0.99:  68%|██████▊   | 316/468 [00:18<00:09, 16.48it/s]
avg reward: 0.99:  68%|██████▊   | 316/468 [00:18<00:09, 16.48it/s]
avg reward: 0.99:  68%|██████▊   | 316/468 [00:18<00:09, 16.48it/s]
avg reward: 0.99:  68%|██████▊   | 316/468 [00:18<00:09, 16.48it/s]
avg reward: 0.99:  68%|██████▊   | 316/468 [00:18<00:09, 16.48it/s]
avg reward: 0.99:  68%|██████▊   | 316/468 [00:18<00:09, 16.48it/s]
avg reward: 0.99:  69%|██████▊   | 321/468 [00:18<00:08, 17.44it/s]
avg reward: 0.99:  69%|██████▊   | 321/468 [00:18<00:08, 17.44it/s]
avg reward: 0.99:  69%|██████▊   | 321/468 [00:18<00:08, 17.44it/s]
avg reward: 0.99:  69%|██████▊   | 321/468 [00:18<00:08, 17.44it/s]
avg reward: 0.99:  69%|██████▊   | 321/468 [00:18<00:08, 17.44it/s]
avg reward: 0.99:  69%|██████▉   | 325/468 [00:18<00:06, 20.96it/s]
avg reward: 0.99:  69%|██████▉   | 325/468 [00:18<00:06, 20.96it/s]
avg reward: 0.99:  69%|██████▉   | 325/468 [00:18<00:06, 20.96it/s]
avg reward: 0.99:  69%|██████▉   | 325/468 [00:18<00:06, 20.96it/s]
avg reward: 0.99:  70%|███████   | 328/468 [00:18<00:07, 18.01it/s]
avg reward: 0.99:  70%|███████   | 328/468 [00:18<00:07, 18.01it/s]
avg reward: 0.99:  70%|███████   | 328/468 [00:18<00:07, 18.01it/s]
avg reward: 0.99:  70%|███████   | 328/468 [00:19<00:07, 18.01it/s]
avg reward: 0.99:  71%|███████   | 331/468 [00:19<00:08, 16.76it/s]
avg reward: 0.99:  71%|███████   | 331/468 [00:19<00:08, 16.76it/s]
avg reward: 0.99:  71%|███████   | 331/468 [00:19<00:08, 16.76it/s]
avg reward: 0.99:  71%|███████   | 331/468 [00:19<00:08, 16.76it/s]
avg reward: 0.99:  71%|███████   | 331/468 [00:19<00:08, 16.76it/s]
avg reward: 0.99:  71%|███████   | 331/468 [00:19<00:08, 16.76it/s]
avg reward: 0.99:  72%|███████▏  | 336/468 [00:19<00:08, 16.44it/s]
avg reward: 0.99:  72%|███████▏  | 336/468 [00:19<00:08, 16.44it/s]
avg reward: 0.99:  72%|███████▏  | 336/468 [00:19<00:08, 16.44it/s]
avg reward: 0.99:  72%|███████▏  | 336/468 [00:19<00:08, 16.44it/s]
avg reward: 0.99:  72%|███████▏  | 336/468 [00:19<00:08, 16.44it/s]
avg reward: 0.99:  73%|███████▎  | 340/468 [00:19<00:06, 20.00it/s]
avg reward: 0.99:  73%|███████▎  | 340/468 [00:19<00:06, 20.00it/s]
avg reward: 0.99:  73%|███████▎  | 340/468 [00:19<00:06, 20.00it/s]
avg reward: 0.99:  73%|███████▎  | 340/468 [00:19<00:06, 20.00it/s]
avg reward: 0.99:  73%|███████▎  | 343/468 [00:19<00:07, 17.76it/s]
avg reward: 0.99:  73%|███████▎  | 343/468 [00:19<00:07, 17.76it/s]
avg reward: 0.99:  73%|███████▎  | 343/468 [00:19<00:07, 17.76it/s]
avg reward: 0.99:  73%|███████▎  | 343/468 [00:19<00:07, 17.76it/s]
avg reward: 0.99:  74%|███████▍  | 346/468 [00:19<00:07, 16.89it/s]
avg reward: 0.99:  74%|███████▍  | 346/468 [00:19<00:07, 16.89it/s]
avg reward: 0.99:  74%|███████▍  | 346/468 [00:19<00:07, 16.89it/s]
avg reward: 0.99:  74%|███████▍  | 346/468 [00:20<00:07, 16.89it/s]
avg reward: 0.99:  74%|███████▍  | 346/468 [00:20<00:07, 16.89it/s]
avg reward: 0.99:  75%|███████▍  | 350/468 [00:20<00:05, 20.41it/s]
avg reward: 0.99:  75%|███████▍  | 350/468 [00:20<00:05, 20.41it/s]
avg reward: 0.99:  75%|███████▍  | 350/468 [00:20<00:05, 20.41it/s]
avg reward: 0.99:  75%|███████▍  | 350/468 [00:20<00:05, 20.41it/s]
avg reward: 0.99:  75%|███████▌  | 353/468 [00:20<00:06, 17.78it/s]
avg reward: 0.99:  75%|███████▌  | 353/468 [00:20<00:06, 17.78it/s]
avg reward: 0.99:  75%|███████▌  | 353/468 [00:20<00:06, 17.78it/s]
avg reward: 0.99:  75%|███████▌  | 353/468 [00:20<00:06, 17.78it/s]
avg reward: 0.99:  76%|███████▌  | 356/468 [00:20<00:06, 16.39it/s]
avg reward: 0.99:  76%|███████▌  | 356/468 [00:20<00:06, 16.39it/s]
avg reward: 0.99:  76%|███████▌  | 356/468 [00:20<00:06, 16.39it/s]
avg reward: 0.99:  76%|███████▌  | 356/468 [00:20<00:06, 16.39it/s]
avg reward: 0.99:  76%|███████▌  | 356/468 [00:20<00:06, 16.39it/s]
avg reward: 0.99:  77%|███████▋  | 360/468 [00:20<00:05, 20.35it/s]
avg reward: 0.99:  77%|███████▋  | 360/468 [00:20<00:05, 20.35it/s]
avg reward: 0.99:  77%|███████▋  | 360/468 [00:20<00:05, 20.35it/s]
avg reward: 0.99:  77%|███████▋  | 360/468 [00:20<00:05, 20.35it/s]
avg reward: 0.99:  78%|███████▊  | 363/468 [00:20<00:06, 17.22it/s]
avg reward: 0.99:  78%|███████▊  | 363/468 [00:20<00:06, 17.22it/s]
avg reward: 0.99:  78%|███████▊  | 363/468 [00:20<00:06, 17.22it/s]
avg reward: 0.99:  78%|███████▊  | 363/468 [00:21<00:06, 17.22it/s]
avg reward: 0.99:  78%|███████▊  | 366/468 [00:21<00:06, 15.87it/s]
avg reward: 0.99:  78%|███████▊  | 366/468 [00:21<00:06, 15.87it/s]
avg reward: 0.99:  78%|███████▊  | 366/468 [00:21<00:06, 15.87it/s]
avg reward: 0.99:  78%|███████▊  | 366/468 [00:21<00:06, 15.87it/s]
avg reward: 0.99:  78%|███████▊  | 366/468 [00:21<00:06, 15.87it/s]
avg reward: 0.99:  79%|███████▉  | 370/468 [00:21<00:04, 20.03it/s]
avg reward: 0.99:  79%|███████▉  | 370/468 [00:21<00:04, 20.03it/s]
avg reward: 0.99:  79%|███████▉  | 370/468 [00:21<00:04, 20.03it/s]
avg reward: 0.99:  79%|███████▉  | 370/468 [00:21<00:04, 20.03it/s]
avg reward: 0.99:  80%|███████▉  | 373/468 [00:21<00:05, 17.64it/s]
avg reward: 0.99:  80%|███████▉  | 373/468 [00:21<00:05, 17.64it/s]
avg reward: 0.99:  80%|███████▉  | 373/468 [00:21<00:05, 17.64it/s]
avg reward: 0.99:  80%|███████▉  | 373/468 [00:21<00:05, 17.64it/s]
avg reward: 0.99:  80%|████████  | 376/468 [00:21<00:05, 16.25it/s]
avg reward: 0.99:  80%|████████  | 376/468 [00:21<00:05, 16.25it/s]
avg reward: 0.99:  80%|████████  | 376/468 [00:21<00:05, 16.25it/s]
avg reward: 0.99:  80%|████████  | 376/468 [00:21<00:05, 16.25it/s]
avg reward: 0.99:  80%|████████  | 376/468 [00:21<00:05, 16.25it/s]
avg reward: 0.99:  81%|████████  | 380/468 [00:21<00:04, 20.44it/s]
avg reward: 1.00:  81%|████████  | 380/468 [00:21<00:04, 20.44it/s]
avg reward: 1.00:  81%|████████  | 380/468 [00:21<00:04, 20.44it/s]
avg reward: 1.00:  81%|████████  | 380/468 [00:21<00:04, 20.44it/s]
avg reward: 1.00:  82%|████████▏ | 383/468 [00:21<00:04, 17.85it/s]
avg reward: 1.00:  82%|████████▏ | 383/468 [00:21<00:04, 17.85it/s]
avg reward: 1.00:  82%|████████▏ | 383/468 [00:21<00:04, 17.85it/s]
avg reward: 1.00:  82%|████████▏ | 383/468 [00:22<00:04, 17.85it/s]
avg reward: 1.00:  82%|████████▏ | 386/468 [00:22<00:05, 16.10it/s]
avg reward: 1.00:  82%|████████▏ | 386/468 [00:22<00:05, 16.10it/s]
avg reward: 1.00:  82%|████████▏ | 386/468 [00:22<00:05, 16.10it/s]
avg reward: 1.00:  82%|████████▏ | 386/468 [00:22<00:05, 16.10it/s]
avg reward: 1.00:  82%|████████▏ | 386/468 [00:22<00:05, 16.10it/s]
avg reward: 1.00:  83%|████████▎ | 390/468 [00:22<00:03, 20.30it/s]
avg reward: 1.00:  83%|████████▎ | 390/468 [00:22<00:03, 20.30it/s]
avg reward: 1.00:  83%|████████▎ | 390/468 [00:22<00:03, 20.30it/s]
avg reward: 1.00:  83%|████████▎ | 390/468 [00:22<00:03, 20.30it/s]
avg reward: 1.00:  84%|████████▍ | 393/468 [00:22<00:04, 17.74it/s]
avg reward: 1.00:  84%|████████▍ | 393/468 [00:22<00:04, 17.74it/s]
avg reward: 1.00:  84%|████████▍ | 393/468 [00:22<00:04, 17.74it/s]
avg reward: 1.00:  84%|████████▍ | 393/468 [00:22<00:04, 17.74it/s]
avg reward: 1.00:  85%|████████▍ | 396/468 [00:22<00:04, 15.98it/s]
avg reward: 1.00:  85%|████████▍ | 396/468 [00:22<00:04, 15.98it/s]
avg reward: 1.00:  85%|████████▍ | 396/468 [00:22<00:04, 15.98it/s]
avg reward: 1.00:  85%|████████▍ | 396/468 [00:22<00:04, 15.98it/s]
avg reward: 1.00:  85%|████████▍ | 396/468 [00:22<00:04, 15.98it/s]
avg reward: 1.01:  85%|████████▍ | 396/468 [00:23<00:04, 15.98it/s]
avg reward: 1.01:  86%|████████▌ | 401/468 [00:23<00:04, 16.56it/s]
avg reward: 1.01:  86%|████████▌ | 401/468 [00:23<00:04, 16.56it/s]
avg reward: 1.01:  86%|████████▌ | 401/468 [00:23<00:04, 16.56it/s]
avg reward: 1.01:  86%|████████▌ | 401/468 [00:23<00:04, 16.56it/s]
avg reward: 1.01:  86%|████████▌ | 401/468 [00:23<00:04, 16.56it/s]
avg reward: 1.01:  87%|████████▋ | 405/468 [00:23<00:03, 19.75it/s]
avg reward: 1.01:  87%|████████▋ | 405/468 [00:23<00:03, 19.75it/s]
avg reward: 1.01:  87%|████████▋ | 405/468 [00:23<00:03, 19.75it/s]
avg reward: 1.01:  87%|████████▋ | 405/468 [00:23<00:03, 19.75it/s]
avg reward: 1.01:  87%|████████▋ | 408/468 [00:23<00:03, 17.60it/s]
avg reward: 1.01:  87%|████████▋ | 408/468 [00:23<00:03, 17.60it/s]
avg reward: 1.01:  87%|████████▋ | 408/468 [00:23<00:03, 17.60it/s]
avg reward: 1.01:  87%|████████▋ | 408/468 [00:23<00:03, 17.60it/s]
avg reward: 1.01:  88%|████████▊ | 411/468 [00:23<00:03, 15.84it/s]
avg reward: 1.01:  88%|████████▊ | 411/468 [00:23<00:03, 15.84it/s]
avg reward: 1.01:  88%|████████▊ | 411/468 [00:23<00:03, 15.84it/s]
avg reward: 1.01:  88%|████████▊ | 411/468 [00:23<00:03, 15.84it/s]
avg reward: 1.01:  88%|████████▊ | 411/468 [00:23<00:03, 15.84it/s]
avg reward: 1.01:  88%|████████▊ | 411/468 [00:23<00:03, 15.84it/s]
avg reward: 1.01:  89%|████████▉ | 416/468 [00:23<00:03, 16.45it/s]
avg reward: 1.01:  89%|████████▉ | 416/468 [00:23<00:03, 16.45it/s]
avg reward: 1.01:  89%|████████▉ | 416/468 [00:23<00:03, 16.45it/s]
avg reward: 1.01:  89%|████████▉ | 416/468 [00:23<00:03, 16.45it/s]
avg reward: 1.01:  89%|████████▉ | 416/468 [00:23<00:03, 16.45it/s]
avg reward: 1.01:  90%|████████▉ | 420/468 [00:23<00:02, 19.82it/s]
avg reward: 1.01:  90%|████████▉ | 420/468 [00:24<00:02, 19.82it/s]
avg reward: 1.01:  90%|████████▉ | 420/468 [00:24<00:02, 19.82it/s]
avg reward: 1.01:  90%|████████▉ | 420/468 [00:24<00:02, 19.82it/s]
avg reward: 1.01:  90%|█████████ | 423/468 [00:24<00:02, 17.79it/s]
avg reward: 1.01:  90%|█████████ | 423/468 [00:24<00:02, 17.79it/s]
avg reward: 1.01:  90%|█████████ | 423/468 [00:24<00:02, 17.79it/s]
avg reward: 1.01:  90%|█████████ | 423/468 [00:24<00:02, 17.79it/s]
avg reward: 1.01:  91%|█████████ | 426/468 [00:24<00:02, 16.47it/s]
avg reward: 1.01:  91%|█████████ | 426/468 [00:24<00:02, 16.47it/s]
avg reward: 1.01:  91%|█████████ | 426/468 [00:24<00:02, 16.47it/s]
avg reward: 1.01:  91%|█████████ | 426/468 [00:24<00:02, 16.47it/s]
avg reward: 1.01:  91%|█████████ | 426/468 [00:24<00:02, 16.47it/s]
avg reward: 1.01:  92%|█████████▏| 430/468 [00:24<00:01, 20.44it/s]
avg reward: 1.01:  92%|█████████▏| 430/468 [00:24<00:01, 20.44it/s]
avg reward: 1.01:  92%|█████████▏| 430/468 [00:24<00:01, 20.44it/s]
avg reward: 1.01:  92%|█████████▏| 430/468 [00:24<00:01, 20.44it/s]
avg reward: 1.01:  93%|█████████▎| 433/468 [00:24<00:01, 17.96it/s]
avg reward: 1.01:  93%|█████████▎| 433/468 [00:24<00:01, 17.96it/s]
avg reward: 1.01:  93%|█████████▎| 433/468 [00:24<00:01, 17.96it/s]
avg reward: 1.01:  93%|█████████▎| 433/468 [00:24<00:01, 17.96it/s]
avg reward: 1.01:  93%|█████████▎| 436/468 [00:24<00:01, 16.25it/s]
avg reward: 1.01:  93%|█████████▎| 436/468 [00:25<00:01, 16.25it/s]
avg reward: 1.01:  93%|█████████▎| 436/468 [00:25<00:01, 16.25it/s]
avg reward: 1.01:  93%|█████████▎| 436/468 [00:25<00:01, 16.25it/s]
avg reward: 1.01:  93%|█████████▎| 436/468 [00:25<00:01, 16.25it/s]
avg reward: 1.02:  93%|█████████▎| 436/468 [00:25<00:01, 16.25it/s]
avg reward: 1.02:  94%|█████████▍| 441/468 [00:25<00:01, 16.34it/s]
avg reward: 1.02:  94%|█████████▍| 441/468 [00:25<00:01, 16.34it/s]
avg reward: 1.02:  94%|█████████▍| 441/468 [00:25<00:01, 16.34it/s]
avg reward: 1.02:  94%|█████████▍| 441/468 [00:25<00:01, 16.34it/s]
avg reward: 1.02:  94%|█████████▍| 441/468 [00:25<00:01, 16.34it/s]
avg reward: 1.02:  95%|█████████▌| 445/468 [00:25<00:01, 19.99it/s]
avg reward: 1.02:  95%|█████████▌| 445/468 [00:25<00:01, 19.99it/s]
avg reward: 1.02:  95%|█████████▌| 445/468 [00:25<00:01, 19.99it/s]
avg reward: 1.02:  95%|█████████▌| 445/468 [00:25<00:01, 19.99it/s]
avg reward: 1.02:  96%|█████████▌| 448/468 [00:25<00:01, 17.31it/s]
avg reward: 1.02:  96%|█████████▌| 448/468 [00:25<00:01, 17.31it/s]
avg reward: 1.02:  96%|█████████▌| 448/468 [00:25<00:01, 17.31it/s]
avg reward: 1.02:  96%|█████████▌| 448/468 [00:25<00:01, 17.31it/s]
avg reward: 1.02:  96%|█████████▋| 451/468 [00:25<00:01, 15.89it/s]
avg reward: 1.02:  96%|█████████▋| 451/468 [00:25<00:01, 15.89it/s]
avg reward: 1.02:  96%|█████████▋| 451/468 [00:25<00:01, 15.89it/s]
avg reward: 1.02:  96%|█████████▋| 451/468 [00:25<00:01, 15.89it/s]
avg reward: 1.02:  96%|█████████▋| 451/468 [00:25<00:01, 15.89it/s]
avg reward: 1.02:  97%|█████████▋| 455/468 [00:25<00:00, 19.52it/s]
avg reward: 1.02:  97%|█████████▋| 455/468 [00:26<00:00, 19.52it/s]
avg reward: 1.02:  97%|█████████▋| 455/468 [00:26<00:00, 19.52it/s]
avg reward: 1.02:  97%|█████████▋| 455/468 [00:26<00:00, 19.52it/s]
avg reward: 1.02:  98%|█████████▊| 458/468 [00:26<00:00, 16.94it/s]
avg reward: 1.02:  98%|█████████▊| 458/468 [00:26<00:00, 16.94it/s]
avg reward: 1.02:  98%|█████████▊| 458/468 [00:26<00:00, 16.94it/s]
avg reward: 1.01:  98%|█████████▊| 458/468 [00:26<00:00, 16.94it/s]
avg reward: 1.01:  99%|█████████▊| 461/468 [00:26<00:00, 15.84it/s]
avg reward: 1.01:  99%|█████████▊| 461/468 [00:26<00:00, 15.84it/s]
avg reward: 1.01:  99%|█████████▊| 461/468 [00:26<00:00, 15.84it/s]
avg reward: 1.01:  99%|█████████▊| 461/468 [00:26<00:00, 15.84it/s]
avg reward: 1.01:  99%|█████████▊| 461/468 [00:26<00:00, 15.84it/s]
avg reward: 1.01:  99%|█████████▊| 461/468 [00:26<00:00, 15.84it/s]
avg reward: 1.01: 100%|█████████▉| 466/468 [00:26<00:00, 17.20it/s]
avg reward: 1.01: 100%|█████████▉| 466/468 [00:26<00:00, 17.20it/s]
avg reward: 1.01: 100%|██████████| 468/468 [00:26<00:00, 17.50it/s]

Val Acc: 0.9822: 100%|██████████| 79/79 [00:01<00:00, 68.12it/s]
epoch 0, val_acc: 0.98, avg reward: 1.01: 100%|██████████| 1/1 [00:27<00:00, 27.96s/it]

The resulting architecture is:

mynet.graph
../../_images/output_enas_proxylessnas_ad55b9_21_0.svg

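Change the reward trade-off between accuracy and latency. The exponent on the latency ratio (0.8 here) controls how strongly the reward favors architectures that are faster than average: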

reward_fn = lambda metric, net: metric * ((net.avg_latency / net.latency) ** 0.8)
mynet.initialize(force_reinit=True)
scheduler = ENAS_Scheduler(mynet, train_set='mnist',
                           reward_fn=reward_fn, batch_size=128, num_gpus=1,
                           warmup_epochs=0, epochs=1, controller_lr=3e-3,
                           plot_frequency=10, update_arch_frequency=5)
scheduler.run()
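
To build intuition for this trade-off, the formula in reward_fn above can be evaluated by hand, outside the scheduler. The sketch below reimplements it as a plain function. The name latency_aware_reward, the three candidate accuracy/latency pairs, the average latency of 10.0, and the comparison exponent 0.1 are all made-up illustration values, not numbers taken from the search in this tutorial.

```python
def latency_aware_reward(accuracy, latency, avg_latency, exponent=0.8):
    """Same formula as reward_fn: metric * (avg_latency / latency) ** exponent."""
    return accuracy * (avg_latency / latency) ** exponent


# Hypothetical candidates: name -> (validation accuracy, latency).
# These numbers are invented for illustration only.
avg_latency = 10.0
candidates = {
    "fast": (0.97, 5.0),
    "average": (0.98, 10.0),
    "slow": (0.99, 20.0),
}

# Compare a weak latency penalty (0.1, an assumed value) with the 0.8 used above.
rewards = {}
for exponent in (0.1, 0.8):
    rewards[exponent] = {
        name: latency_aware_reward(acc, lat, avg_latency, exponent)
        for name, (acc, lat) in candidates.items()
    }
    for name, value in rewards[exponent].items():
        print(f"exponent={exponent}  {name:<8} reward={value:.3f}")
```

With a larger exponent the gap between the fast and slow candidates widens, so the controller is pushed harder toward low-latency choices even when they cost a little accuracy. Because the latency ratio exceeds 1 for any architecture faster than average, the reward is not bounded by 1, which is why an average reward slightly above 1 can appear in the progress output.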
/var/lib/jenkins/workspace/workspace/autogluon-tutorial-nas-v3/venv/lib/python3.7/site-packages/mxnet/gluon/block.py:656: UserWarning: "ENAS_Sequential._modules" is an unregistered container with Blocks. Note that Blocks inside the list, tuple or dict will not be registered automatically. Make sure to register them using register_child() or switching to nn.Sequential/nn.HybridSequential instead.
  self.collect_params().initialize(init, ctx, verbose, force_reinit)
/var/lib/jenkins/workspace/workspace/autogluon-tutorial-nas-v3/extra/src/autogluon/extra/contrib/enas/enas_scheduler.py:78: UserWarning: "ENAS_Sequential._modules" is an unregistered container with Blocks. Note that Blocks inside the list, tuple or dict will not be registered automatically. Make sure to register them using register_child() or switching to nn.Sequential/nn.HybridSequential instead.
  self.supernet.collect_params().reset_ctx(ctx)
/var/lib/jenkins/workspace/workspace/autogluon-tutorial-nas-v3/extra/src/autogluon/extra/contrib/enas/enas_utils.py:15: UserWarning: "ENAS_Sequential._modules" is an unregistered container with Blocks. Note that Blocks inside the list, tuple or dict will not be registered automatically. Make sure to register them using register_child() or switching to nn.Sequential/nn.HybridSequential instead.
  train_args['trainer'] = gluon.Trainer(net.collect_params(), 'sgd', optimizer_params)
  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/468 [00:00<?, ?it/s]
/var/lib/jenkins/workspace/workspace/autogluon-tutorial-nas-v3/venv/lib/python3.7/site-packages/mxnet/gluon/block.py:926: UserWarning: "ENAS_Sequential._modules" is an unregistered container with Blocks. Note that Blocks inside the list, tuple or dict will not be registered automatically. Make sure to register them using register_child() or switching to nn.Sequential/nn.HybridSequential instead.
  params = self.collect_params()
/var/lib/jenkins/workspace/workspace/autogluon-tutorial-nas-v3/venv/lib/python3.7/site-packages/mxnet/gluon/block.py:682: UserWarning: Parameter batchnorm18_running_mean, batchnorm6_running_var, conv0_weight, batchnorm8_running_mean, conv10_weight, conv2_bias, batchnorm19_gamma, batchnorm12_beta, conv9_bias, batchnorm18_gamma, batchnorm13_gamma, batchnorm12_running_var, batchnorm9_running_var, batchnorm20_running_var, batchnorm6_gamma, conv3_bias, conv24_bias, batchnorm7_beta, conv10_bias, batchnorm12_running_mean, batchnorm18_beta, conv5_weight, conv14_weight, batchnorm21_beta, batchnorm16_beta, batchnorm2_beta, batchnorm0_running_mean, conv1_bias, batchnorm3_running_mean, batchnorm2_running_var, batchnorm3_beta, conv18_bias, conv24_weight, batchnorm13_beta, conv11_bias, batchnorm2_gamma, conv27_bias, batchnorm6_beta, batchnorm19_beta, conv28_weight, batchnorm9_running_mean, batchnorm12_gamma, conv4_weight, conv13_weight, batchnorm16_running_mean, batchnorm8_gamma, batchnorm8_beta, conv5_bias, batchnorm1_running_var, batchnorm7_running_var, batchnorm19_running_var, conv27_weight, conv1_weight, conv26_weight, batchnorm20_gamma, batchnorm16_gamma, batchnorm2_running_mean, conv19_weight, conv14_bias, conv0_bias, batchnorm0_beta, conv25_weight, batchnorm0_gamma, conv13_bias, batchnorm13_running_var, conv18_weight, batchnorm21_gamma, batchnorm6_running_mean, batchnorm9_gamma, batchnorm16_running_var, conv29_bias, batchnorm7_running_mean, batchnorm19_running_mean, conv26_bias, batchnorm9_beta, batchnorm8_running_var, conv2_weight, batchnorm1_gamma, conv19_bias, batchnorm7_gamma, conv25_bias, batchnorm1_beta, conv20_bias, batchnorm3_gamma, conv12_weight, conv4_bias, batchnorm13_running_mean, batchnorm21_running_var, conv3_weight, conv20_weight, batchnorm18_running_var, batchnorm17_running_var, conv11_weight, batchnorm21_running_mean, batchnorm17_gamma, batchnorm0_running_var, conv29_weight, conv12_bias, batchnorm1_running_mean, batchnorm17_beta, batchnorm3_running_var, 
conv9_weight, conv28_bias, batchnorm20_running_mean, batchnorm20_beta, batchnorm17_running_mean is not used by any computation. Is this intended?
  out = self.forward(*args)

avg reward: 0.63:  14%|█▍        | 66/468 [00:04<00:23, 17.04it/s]
avg reward: 0.63:  15%|█▍        | 70/468 [00:04<00:19, 20.66it/s]
avg reward: 0.62:  15%|█▍        | 70/468 [00:04<00:19, 20.66it/s]
avg reward: 0.62:  15%|█▍        | 70/468 [00:04<00:19, 20.66it/s]
avg reward: 0.62:  15%|█▍        | 70/468 [00:04<00:19, 20.66it/s]
avg reward: 0.62:  16%|█▌        | 73/468 [00:04<00:21, 18.06it/s]
avg reward: 0.62:  16%|█▌        | 73/468 [00:04<00:21, 18.06it/s]
avg reward: 0.62:  16%|█▌        | 73/468 [00:04<00:21, 18.06it/s]
avg reward: 0.71:  16%|█▌        | 73/468 [00:04<00:21, 18.06it/s]
avg reward: 0.71:  16%|█▌        | 76/468 [00:04<00:24, 16.32it/s]
avg reward: 0.71:  16%|█▌        | 76/468 [00:04<00:24, 16.32it/s]
avg reward: 0.71:  16%|█▌        | 76/468 [00:04<00:24, 16.32it/s]
avg reward: 0.71:  16%|█▌        | 76/468 [00:04<00:24, 16.32it/s]
avg reward: 0.71:  16%|█▌        | 76/468 [00:04<00:24, 16.32it/s]
avg reward: 0.80:  16%|█▌        | 76/468 [00:04<00:24, 16.32it/s]
avg reward: 0.80:  17%|█▋        | 81/468 [00:04<00:22, 17.13it/s]
avg reward: 0.80:  17%|█▋        | 81/468 [00:04<00:22, 17.13it/s]
avg reward: 0.80:  17%|█▋        | 81/468 [00:04<00:22, 17.13it/s]
avg reward: 0.80:  17%|█▋        | 81/468 [00:04<00:22, 17.13it/s]
avg reward: 0.80:  17%|█▋        | 81/468 [00:04<00:22, 17.13it/s]
avg reward: 0.80:  18%|█▊        | 85/468 [00:04<00:18, 20.48it/s]
avg reward: 0.79:  18%|█▊        | 85/468 [00:05<00:18, 20.48it/s]
avg reward: 0.79:  18%|█▊        | 85/468 [00:05<00:18, 20.48it/s]
avg reward: 0.79:  18%|█▊        | 85/468 [00:05<00:18, 20.48it/s]
avg reward: 0.79:  19%|█▉        | 88/468 [00:05<00:21, 17.56it/s]
avg reward: 0.79:  19%|█▉        | 88/468 [00:05<00:21, 17.56it/s]
avg reward: 0.79:  19%|█▉        | 88/468 [00:05<00:21, 17.56it/s]
avg reward: 0.81:  19%|█▉        | 88/468 [00:05<00:21, 17.56it/s]
avg reward: 0.81:  19%|█▉        | 91/468 [00:05<00:23, 16.00it/s]
avg reward: 0.81:  19%|█▉        | 91/468 [00:05<00:23, 16.00it/s]
avg reward: 0.81:  19%|█▉        | 91/468 [00:05<00:23, 16.00it/s]
avg reward: 0.81:  19%|█▉        | 91/468 [00:05<00:23, 16.00it/s]
avg reward: 0.81:  19%|█▉        | 91/468 [00:05<00:23, 16.00it/s]
avg reward: 0.80:  19%|█▉        | 91/468 [00:05<00:23, 16.00it/s]
avg reward: 0.80:  21%|██        | 96/468 [00:05<00:22, 16.87it/s]
avg reward: 0.80:  21%|██        | 96/468 [00:05<00:22, 16.87it/s]
avg reward: 0.80:  21%|██        | 96/468 [00:05<00:22, 16.87it/s]
avg reward: 0.80:  21%|██        | 96/468 [00:05<00:22, 16.87it/s]
avg reward: 0.80:  21%|██        | 96/468 [00:05<00:22, 16.87it/s]
avg reward: 0.80:  21%|██▏       | 100/468 [00:05<00:18, 19.89it/s]
avg reward: 0.86:  21%|██▏       | 100/468 [00:05<00:18, 19.89it/s]
avg reward: 0.86:  21%|██▏       | 100/468 [00:06<00:18, 19.89it/s]
avg reward: 0.86:  21%|██▏       | 100/468 [00:06<00:18, 19.89it/s]
avg reward: 0.86:  22%|██▏       | 103/468 [00:06<00:20, 17.90it/s]
avg reward: 0.86:  22%|██▏       | 103/468 [00:06<00:20, 17.90it/s]
avg reward: 0.86:  22%|██▏       | 103/468 [00:06<00:20, 17.90it/s]
avg reward: 0.89:  22%|██▏       | 103/468 [00:06<00:20, 17.90it/s]
avg reward: 0.89:  23%|██▎       | 106/468 [00:06<00:21, 16.71it/s]
avg reward: 0.89:  23%|██▎       | 106/468 [00:06<00:21, 16.71it/s]
avg reward: 0.89:  23%|██▎       | 106/468 [00:06<00:21, 16.71it/s]
avg reward: 0.89:  23%|██▎       | 106/468 [00:06<00:21, 16.71it/s]
avg reward: 0.89:  23%|██▎       | 106/468 [00:06<00:21, 16.71it/s]
avg reward: 0.89:  24%|██▎       | 110/468 [00:06<00:17, 20.15it/s]
avg reward: 0.89:  24%|██▎       | 110/468 [00:06<00:17, 20.15it/s]
avg reward: 0.89:  24%|██▎       | 110/468 [00:06<00:17, 20.15it/s]
avg reward: 0.89:  24%|██▎       | 110/468 [00:06<00:17, 20.15it/s]
avg reward: 0.89:  24%|██▍       | 113/468 [00:06<00:20, 17.24it/s]
avg reward: 0.89:  24%|██▍       | 113/468 [00:06<00:20, 17.24it/s]
avg reward: 0.89:  24%|██▍       | 113/468 [00:06<00:20, 17.24it/s]
avg reward: 0.87:  24%|██▍       | 113/468 [00:06<00:20, 17.24it/s]
avg reward: 0.87:  25%|██▍       | 116/468 [00:06<00:22, 15.60it/s]
avg reward: 0.87:  25%|██▍       | 116/468 [00:06<00:22, 15.60it/s]
avg reward: 0.87:  25%|██▍       | 116/468 [00:06<00:22, 15.60it/s]
avg reward: 0.87:  25%|██▍       | 116/468 [00:06<00:22, 15.60it/s]
avg reward: 0.87:  25%|██▍       | 116/468 [00:06<00:22, 15.60it/s]
avg reward: 0.87:  25%|██▍       | 116/468 [00:07<00:22, 15.60it/s]
avg reward: 0.87:  26%|██▌       | 121/468 [00:07<00:21, 16.31it/s]
avg reward: 0.87:  26%|██▌       | 121/468 [00:07<00:21, 16.31it/s]
avg reward: 0.87:  26%|██▌       | 121/468 [00:07<00:21, 16.31it/s]
avg reward: 0.87:  26%|██▌       | 121/468 [00:07<00:21, 16.31it/s]
avg reward: 0.87:  26%|██▌       | 121/468 [00:07<00:21, 16.31it/s]
avg reward: 0.87:  27%|██▋       | 125/468 [00:07<00:17, 19.87it/s]
avg reward: 0.88:  27%|██▋       | 125/468 [00:07<00:17, 19.87it/s]
avg reward: 0.88:  27%|██▋       | 125/468 [00:07<00:17, 19.87it/s]
avg reward: 0.88:  27%|██▋       | 125/468 [00:07<00:17, 19.87it/s]
avg reward: 0.88:  27%|██▋       | 128/468 [00:07<00:19, 17.34it/s]
avg reward: 0.88:  27%|██▋       | 128/468 [00:07<00:19, 17.34it/s]
avg reward: 0.88:  27%|██▋       | 128/468 [00:07<00:19, 17.34it/s]
avg reward: 0.90:  27%|██▋       | 128/468 [00:07<00:19, 17.34it/s]
avg reward: 0.90:  28%|██▊       | 131/468 [00:07<00:21, 16.05it/s]
avg reward: 0.90:  28%|██▊       | 131/468 [00:07<00:21, 16.05it/s]
avg reward: 0.90:  28%|██▊       | 131/468 [00:07<00:21, 16.05it/s]
avg reward: 0.90:  28%|██▊       | 131/468 [00:07<00:21, 16.05it/s]
avg reward: 0.90:  28%|██▊       | 131/468 [00:07<00:21, 16.05it/s]
avg reward: 0.93:  28%|██▊       | 131/468 [00:07<00:21, 16.05it/s]
avg reward: 0.93:  29%|██▉       | 136/468 [00:07<00:19, 16.71it/s]
avg reward: 0.93:  29%|██▉       | 136/468 [00:08<00:19, 16.71it/s]
avg reward: 0.93:  29%|██▉       | 136/468 [00:08<00:19, 16.71it/s]
avg reward: 0.93:  29%|██▉       | 136/468 [00:08<00:19, 16.71it/s]
avg reward: 0.93:  29%|██▉       | 136/468 [00:08<00:19, 16.71it/s]
avg reward: 0.93:  30%|██▉       | 140/468 [00:08<00:16, 20.29it/s]
avg reward: 0.97:  30%|██▉       | 140/468 [00:08<00:16, 20.29it/s]
avg reward: 0.97:  30%|██▉       | 140/468 [00:08<00:16, 20.29it/s]
avg reward: 0.97:  30%|██▉       | 140/468 [00:08<00:16, 20.29it/s]
avg reward: 0.97:  31%|███       | 143/468 [00:08<00:18, 17.97it/s]
avg reward: 0.97:  31%|███       | 143/468 [00:08<00:18, 17.97it/s]
avg reward: 0.97:  31%|███       | 143/468 [00:08<00:18, 17.97it/s]
avg reward: 0.99:  31%|███       | 143/468 [00:08<00:18, 17.97it/s]
avg reward: 0.99:  31%|███       | 146/468 [00:08<00:19, 16.21it/s]
avg reward: 0.99:  31%|███       | 146/468 [00:08<00:19, 16.21it/s]
avg reward: 0.99:  31%|███       | 146/468 [00:08<00:19, 16.21it/s]
avg reward: 0.99:  31%|███       | 146/468 [00:08<00:19, 16.21it/s]
avg reward: 0.99:  31%|███       | 146/468 [00:08<00:19, 16.21it/s]
avg reward: 0.99:  32%|███▏      | 150/468 [00:08<00:15, 19.92it/s]
avg reward: 0.97:  32%|███▏      | 150/468 [00:08<00:15, 19.92it/s]
avg reward: 0.97:  32%|███▏      | 150/468 [00:08<00:15, 19.92it/s]
avg reward: 0.97:  32%|███▏      | 150/468 [00:08<00:15, 19.92it/s]
avg reward: 0.97:  33%|███▎      | 153/468 [00:08<00:18, 17.37it/s]
avg reward: 0.97:  33%|███▎      | 153/468 [00:08<00:18, 17.37it/s]
avg reward: 0.97:  33%|███▎      | 153/468 [00:08<00:18, 17.37it/s]
avg reward: 0.95:  33%|███▎      | 153/468 [00:09<00:18, 17.37it/s]
avg reward: 0.95:  33%|███▎      | 156/468 [00:09<00:19, 16.12it/s]
avg reward: 0.95:  33%|███▎      | 156/468 [00:09<00:19, 16.12it/s]
avg reward: 0.95:  33%|███▎      | 156/468 [00:09<00:19, 16.12it/s]
avg reward: 0.95:  33%|███▎      | 156/468 [00:09<00:19, 16.12it/s]
avg reward: 0.95:  33%|███▎      | 156/468 [00:09<00:19, 16.12it/s]
avg reward: 0.96:  33%|███▎      | 156/468 [00:09<00:19, 16.12it/s]
avg reward: 0.96:  34%|███▍      | 161/468 [00:09<00:18, 16.95it/s]
avg reward: 0.96:  34%|███▍      | 161/468 [00:09<00:18, 16.95it/s]
avg reward: 0.96:  34%|███▍      | 161/468 [00:09<00:18, 16.95it/s]
avg reward: 0.96:  34%|███▍      | 161/468 [00:09<00:18, 16.95it/s]
avg reward: 0.96:  34%|███▍      | 161/468 [00:09<00:18, 16.95it/s]
avg reward: 0.96:  35%|███▌      | 165/468 [00:09<00:15, 20.04it/s]
avg reward: 0.97:  35%|███▌      | 165/468 [00:09<00:15, 20.04it/s]
avg reward: 0.97:  35%|███▌      | 165/468 [00:09<00:15, 20.04it/s]
avg reward: 0.97:  35%|███▌      | 165/468 [00:09<00:15, 20.04it/s]
avg reward: 0.97:  36%|███▌      | 168/468 [00:09<00:17, 17.64it/s]
avg reward: 0.97:  36%|███▌      | 168/468 [00:09<00:17, 17.64it/s]
avg reward: 0.97:  36%|███▌      | 168/468 [00:09<00:17, 17.64it/s]
avg reward: 1.00:  36%|███▌      | 168/468 [00:09<00:17, 17.64it/s]
avg reward: 1.00:  37%|███▋      | 171/468 [00:09<00:18, 16.24it/s]
avg reward: 1.00:  37%|███▋      | 171/468 [00:09<00:18, 16.24it/s]
avg reward: 1.00:  37%|███▋      | 171/468 [00:10<00:18, 16.24it/s]
avg reward: 1.00:  37%|███▋      | 171/468 [00:10<00:18, 16.24it/s]
avg reward: 1.00:  37%|███▋      | 171/468 [00:10<00:18, 16.24it/s]
avg reward: 1.03:  37%|███▋      | 171/468 [00:10<00:18, 16.24it/s]
avg reward: 1.03:  38%|███▊      | 176/468 [00:10<00:17, 16.63it/s]
avg reward: 1.03:  38%|███▊      | 176/468 [00:10<00:17, 16.63it/s]
avg reward: 1.03:  38%|███▊      | 176/468 [00:10<00:17, 16.63it/s]
avg reward: 1.03:  38%|███▊      | 176/468 [00:10<00:17, 16.63it/s]
avg reward: 1.03:  38%|███▊      | 176/468 [00:10<00:17, 16.63it/s]
avg reward: 1.03:  38%|███▊      | 180/468 [00:10<00:14, 19.99it/s]
avg reward: 1.03:  38%|███▊      | 180/468 [00:10<00:14, 19.99it/s]
avg reward: 1.03:  38%|███▊      | 180/468 [00:10<00:14, 19.99it/s]
avg reward: 1.03:  38%|███▊      | 180/468 [00:10<00:14, 19.99it/s]
avg reward: 1.03:  39%|███▉      | 183/468 [00:10<00:15, 18.01it/s]
avg reward: 1.03:  39%|███▉      | 183/468 [00:10<00:15, 18.01it/s]
avg reward: 1.03:  39%|███▉      | 183/468 [00:10<00:15, 18.01it/s]
avg reward: 1.03:  39%|███▉      | 183/468 [00:10<00:15, 18.01it/s]
avg reward: 1.03:  40%|███▉      | 186/468 [00:10<00:17, 16.32it/s]
avg reward: 1.03:  40%|███▉      | 186/468 [00:10<00:17, 16.32it/s]
avg reward: 1.03:  40%|███▉      | 186/468 [00:10<00:17, 16.32it/s]
avg reward: 1.03:  40%|███▉      | 186/468 [00:10<00:17, 16.32it/s]
avg reward: 1.03:  40%|███▉      | 186/468 [00:10<00:17, 16.32it/s]
avg reward: 1.01:  40%|███▉      | 186/468 [00:11<00:17, 16.32it/s]
avg reward: 1.01:  41%|████      | 191/468 [00:11<00:15, 17.33it/s]
avg reward: 1.01:  41%|████      | 191/468 [00:11<00:15, 17.33it/s]
avg reward: 1.01:  41%|████      | 191/468 [00:11<00:15, 17.33it/s]
avg reward: 1.01:  41%|████      | 191/468 [00:11<00:15, 17.33it/s]
avg reward: 1.01:  41%|████      | 191/468 [00:11<00:15, 17.33it/s]
avg reward: 1.01:  41%|████      | 191/468 [00:11<00:15, 17.33it/s]
avg reward: 1.01:  42%|████▏     | 196/468 [00:11<00:15, 17.34it/s]
avg reward: 1.01:  42%|████▏     | 196/468 [00:11<00:15, 17.34it/s]
avg reward: 1.01:  42%|████▏     | 196/468 [00:11<00:15, 17.34it/s]
avg reward: 1.01:  42%|████▏     | 196/468 [00:11<00:15, 17.34it/s]
avg reward: 1.01:  42%|████▏     | 196/468 [00:11<00:15, 17.34it/s]
avg reward: 0.99:  42%|████▏     | 196/468 [00:11<00:15, 17.34it/s]
avg reward: 0.99:  43%|████▎     | 201/468 [00:11<00:15, 17.72it/s]
avg reward: 0.99:  43%|████▎     | 201/468 [00:11<00:15, 17.72it/s]
avg reward: 0.99:  43%|████▎     | 201/468 [00:11<00:15, 17.72it/s]
avg reward: 0.99:  43%|████▎     | 201/468 [00:11<00:15, 17.72it/s]
avg reward: 0.99:  43%|████▎     | 201/468 [00:11<00:15, 17.72it/s]
avg reward: 0.99:  43%|████▎     | 201/468 [00:11<00:15, 17.72it/s]
avg reward: 0.99:  44%|████▍     | 206/468 [00:11<00:14, 17.88it/s]
avg reward: 0.99:  44%|████▍     | 206/468 [00:11<00:14, 17.88it/s]
avg reward: 0.99:  44%|████▍     | 206/468 [00:11<00:14, 17.88it/s]
avg reward: 0.99:  44%|████▍     | 206/468 [00:11<00:14, 17.88it/s]
avg reward: 0.99:  44%|████▍     | 206/468 [00:11<00:14, 17.88it/s]
avg reward: 1.02:  44%|████▍     | 206/468 [00:12<00:14, 17.88it/s]
avg reward: 1.02:  45%|████▌     | 211/468 [00:12<00:14, 18.11it/s]
avg reward: 1.02:  45%|████▌     | 211/468 [00:12<00:14, 18.11it/s]
avg reward: 1.02:  45%|████▌     | 211/468 [00:12<00:14, 18.11it/s]
avg reward: 1.02:  45%|████▌     | 211/468 [00:12<00:14, 18.11it/s]
avg reward: 1.02:  45%|████▌     | 211/468 [00:12<00:14, 18.11it/s]
avg reward: 1.02:  46%|████▌     | 215/468 [00:12<00:12, 21.01it/s]
avg reward: 1.05:  46%|████▌     | 215/468 [00:12<00:12, 21.01it/s]
avg reward: 1.05:  46%|████▌     | 215/468 [00:12<00:12, 21.01it/s]
avg reward: 1.05:  46%|████▌     | 215/468 [00:12<00:12, 21.01it/s]
avg reward: 1.05:  47%|████▋     | 218/468 [00:12<00:13, 18.05it/s]
avg reward: 1.05:  47%|████▋     | 218/468 [00:12<00:13, 18.05it/s]
avg reward: 1.05:  47%|████▋     | 218/468 [00:12<00:13, 18.05it/s]
avg reward: 1.08:  47%|████▋     | 218/468 [00:12<00:13, 18.05it/s]
avg reward: 1.08:  47%|████▋     | 221/468 [00:12<00:14, 16.69it/s]
avg reward: 1.08:  47%|████▋     | 221/468 [00:12<00:14, 16.69it/s]
avg reward: 1.08:  47%|████▋     | 221/468 [00:12<00:14, 16.69it/s]
avg reward: 1.08:  47%|████▋     | 221/468 [00:12<00:14, 16.69it/s]
avg reward: 1.08:  47%|████▋     | 221/468 [00:12<00:14, 16.69it/s]
avg reward: 1.08:  48%|████▊     | 225/468 [00:12<00:12, 20.14it/s]
avg reward: 1.12:  48%|████▊     | 225/468 [00:13<00:12, 20.14it/s]
avg reward: 1.12:  48%|████▊     | 225/468 [00:13<00:12, 20.14it/s]
avg reward: 1.12:  48%|████▊     | 225/468 [00:13<00:12, 20.14it/s]
avg reward: 1.12:  49%|████▊     | 228/468 [00:13<00:13, 17.24it/s]
avg reward: 1.12:  49%|████▊     | 228/468 [00:13<00:13, 17.24it/s]
avg reward: 1.12:  49%|████▊     | 228/468 [00:13<00:13, 17.24it/s]
avg reward: 1.10:  49%|████▊     | 228/468 [00:13<00:13, 17.24it/s]
avg reward: 1.10:  49%|████▉     | 231/468 [00:13<00:14, 16.35it/s]
avg reward: 1.10:  49%|████▉     | 231/468 [00:13<00:14, 16.35it/s]
avg reward: 1.10:  49%|████▉     | 231/468 [00:13<00:14, 16.35it/s]
avg reward: 1.10:  49%|████▉     | 231/468 [00:13<00:14, 16.35it/s]
avg reward: 1.10:  49%|████▉     | 231/468 [00:13<00:14, 16.35it/s]
avg reward: 1.10:  50%|█████     | 235/468 [00:13<00:11, 20.37it/s]
avg reward: 1.10:  50%|█████     | 235/468 [00:13<00:11, 20.37it/s]
avg reward: 1.10:  50%|█████     | 235/468 [00:13<00:11, 20.37it/s]
avg reward: 1.10:  50%|█████     | 235/468 [00:13<00:11, 20.37it/s]
avg reward: 1.10:  51%|█████     | 238/468 [00:13<00:13, 17.53it/s]
avg reward: 1.10:  51%|█████     | 238/468 [00:13<00:13, 17.53it/s]
avg reward: 1.10:  51%|█████     | 238/468 [00:13<00:13, 17.53it/s]
avg reward: 1.13:  51%|█████     | 238/468 [00:13<00:13, 17.53it/s]
avg reward: 1.13:  51%|█████▏    | 241/468 [00:13<00:14, 15.78it/s]
avg reward: 1.13:  51%|█████▏    | 241/468 [00:13<00:14, 15.78it/s]
avg reward: 1.13:  51%|█████▏    | 241/468 [00:13<00:14, 15.78it/s]
avg reward: 1.13:  51%|█████▏    | 241/468 [00:13<00:14, 15.78it/s]
avg reward: 1.13:  51%|█████▏    | 241/468 [00:13<00:14, 15.78it/s]
avg reward: 1.15:  51%|█████▏    | 241/468 [00:14<00:14, 15.78it/s]
avg reward: 1.15:  53%|█████▎    | 246/468 [00:14<00:13, 16.15it/s]
avg reward: 1.15:  53%|█████▎    | 246/468 [00:14<00:13, 16.15it/s]
avg reward: 1.15:  53%|█████▎    | 246/468 [00:14<00:13, 16.15it/s]
avg reward: 1.15:  53%|█████▎    | 246/468 [00:14<00:13, 16.15it/s]
avg reward: 1.15:  53%|█████▎    | 246/468 [00:14<00:13, 16.15it/s]
avg reward: 1.15:  53%|█████▎    | 250/468 [00:14<00:11, 19.57it/s]
avg reward: 1.16:  53%|█████▎    | 250/468 [00:14<00:11, 19.57it/s]
avg reward: 1.16:  53%|█████▎    | 250/468 [00:14<00:11, 19.57it/s]
avg reward: 1.16:  53%|█████▎    | 250/468 [00:14<00:11, 19.57it/s]
avg reward: 1.16:  54%|█████▍    | 253/468 [00:14<00:12, 16.78it/s]
avg reward: 1.16:  54%|█████▍    | 253/468 [00:14<00:12, 16.78it/s]
avg reward: 1.16:  54%|█████▍    | 253/468 [00:14<00:12, 16.78it/s]
avg reward: 1.18:  54%|█████▍    | 253/468 [00:14<00:12, 16.78it/s]
avg reward: 1.18:  55%|█████▍    | 256/468 [00:14<00:13, 15.77it/s]
avg reward: 1.18:  55%|█████▍    | 256/468 [00:14<00:13, 15.77it/s]
avg reward: 1.18:  55%|█████▍    | 256/468 [00:14<00:13, 15.77it/s]
avg reward: 1.18:  55%|█████▍    | 256/468 [00:14<00:13, 15.77it/s]
avg reward: 1.18:  55%|█████▍    | 256/468 [00:14<00:13, 15.77it/s]
avg reward: 1.18:  56%|█████▌    | 260/468 [00:14<00:10, 19.58it/s]
avg reward: 1.19:  56%|█████▌    | 260/468 [00:15<00:10, 19.58it/s]
avg reward: 1.19:  56%|█████▌    | 260/468 [00:15<00:10, 19.58it/s]
avg reward: 1.19:  56%|█████▌    | 260/468 [00:15<00:10, 19.58it/s]
avg reward: 1.19:  56%|█████▌    | 263/468 [00:15<00:11, 17.32it/s]
avg reward: 1.19:  56%|█████▌    | 263/468 [00:15<00:11, 17.32it/s]
avg reward: 1.19:  56%|█████▌    | 263/468 [00:15<00:11, 17.32it/s]
avg reward: 1.20:  56%|█████▌    | 263/468 [00:15<00:11, 17.32it/s]
avg reward: 1.20:  57%|█████▋    | 266/468 [00:15<00:12, 15.88it/s]
avg reward: 1.20:  57%|█████▋    | 266/468 [00:15<00:12, 15.88it/s]
avg reward: 1.20:  57%|█████▋    | 266/468 [00:15<00:12, 15.88it/s]
avg reward: 1.20:  57%|█████▋    | 266/468 [00:15<00:12, 15.88it/s]
avg reward: 1.20:  57%|█████▋    | 266/468 [00:15<00:12, 15.88it/s]
avg reward: 1.20:  58%|█████▊    | 270/468 [00:15<00:09, 19.99it/s]
avg reward: 1.21:  58%|█████▊    | 270/468 [00:15<00:09, 19.99it/s]
avg reward: 1.21:  58%|█████▊    | 270/468 [00:15<00:09, 19.99it/s]
avg reward: 1.21:  58%|█████▊    | 270/468 [00:15<00:09, 19.99it/s]
avg reward: 1.21:  58%|█████▊    | 273/468 [00:15<00:11, 17.30it/s]
avg reward: 1.21:  58%|█████▊    | 273/468 [00:15<00:11, 17.30it/s]
avg reward: 1.21:  58%|█████▊    | 273/468 [00:15<00:11, 17.30it/s]
avg reward: 1.22:  58%|█████▊    | 273/468 [00:15<00:11, 17.30it/s]
avg reward: 1.22:  59%|█████▉    | 276/468 [00:15<00:12, 15.95it/s]
avg reward: 1.22:  59%|█████▉    | 276/468 [00:15<00:12, 15.95it/s]
avg reward: 1.22:  59%|█████▉    | 276/468 [00:15<00:12, 15.95it/s]
avg reward: 1.22:  59%|█████▉    | 276/468 [00:15<00:12, 15.95it/s]
avg reward: 1.22:  59%|█████▉    | 276/468 [00:16<00:12, 15.95it/s]
avg reward: 1.22:  60%|█████▉    | 280/468 [00:16<00:09, 19.97it/s]
avg reward: 1.24:  60%|█████▉    | 280/468 [00:16<00:09, 19.97it/s]
avg reward: 1.24:  60%|█████▉    | 280/468 [00:16<00:09, 19.97it/s]
avg reward: 1.24:  60%|█████▉    | 280/468 [00:16<00:09, 19.97it/s]
avg reward: 1.24:  60%|██████    | 283/468 [00:16<00:10, 17.14it/s]
avg reward: 1.24:  60%|██████    | 283/468 [00:16<00:10, 17.14it/s]
avg reward: 1.24:  60%|██████    | 283/468 [00:16<00:10, 17.14it/s]
avg reward: 1.25:  60%|██████    | 283/468 [00:16<00:10, 17.14it/s]
avg reward: 1.25:  61%|██████    | 286/468 [00:16<00:11, 15.90it/s]
avg reward: 1.25:  61%|██████    | 286/468 [00:16<00:11, 15.90it/s]
avg reward: 1.25:  61%|██████    | 286/468 [00:16<00:11, 15.90it/s]
avg reward: 1.25:  61%|██████    | 286/468 [00:16<00:11, 15.90it/s]
avg reward: 1.25:  61%|██████    | 286/468 [00:16<00:11, 15.90it/s]
avg reward: 1.25:  62%|██████▏   | 290/468 [00:16<00:08, 20.00it/s]
avg reward: 1.25:  62%|██████▏   | 290/468 [00:16<00:08, 20.00it/s]
avg reward: 1.25:  62%|██████▏   | 290/468 [00:16<00:08, 20.00it/s]
avg reward: 1.25:  62%|██████▏   | 290/468 [00:16<00:08, 20.00it/s]
avg reward: 1.25:  63%|██████▎   | 293/468 [00:16<00:10, 17.43it/s]
avg reward: 1.25:  63%|██████▎   | 293/468 [00:16<00:10, 17.43it/s]
avg reward: 1.25:  63%|██████▎   | 293/468 [00:16<00:10, 17.43it/s]
avg reward: 1.25:  63%|██████▎   | 293/468 [00:17<00:10, 17.43it/s]
avg reward: 1.25:  63%|██████▎   | 296/468 [00:17<00:10, 16.02it/s]
avg reward: 1.25:  63%|██████▎   | 296/468 [00:17<00:10, 16.02it/s]
avg reward: 1.25:  63%|██████▎   | 296/468 [00:17<00:10, 16.02it/s]
avg reward: 1.25:  63%|██████▎   | 296/468 [00:17<00:10, 16.02it/s]
avg reward: 1.25:  63%|██████▎   | 296/468 [00:17<00:10, 16.02it/s]
avg reward: 1.25:  64%|██████▍   | 300/468 [00:17<00:08, 20.17it/s]
avg reward: 1.25:  64%|██████▍   | 300/468 [00:17<00:08, 20.17it/s]
avg reward: 1.25:  64%|██████▍   | 300/468 [00:17<00:08, 20.17it/s]
avg reward: 1.25:  64%|██████▍   | 300/468 [00:17<00:08, 20.17it/s]
avg reward: 1.25:  65%|██████▍   | 303/468 [00:17<00:09, 17.83it/s]
avg reward: 1.25:  65%|██████▍   | 303/468 [00:17<00:09, 17.83it/s]
avg reward: 1.25:  65%|██████▍   | 303/468 [00:17<00:09, 17.83it/s]
avg reward: 1.25:  65%|██████▍   | 303/468 [00:17<00:09, 17.83it/s]
avg reward: 1.25:  65%|██████▌   | 306/468 [00:17<00:09, 16.21it/s]
avg reward: 1.25:  65%|██████▌   | 306/468 [00:17<00:09, 16.21it/s]
avg reward: 1.25:  65%|██████▌   | 306/468 [00:17<00:09, 16.21it/s]
avg reward: 1.25:  65%|██████▌   | 306/468 [00:17<00:09, 16.21it/s]
avg reward: 1.25:  65%|██████▌   | 306/468 [00:17<00:09, 16.21it/s]
avg reward: 1.25:  66%|██████▌   | 310/468 [00:17<00:07, 20.35it/s]
avg reward: 1.26:  66%|██████▌   | 310/468 [00:17<00:07, 20.35it/s]
avg reward: 1.26:  66%|██████▌   | 310/468 [00:17<00:07, 20.35it/s]
avg reward: 1.26:  66%|██████▌   | 310/468 [00:17<00:07, 20.35it/s]
avg reward: 1.26:  67%|██████▋   | 313/468 [00:17<00:08, 17.28it/s]
avg reward: 1.26:  67%|██████▋   | 313/468 [00:17<00:08, 17.28it/s]
avg reward: 1.26:  67%|██████▋   | 313/468 [00:17<00:08, 17.28it/s]
avg reward: 1.26:  67%|██████▋   | 313/468 [00:18<00:08, 17.28it/s]
avg reward: 1.26:  68%|██████▊   | 316/468 [00:18<00:09, 16.01it/s]
avg reward: 1.26:  68%|██████▊   | 316/468 [00:18<00:09, 16.01it/s]
avg reward: 1.26:  68%|██████▊   | 316/468 [00:18<00:09, 16.01it/s]
avg reward: 1.26:  68%|██████▊   | 316/468 [00:18<00:09, 16.01it/s]
avg reward: 1.26:  68%|██████▊   | 316/468 [00:18<00:09, 16.01it/s]
avg reward: 1.26:  68%|██████▊   | 320/468 [00:18<00:07, 20.04it/s]
avg reward: 1.26:  68%|██████▊   | 320/468 [00:18<00:07, 20.04it/s]
avg reward: 1.26:  68%|██████▊   | 320/468 [00:18<00:07, 20.04it/s]
avg reward: 1.26:  68%|██████▊   | 320/468 [00:18<00:07, 20.04it/s]
avg reward: 1.26:  69%|██████▉   | 323/468 [00:18<00:08, 17.25it/s]
avg reward: 1.26:  69%|██████▉   | 323/468 [00:18<00:08, 17.25it/s]
avg reward: 1.26:  69%|██████▉   | 323/468 [00:18<00:08, 17.25it/s]
avg reward: 1.27:  69%|██████▉   | 323/468 [00:18<00:08, 17.25it/s]
avg reward: 1.27:  70%|██████▉   | 326/468 [00:18<00:08, 16.18it/s]
avg reward: 1.27:  70%|██████▉   | 326/468 [00:18<00:08, 16.18it/s]
avg reward: 1.27:  70%|██████▉   | 326/468 [00:18<00:08, 16.18it/s]
avg reward: 1.27:  70%|██████▉   | 326/468 [00:18<00:08, 16.18it/s]
avg reward: 1.27:  70%|██████▉   | 326/468 [00:18<00:08, 16.18it/s]
avg reward: 1.27:  70%|██████▉   | 326/468 [00:18<00:08, 16.18it/s]
avg reward: 1.27:  71%|███████   | 331/468 [00:18<00:07, 17.38it/s]
avg reward: 1.27:  71%|███████   | 331/468 [00:18<00:07, 17.38it/s]
avg reward: 1.27:  71%|███████   | 331/468 [00:19<00:07, 17.38it/s]
avg reward: 1.27:  71%|███████   | 331/468 [00:19<00:07, 17.38it/s]
avg reward: 1.27:  71%|███████   | 331/468 [00:19<00:07, 17.38it/s]
avg reward: 1.27:  72%|███████▏  | 335/468 [00:19<00:06, 21.01it/s]
avg reward: 1.27:  72%|███████▏  | 335/468 [00:19<00:06, 21.01it/s]
avg reward: 1.27:  72%|███████▏  | 335/468 [00:19<00:06, 21.01it/s]
avg reward: 1.27:  72%|███████▏  | 335/468 [00:19<00:06, 21.01it/s]
avg reward: 1.27:  72%|███████▏  | 338/468 [00:19<00:07, 17.80it/s]
avg reward: 1.27:  72%|███████▏  | 338/468 [00:19<00:07, 17.80it/s]
avg reward: 1.27:  72%|███████▏  | 338/468 [00:19<00:07, 17.80it/s]
avg reward: 1.27:  72%|███████▏  | 338/468 [00:19<00:07, 17.80it/s]
avg reward: 1.27:  73%|███████▎  | 341/468 [00:19<00:07, 16.49it/s]
avg reward: 1.27:  73%|███████▎  | 341/468 [00:19<00:07, 16.49it/s]
avg reward: 1.27:  73%|███████▎  | 341/468 [00:19<00:07, 16.49it/s]
avg reward: 1.27:  73%|███████▎  | 341/468 [00:19<00:07, 16.49it/s]
avg reward: 1.27:  73%|███████▎  | 341/468 [00:19<00:07, 16.49it/s]
avg reward: 1.27:  74%|███████▎  | 345/468 [00:19<00:05, 20.52it/s]
avg reward: 1.27:  74%|███████▎  | 345/468 [00:19<00:05, 20.52it/s]
avg reward: 1.27:  74%|███████▎  | 345/468 [00:19<00:05, 20.52it/s]
avg reward: 1.27:  74%|███████▎  | 345/468 [00:19<00:05, 20.52it/s]
avg reward: 1.27:  74%|███████▍  | 348/468 [00:19<00:06, 17.46it/s]
avg reward: 1.27:  74%|███████▍  | 348/468 [00:19<00:06, 17.46it/s]
avg reward: 1.27:  74%|███████▍  | 348/468 [00:19<00:06, 17.46it/s]
avg reward: 1.27:  74%|███████▍  | 348/468 [00:20<00:06, 17.46it/s]
avg reward: 1.27:  75%|███████▌  | 351/468 [00:20<00:07, 16.35it/s]
avg reward: 1.27:  75%|███████▌  | 351/468 [00:20<00:07, 16.35it/s]
avg reward: 1.27:  75%|███████▌  | 351/468 [00:20<00:07, 16.35it/s]
avg reward: 1.27:  75%|███████▌  | 351/468 [00:20<00:07, 16.35it/s]
avg reward: 1.27:  75%|███████▌  | 351/468 [00:20<00:07, 16.35it/s]
avg reward: 1.27:  76%|███████▌  | 355/468 [00:20<00:05, 20.49it/s]
avg reward: 1.27:  76%|███████▌  | 355/468 [00:20<00:05, 20.49it/s]
avg reward: 1.27:  76%|███████▌  | 355/468 [00:20<00:05, 20.49it/s]
avg reward: 1.27:  76%|███████▌  | 355/468 [00:20<00:05, 20.49it/s]
avg reward: 1.27:  76%|███████▋  | 358/468 [00:20<00:06, 17.07it/s]
avg reward: 1.27:  76%|███████▋  | 358/468 [00:20<00:06, 17.07it/s]
avg reward: 1.27:  76%|███████▋  | 358/468 [00:20<00:06, 17.07it/s]
avg reward: 1.27:  76%|███████▋  | 358/468 [00:20<00:06, 17.07it/s]
avg reward: 1.27:  77%|███████▋  | 361/468 [00:20<00:06, 15.76it/s]
avg reward: 1.27:  77%|███████▋  | 361/468 [00:20<00:06, 15.76it/s]
avg reward: 1.27:  77%|███████▋  | 361/468 [00:20<00:06, 15.76it/s]
avg reward: 1.27:  77%|███████▋  | 361/468 [00:20<00:06, 15.76it/s]
avg reward: 1.27:  77%|███████▋  | 361/468 [00:20<00:06, 15.76it/s]
avg reward: 1.27:  78%|███████▊  | 365/468 [00:20<00:05, 19.80it/s]
avg reward: 1.28:  78%|███████▊  | 365/468 [00:20<00:05, 19.80it/s]
avg reward: 1.28:  78%|███████▊  | 365/468 [00:20<00:05, 19.80it/s]
avg reward: 1.28:  78%|███████▊  | 365/468 [00:21<00:05, 19.80it/s]
avg reward: 1.28:  79%|███████▊  | 368/468 [00:21<00:05, 17.28it/s]
avg reward: 1.28:  79%|███████▊  | 368/468 [00:21<00:05, 17.28it/s]
avg reward: 1.28:  79%|███████▊  | 368/468 [00:21<00:05, 17.28it/s]
avg reward: 1.28:  79%|███████▊  | 368/468 [00:21<00:05, 17.28it/s]
avg reward: 1.28:  79%|███████▉  | 371/468 [00:21<00:06, 16.07it/s]
avg reward: 1.28:  79%|███████▉  | 371/468 [00:21<00:06, 16.07it/s]
avg reward: 1.28:  79%|███████▉  | 371/468 [00:21<00:06, 16.07it/s]
avg reward: 1.28:  79%|███████▉  | 371/468 [00:21<00:06, 16.07it/s]
avg reward: 1.28:  79%|███████▉  | 371/468 [00:21<00:06, 16.07it/s]
avg reward: 1.28:  79%|███████▉  | 371/468 [00:21<00:06, 16.07it/s]
avg reward: 1.28:  80%|████████  | 376/468 [00:21<00:05, 17.33it/s]
avg reward: 1.28:  80%|████████  | 376/468 [00:21<00:05, 17.33it/s]
avg reward: 1.28:  80%|████████  | 376/468 [00:21<00:05, 17.33it/s]
avg reward: 1.28:  80%|████████  | 376/468 [00:21<00:05, 17.33it/s]
avg reward: 1.28:  80%|████████  | 376/468 [00:21<00:05, 17.33it/s]
avg reward: 1.27:  80%|████████  | 376/468 [00:21<00:05, 17.33it/s]
avg reward: 1.27:  81%|████████▏ | 381/468 [00:21<00:04, 17.84it/s]
avg reward: 1.27:  81%|████████▏ | 381/468 [00:21<00:04, 17.84it/s]
avg reward: 1.27:  81%|████████▏ | 381/468 [00:21<00:04, 17.84it/s]
avg reward: 1.27:  81%|████████▏ | 381/468 [00:21<00:04, 17.84it/s]
avg reward: 1.27:  81%|████████▏ | 381/468 [00:21<00:04, 17.84it/s]
avg reward: 1.27:  81%|████████▏ | 381/468 [00:22<00:04, 17.84it/s]
avg reward: 1.27:  82%|████████▏ | 386/468 [00:22<00:04, 17.89it/s]
avg reward: 1.27:  82%|████████▏ | 386/468 [00:22<00:04, 17.89it/s]
avg reward: 1.27:  82%|████████▏ | 386/468 [00:22<00:04, 17.89it/s]
avg reward: 1.27:  82%|████████▏ | 386/468 [00:22<00:04, 17.89it/s]
avg reward: 1.27:  82%|████████▏ | 386/468 [00:22<00:04, 17.89it/s]
avg reward: 1.27:  83%|████████▎ | 390/468 [00:22<00:03, 21.18it/s]
avg reward: 1.27:  83%|████████▎ | 390/468 [00:22<00:03, 21.18it/s]
avg reward: 1.27:  83%|████████▎ | 390/468 [00:22<00:03, 21.18it/s]
avg reward: 1.28: 100%|██████████| 468/468 [00:26<00:00, 17.53it/s]

Val Acc: 0.9815: 100%|██████████| 79/79 [00:01<00:00, 64.32it/s]
epoch 0, val_acc: 0.98, avg reward: 1.28: 100%|██████████| 1/1 [00:27<00:00, 27.97s/it]

The resulting architecture is:

mynet.graph
../../_images/output_enas_proxylessnas_ad55b9_25_0.svg

Store the trained model as a static network

The trained ENAS network can be exported to disk for later inference.

mynet.export('enas')
/var/lib/jenkins/workspace/workspace/autogluon-tutorial-nas-v3/extra/src/autogluon/extra/contrib/enas/enas.py:337: UserWarning: "ENAS_Sequential._modules" is an unregistered container with Blocks. Note that Blocks inside the list, tuple or dict will not be registered automatically. Make sure to register them using register_child() or switching to nn.Sequential/nn.HybridSequential instead.
  for name, param in self.collect_params().items():

Load it back with MXNet as a SymbolBlock and pass a dummy input through it to check the output shape:

mynet_static = mx.gluon.nn.SymbolBlock.imports("enas-symbol.json", ['data'], "enas.params")
y = mynet_static(mx.nd.zeros((1, 1, 28, 28)))
print(y.shape)
/var/lib/jenkins/workspace/workspace/autogluon-tutorial-nas-v3/venv/lib/python3.7/site-packages/mxnet/gluon/block.py:1512: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
    data: None
  input_sym_arg_type = in_param.infer_type()[0]
(1, 10)
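
The loading step above fails with a fairly opaque error if either exported file is missing. Below is a minimal sketch of a guarded loader. It assumes the export call writes <prefix>-symbol.json and <prefix>.params, as the file names used in this tutorial suggest. The helper names exported_paths and load_static are hypothetical and not part of AutoGluon or MXNet.

```python
import os


def exported_paths(prefix):
    """Return the (symbol, params) file names assumed for an export prefix.

    Assumption: mirrors the names used in this tutorial
    ('enas' -> 'enas-symbol.json', 'enas.params').
    """
    return f"{prefix}-symbol.json", f"{prefix}.params"


def load_static(prefix, input_names=("data",)):
    """Load an exported network, failing early if a file is missing."""
    symbol_file, params_file = exported_paths(prefix)
    missing = [p for p in (symbol_file, params_file) if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(f"missing exported file(s): {missing}")
    # Imported lazily so the path helper can be used without MXNet installed.
    import mxnet as mx
    return mx.gluon.nn.SymbolBlock.imports(
        symbol_file, list(input_names), params_file
    )
```

With the files from this tutorial in the working directory, load_static('enas') should return the same kind of block as the direct SymbolBlock.imports call above.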

References

[1] Efficient Neural Architecture Search via Parameter Sharing. H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, J. Dean. International Conference on Machine Learning (ICML), 2018.

[3] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. H. Cai, L. Zhu, S. Han. International Conference on Learning Representations (ICLR), 2019.