Keywords: Reinforcement Learning, Deep Reinforcement Learning, Value-based, Batch Size, Multi-step Learning
TL;DR: We perform an exhaustive investigation into the interplay between batch size and update horizon and uncover a surprising phenomenon: when increasing the update horizon, it is more beneficial to decrease the batch size.
Abstract: We present a surprising discovery: in deep reinforcement learning, decreasing the batch size during training can dramatically improve the agent's performance when combined with multi-step learning. Both reducing batch sizes and increasing the update horizon increase the variance of the gradients, so it is quite surprising that increased variance on two fronts yields improved performance. We perform a wide range of experiments to gain a better understanding of this phenomenon, which we denote variance double-down.
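For context, the update horizon refers to the n in the standard n-step return used by multi-step learning: the target sums n sampled rewards before bootstrapping, so larger n injects more sampling variance into each target. Below is a minimal sketch of that target computation; the function name and NumPy implementation are illustrative, not taken from the paper.

```python
import numpy as np

def n_step_target(rewards, bootstrap_q, gamma=0.99):
    """Illustrative n-step TD target: sum of n discounted rewards
    plus a bootstrapped value estimate at the horizon.

    rewards: the n rewards r_t, ..., r_{t+n-1} along one trajectory.
    bootstrap_q: max_a Q(s_{t+n}, a), the value used for bootstrapping.
    """
    n = len(rewards)
    discounts = gamma ** np.arange(n)          # 1, gamma, ..., gamma^{n-1}
    return np.sum(discounts * rewards) + gamma ** n * bootstrap_q
```

A larger n means more sampled rewards enter each target (higher variance, lower bias), while a smaller batch size means fewer transitions per gradient estimate; the abstract's point is that combining these two variance-increasing choices nevertheless improves performance.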