Keywords: Reinforcement Learning, Deep Reinforcement Learning, Value-based, Batch Size, Multi-step Learning
TL;DR: We perform an exhaustive investigation into the interplay between batch size and update horizon and uncover a surprising phenomenon: when increasing the update horizon, it is more beneficial to decrease the batch size.
Abstract: We present a surprising discovery: in deep reinforcement learning, decreasing the batch size during training can dramatically improve the agent's performance when combined with multi-step learning. Both reducing batch sizes and increasing the update horizon increase the variance of the gradients, so it is quite surprising that increased variance on two fronts yields improved performance. We perform a wide range of experiments to gain a better understanding of this phenomenon, which we denote variance double-down.
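For context, the update horizon refers to the n in the standard n-step return used by multi-step learning: the target sums n sampled rewards before bootstrapping, so larger n injects more sampling variance into each target. Below is a minimal sketch of that target computation; the function name and NumPy implementation are illustrative, not taken from the paper.

```python
import numpy as np

def n_step_target(rewards, bootstrap_q, gamma=0.99):
    """Illustrative n-step TD target: sum of n discounted rewards
    plus a bootstrapped value estimate at the horizon.

    rewards: the n rewards r_t, ..., r_{t+n-1} along one trajectory.
    bootstrap_q: max_a Q(s_{t+n}, a), the value used for bootstrapping.
    """
    n = len(rewards)
    discounts = gamma ** np.arange(n)          # 1, gamma, ..., gamma^{n-1}
    return np.sum(discounts * rewards) + gamma ** n * bootstrap_q
```

A larger n means more sampled rewards enter each target (higher variance, lower bias), while a smaller batch size means fewer transitions per gradient estimate; the abstract's point is that combining these two variance-increasing choices nevertheless improves performance.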