Keywords: deep RL, deep learning, normalization, gradient interference, reinforcement learning, RL, off-policy RL, offline RL
Abstract: Layer normalization (LN) is among the most effective normalization schemes for deep $Q$-learning. However, its benefits are not yet fully understood. We study LN through the lens of _gradient interference_. A gradient interference metric used in prior work is the inner product between semi-gradients of the temporal difference error on two random samples. We argue that, from the perspective of minimizing the loss, a more principled metric is the inner product between a semi-gradient and the full gradient. We test this argument with offline deep $Q$-learning, without a target network, on four classic control tasks. Counterintuitively, however, we find empirically that first-order gradient interference metrics correlate _positively_ with the training loss. We show empirically that adding a second-order gradient interference term yields more intuitive results, and we provide supporting theoretical arguments in the linear regression setting.
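As a concrete sketch of the two metrics above (the notation here is illustrative; the paper's exact definitions may differ), let $g_i$ denote the semi-gradient of the squared TD error on sample $i$, with the bootstrap target treated as a constant:
$$ g_i = \nabla_\theta\, \tfrac{1}{2}\big(Q_\theta(s_i,a_i) - y_i\big)^2, \qquad y_i = r_i + \gamma \max_{a'} \operatorname{sg}\!\big[Q_\theta(s'_i,a')\big], $$
where $\operatorname{sg}[\cdot]$ stops the gradient (no target network). The interference metric from prior work is the pairwise inner product $\langle g_i, g_j\rangle$ for two random samples $i \neq j$; the metric argued for here replaces one factor with the full (batch) gradient $\bar g = \tfrac{1}{N}\sum_{j=1}^{N} g_j$, giving $\langle g_i, \bar g\rangle$.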
Primary Area: reinforcement learning
Submission Number: 21858