Keywords: deep RL, deep learning, normalization, gradient interference, reinforcement learning, RL, off-policy RL, offline RL
Abstract: Layer normalization (LN) is among the most effective normalization schemes for deep $Q$-learning. However, its benefits are not yet fully understood. We study LN through the lens of _gradient interference_. A gradient interference metric used in prior work is the inner product between semi-gradients of the temporal difference error on two random samples. We argue that, from the perspective of minimizing the loss, a more principled metric is the inner product between a semi-gradient and the full gradient. We test this argument with offline deep $Q$-learning, without a target network, on four classic control tasks. Counterintuitively, however, we find empirically that first-order gradient interference metrics correlate _positively_ with the training loss. We show empirically that adding a second-order gradient interference term yields more intuitive results, and we provide supporting theoretical arguments in the linear regression setting.
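As a concrete sketch of the two metrics above (the notation here is illustrative; the paper's exact definitions may differ), let $g_i$ denote the semi-gradient of the squared TD error on sample $i$, with the bootstrap target treated as a constant:
$$ g_i = \nabla_\theta\, \tfrac{1}{2}\big(Q_\theta(s_i,a_i) - y_i\big)^2, \qquad y_i = r_i + \gamma \max_{a'} \operatorname{sg}\!\big[Q_\theta(s'_i,a')\big], $$
where $\operatorname{sg}[\cdot]$ stops the gradient (no target network). The interference metric from prior work is the pairwise inner product $\langle g_i, g_j\rangle$ for two random samples $i \neq j$; the metric argued for here replaces one factor with the full (batch) gradient $\bar g = \tfrac{1}{N}\sum_{j=1}^{N} g_j$, giving $\langle g_i, \bar g\rangle$.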
Primary Area: reinforcement learning
Submission Number: 21858