Recurrent Normalization Propagation

ICLR 2017 Invite to Workshop
TL;DR: Extension of Normalization Propagation to the LSTM.
Abstract: We propose an LSTM parametrization that preserves the means and variances of the hidden states and memory cells across time. While offering training benefits similar to Recurrent Batch Normalization and Layer Normalization, it does not need to estimate statistics at each time step and therefore requires fewer computations overall. We also investigate the impact of the parametrization on gradient flow and present a way of initializing the weights accordingly. We evaluate our proposal on language modelling and image generative modelling tasks. We empirically show that it performs similarly to or better than other recurrent normalization approaches, while being faster to execute.
Keywords: Deep learning, Optimization
Conflicts: umontreal.ca
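
As a rough illustration of the idea described in the abstract, the sketch below implements one step of an LSTM whose input and recurrent projections are reparametrized by dividing each weight row by its L2 norm and rescaling with a learned gain, so that no per-time-step statistics need to be estimated. This is a minimal NumPy sketch of the general normalization-propagation idea only: the paper's exact gate-wise scaling constants, the corrections that keep the variances of the hidden state and memory cell fixed, and the proposed weight initialization are not reproduced here, and all names (normalized_linear, normprop_lstm_step, the parameter dictionary layout) are illustrative, not the paper's notation.

```python
import numpy as np

def normalized_linear(x, W, gamma):
    # Divide each output row of W by its L2 norm so that, for inputs with
    # roughly unit variance, the pre-activation variance does not depend on
    # the weight scale; gamma is a learned per-unit gain (illustrative).
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)  # shape (n_out, 1)
    return (gamma[:, None] * W / row_norms) @ x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def normprop_lstm_step(x, h_prev, c_prev, p):
    # One LSTM step whose input and recurrent projections use the normalized
    # reparametrization above instead of per-time-step batch statistics.
    # Gate names follow the usual i/f/o/g convention; the dictionary layout
    # of p (Wx, Wh, gx, gh, b per gate) is an assumption for this sketch.
    gates = {}
    for g in ("i", "f", "o", "g"):
        pre = (normalized_linear(x, p["Wx"][g], p["gx"][g])
               + normalized_linear(h_prev, p["Wh"][g], p["gh"][g])
               + p["b"][g])
        gates[g] = np.tanh(pre) if g == "g" else sigmoid(pre)
    c = gates["f"] * c_prev + gates["i"] * gates["g"]
    h = gates["o"] * np.tanh(c)
    return h, c
```

Unlike Recurrent Batch Normalization, nothing in this sketch tracks running statistics across time steps, which is where the computational saving mentioned in the abstract comes from.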