Recurrent Batch Normalization

Tim Cooijmans; Nicolas Ballas; César Laurent; Çağlar Gülçehre; Aaron Courville

Recurrent Batch Normalization

Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville

Published: 06 Feb 2017, Last Modified: 22 Jun 2025ICLR 2017 PosterReaders: Everyone

Abstract: We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.

TL;DR: Make batch normalization work in recurrent neural networks

Conflicts: umontreal.ca, google.com

Keywords: Deep learning, Optimization

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/recurrent-batch-normalization/code)

16 Replies

Loading