Abstract: Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies. However, they are computationally expensive to train and difficult to parallelize. Recent work has shown that normalizing intermediate representations of neural networks can significantly improve convergence rates in feed-forward neural networks [1]. In particular, batch normalization, which uses mini-batch statistics to standardize features, was shown to significantly reduce training time. In this paper, we investigate how batch normalization can be applied to RNNs. We show, on both a speech recognition task and a language modeling task, that our way of applying batch normalization leads to faster convergence of the training criterion but does not appear to improve generalization performance.
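To make the idea concrete, here is a minimal sketch of one way batch normalization can be inserted into a recurrent cell: mini-batch statistics standardize the input-to-hidden pre-activation before the recurrent term and nonlinearity are applied. The class name `BNRNNCell`, the use of PyTorch, and the choice to normalize only the input-to-hidden projection are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class BNRNNCell(nn.Module):
    """Illustrative vanilla RNN cell with batch normalization on the
    input-to-hidden projection (a sketch, not the authors' code)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.ih = nn.Linear(input_size, hidden_size, bias=False)  # input-to-hidden
        self.hh = nn.Linear(hidden_size, hidden_size)             # hidden-to-hidden
        self.bn = nn.BatchNorm1d(hidden_size)                     # standardizes features using mini-batch statistics

    def forward(self, x_t, h_prev):
        # Normalize the input-to-hidden pre-activation over the mini-batch,
        # add the recurrent contribution, then apply the nonlinearity.
        return torch.tanh(self.bn(self.ih(x_t)) + self.hh(h_prev))

# Example usage: a mini-batch of 32 frames with 40 input features, hidden size 128.
cell = BNRNNCell(input_size=40, hidden_size=128)
h = torch.zeros(32, 128)
x_t = torch.randn(32, 40)
h = cell(x_t, h)
```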