- Abstract: Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence. However, the calculation of the mean and variance statistics used for normalization is sequential and time-consuming, which significantly slows the underlying network, recurrent neural networks in particular. In this paper, we propose root mean square layer normalization, or RMSNorm, which reduces the normalization computation while retaining the stabilization capability. Unlike LayerNorm, RMSNorm regularizes the summed inputs to a neuron within one layer according to the root mean square, giving the model invariance properties while avoiding the sequential calculation of mean and variance statistics, and is therefore faster. We conduct extensive experiments on both language-based and image-based tasks with diverse network architectures. The results show that RMSNorm achieves comparable performance but reduces the running time by 2.5%∼15% against LayerNorm across different models.
- Keywords: layer normalization, root mean square
- TL;DR: RMSNorm
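The normalization described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's reference implementation; the learnable gain `g` and the small `eps` added for numerical stability are assumptions matching common practice. Note that, unlike LayerNorm, no mean is subtracted: the inputs are divided only by their root mean square.

```python
import numpy as np

def rms_norm(x, g, eps=1e-8):
    # Root mean square over the last (feature) dimension;
    # eps guards against division by zero.
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    # Scale inputs to unit RMS, then apply the learnable gain g.
    return x / rms * g

# Example: after normalization, the mean of the squared outputs is ~1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = rms_norm(x, g=np.ones(4))
```

Because only a single sum of squares is needed per layer, this avoids the two-pass (or coupled) mean-and-variance computation of LayerNorm, which is the source of the reported speedup.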