Keywords: Layer normalization, RMSNorm, Deep Learning
Abstract: Layer normalization (LN) is a milestone technique in deep learning and has been widely used in various network architectures. It performs centering and scaling over the layer activations of a neural network for each example, stabilizing and accelerating the training of neural networks. However, it introduces extra computation cost during inference; this cost has recently been addressed by its counterpart RMSNorm, which performs scaling only. This paper investigates how to retain the theoretical advantages of LN at the computational cost of RMSNorm. We formally define the condition under which the centering operation of LN can be removed, and show that this condition can be satisfied by imposing a column-centering constraint on the linear module adjacent to (i.e., preceding) the LN. We propose column centered weight transformation (CCWT) to ensure that an LN without the centering operation (i.e., RMSNorm) has the same output as the original LN in a pre-trained model.
Our method can be directly applied to various pre-trained large language models (LLMs) and large vision language models (VLMs) with LN, enabling an immediate reduction in computation cost while maintaining equivalent predictions during inference.
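The following is a minimal sketch (not the authors' released code) of the CCWT idea described above, under the assumption that CCWT subtracts each column's mean (taken over the output dimension) from the weight of the linear module preceding the LN, and the mean from its bias; the pre-activation then has zero mean, so scaling-only normalization reproduces the LN output exactly. All module and variable names here are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_out, eps = 64, 128, 1e-6

linear = nn.Linear(d_in, d_out)
ln = nn.LayerNorm(d_out, eps=eps)
x = torch.randn(8, d_in)

# Original inference path: Linear -> LayerNorm (centering + scaling).
ref = ln(linear(x))

# CCWT (assumed form): center the columns of W over the output dimension,
# and center the bias, so the pre-activation is zero-mean by construction.
W, b = linear.weight.data, linear.bias.data        # W: (d_out, d_in)
W_c = W - W.mean(dim=0, keepdim=True)              # column-centered weight
b_c = b - b.mean()                                  # centered bias

def rms_scale(h, gamma, beta, eps):
    # RMSNorm-style normalization: scaling only, no centering.
    return h / torch.sqrt(h.pow(2).mean(-1, keepdim=True) + eps) * gamma + beta

h = x @ W_c.t() + b_c                               # zero-mean pre-activation
out = rms_scale(h, ln.weight, ln.bias, eps)

print(torch.allclose(ref, out, atol=1e-5))          # True (up to float error)
```

With zero-mean pre-activations, the variance used by LN equals the mean square used by RMSNorm, which is why the two paths coincide numerically.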
We further propose a reparameterization method, called column based weight centering (CBWC), to keep the linear module column centered during training. We show that RMSNorm combined with CBWC achieves an effect equivalent to its LN counterpart during training, but with more efficient computation.
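A minimal sketch of the CBWC idea, assuming the reparameterization amounts to centering the effective weight (and bias) inside the forward pass so that the constraint is maintained throughout training and gradients flow through the centering; the class and its details are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CBWCLinear(nn.Module):
    """Linear layer whose effective weight is column centered (zero mean over
    the output dimension), so the pre-activation is centered by construction."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) / d_in ** 0.5)
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        # Centering is part of the forward pass, so the constraint holds at
        # every training step and gradients account for it.
        w = self.weight - self.weight.mean(dim=0, keepdim=True)
        b = self.bias - self.bias.mean()
        return x @ w.t() + b

# Usage: follow with scaling-only normalization (RMSNorm) in place of LN.
layer = CBWCLinear(64, 128)
h = layer(torch.randn(8, 64))
print(h.mean(dim=-1).abs().max())   # ~0: pre-activation is already centered
```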
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10445