Deep Neural Networks without Normalization

13 Sept 2024 (modified: 14 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Normalization, Deep Neural Networks
TL;DR: We introduce Dynamic Tanh (DyT), an element-wise operation that replaces normalization layers in deep neural networks without compromising performance or training stability.
Abstract: Normalization layers are ubiquitous in modern neural networks and have long been considered essential. In this work, we demonstrate that we can achieve strong performance without them, using a remarkably simple technique. We introduce Dynamic Tanh (DyT), an element-wise operation, $\mathrm{DyT}(\mathbf{x}) = \tanh(\alpha \mathbf{x})$, as a drop-in replacement for normalization layers (e.g., layer normalization). DyT is directly inspired by the simple observation that normalization layers produce tanh-like, S-shaped input-output mappings. With DyT, networks without normalization layers can match or exceed the performance of their normalized counterparts, while keeping all other training hyperparameters intact. Experiments across diverse settings validate this, ranging from recognition to generation, ConvNets to LLMs, and supervised to self-supervised learning. Our findings challenge the conventional understanding that normalization layers are indispensable, and provide new insights into their workings.
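For illustration, here is a minimal PyTorch sketch of the operation described above. Only the formula $\tanh(\alpha \mathbf{x})$, the learnable $\alpha$, and its role as a drop-in replacement for layer normalization come from the abstract; the initialization value and the per-channel affine parameters are assumptions for this sketch, mirroring the affine transform that normalization layers typically carry.

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: element-wise tanh(alpha * x) with a learnable scalar alpha.

    Sketch only: the init value alpha0 and the affine parameters below are
    assumptions, not details taken from the abstract.
    """
    def __init__(self, dim: int, alpha0: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((), alpha0))  # learnable scalar
        self.weight = nn.Parameter(torch.ones(dim))        # per-channel scale (assumed)
        self.bias = nn.Parameter(torch.zeros(dim))         # per-channel shift (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Purely element-wise squashing; no batch or token statistics are computed.
        return torch.tanh(self.alpha * x) * self.weight + self.bias

# Hypothetical usage: swap a LayerNorm for DyT inside a Transformer block.
norm = DyT(dim=768)  # in place of nn.LayerNorm(768)
y = norm(torch.randn(2, 16, 768))
```

Unlike layer normalization, this computes no per-example statistics: the learnable $\alpha$ sets where the tanh saturates, reproducing the S-shaped input-output mappings the abstract attributes to normalization layers.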
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 406