Why do Features of Multi-Layer Perceptrons Condense in Training?

17 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Neural Networks, Deep Learning Theory, Multi-Layer Perceptrons
TL;DR: In this paper, we explain the essential mechanism behind feature condensation in the early training of multi-layer perceptrons.
Abstract: This paper focuses on the problem of feature condensation in the early epochs of training multi-layer perceptrons (MLPs). Feature condensation is related to many other phenomena in deep learning, and practitioners have developed empirical tricks to avoid the problems it causes. However, current studies do not fully explain the essential mechanisms behind feature condensation, i.e., which factors determine (or alleviate) it. Explaining these determinants is crucial for both theoreticians and practitioners. To this end, we theoretically analyze the learning dynamics of MLPs, clarifying how four typical operations (batch normalization, momentum, weight initialization, and $L_2$ regularization) affect feature condensation.
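The following is a minimal illustrative sketch, not the paper's method: it trains a small MLP on synthetic data and tracks the average pairwise cosine similarity of first-layer weight vectors, a common proxy for feature condensation in early training. The toggles (`use_bn`, `momentum`, `init_scale`, `weight_decay`) loosely mirror the four operations mentioned in the abstract; all hyperparameter values and the synthetic task are assumptions for illustration only.

```python
# Illustrative sketch of measuring feature condensation in a small MLP (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def make_mlp(d_in=10, width=64, use_bn=False, init_scale=0.1):
    layers = [nn.Linear(d_in, width)]
    if use_bn:
        layers.append(nn.BatchNorm1d(width))  # optional batch normalization
    layers += [nn.Tanh(), nn.Linear(width, 1)]
    mlp = nn.Sequential(*layers)
    with torch.no_grad():
        mlp[0].weight.mul_(init_scale)  # rescale the initialization of the first layer
    return mlp

def condensation_score(weight):
    # Mean absolute pairwise cosine similarity of hidden-neuron weight vectors (rows).
    w = F.normalize(weight, dim=1)
    sim = w @ w.t()
    n = sim.shape[0]
    off_diag = sim[~torch.eye(n, dtype=torch.bool)]
    return off_diag.abs().mean().item()

# Synthetic regression data (any smooth target works for this illustration).
x = torch.randn(512, 10)
y = torch.sin(x.sum(dim=1, keepdim=True))

model = make_mlp(use_bn=False, init_scale=0.1)
opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=1e-4)

for step in range(201):
    opt.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    if step % 50 == 0:
        score = condensation_score(model[0].weight.detach())
        print(f"step {step:4d}  loss {loss.item():.4f}  condensation {score:.3f}")
```

A score near 1 means the hidden neurons' input weights point in nearly identical (or opposite) directions, i.e., condensed features; varying the toggles above lets one observe how each operation pushes this score up or down in early training.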
Supplementary Material: zip
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 784