Emergence of Alignment and Local Elasticity in Two-Layer Neural Networks

HEE BIN YOO; Sungyoon Lee; Cheongjae Jang; Dong-Sig Han; Jaein Kim; Seunghyeon Lim; Byoung-Tak Zhang

25 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, License: CC BY 4.0

Keywords: two-layer neural network, feature learning, metric learning, local elasticity, retrieval, random matrix theory

TL;DR: We extend the Conjugate Kernel framework to identify conditions on unseen data distributions that induce local elasticity and clustering, providing a unified theory for feature learning and metric learning in the proportional regime.

Abstract: Investigating phenomena such as Alignment and Local Elasticity is essential for understanding the feature space of neural networks and for improving performance across a wide range of tasks. We study the emergence of these phenomena in two-layer neural networks performing a classification task, and show that the conditions under which Alignment and Local Elasticity emerge after one step of training are identical. In particular, we demonstrate that intra-class features are more aligned when the inner product between their mean and the covariance of the training data-label, i.e., the train-unseen similarity, is large, and that stronger Local Elasticity occurs under the same condition. We validate our theory through experiments with a two-layer network, showing that both Alignment and Local Elasticity improve as the train-unseen similarity increases. Furthermore, we claim that our analysis provides both theoretical and practical insights into the relationship between train-unseen similarity, Alignment, and the improvement of clustering performance on unseen data for neural networks trained on data from a similar domain. This claim is supported by experiments, including a multi-layer CNN setup, and by detailed discussion. Specifically, we show that higher train-unseen similarity improves Recall@1 in two-layer networks and that Alignment and Recall@1 are positively correlated in metric learning. We also present novel techniques for deriving operator norm bounds for non-centered sub-Gaussian matrices, extending conventional regression analysis under standard Gaussian assumptions to the binary classification setting.

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 4780
