Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: self-supervised learning, contrastive learning, learning dynamics
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This paper shows that feature normalization makes the eigenmode dynamics of non-contrastive learning sixth-order and prevents them from collapsing.
Abstract: Contrastive learning is a self-supervised representation learning framework in which two positive views generated through data augmentation are pulled together by an attractive force in the representation space, while a repulsive force pushes them away from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, dispenses with negative examples and improves computational efficiency. Although it may seem at first sight that learned representations would collapse into a single point without the repulsive force, \cite{Tian2021ICML} revealed, through a study of the learning dynamics, that non-collapsed solutions are possible if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account the commonly used \emph{feature normalization}, and hence strong regularization may collapse the dynamics. We extend their analysis by considering the cosine loss, which involves feature normalization, instead of the L2 loss used in \cite{Tian2021ICML}. We show that the cosine loss induces sixth-order dynamics (whereas the L2 loss induces third-order dynamics), in which a stable equilibrium dynamically emerges even if only a collapsed solution exists for the given initial parameters. Thus, feature normalization plays an important role in preventing the dynamics from collapsing.
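To make the distinction between the two losses concrete, the following is a minimal NumPy sketch, not the paper's model or analysis. It only illustrates one property: shrinking both embeddings toward the origin reduces the L2 loss, whereas the cosine loss is invariant to feature scale because it normalizes the features first. The function names `l2_loss` and `cosine_loss`, the `eps` stabilizer, and the toy data are assumptions made for illustration.

```python
import numpy as np

def l2_loss(z1, z2):
    """Half squared Euclidean distance between two view embeddings (per sample)."""
    return 0.5 * np.sum((z1 - z2) ** 2, axis=-1)

def cosine_loss(z1, z2, eps=1e-12):
    """Negative cosine similarity; both embeddings are L2-normalized first.

    eps is a hypothetical stabilizer to avoid division by zero.
    """
    z1n = z1 / (np.linalg.norm(z1, axis=-1, keepdims=True) + eps)
    z2n = z2 / (np.linalg.norm(z2, axis=-1, keepdims=True) + eps)
    return -np.sum(z1n * z2n, axis=-1)

# Toy embeddings of two "views": the second is a small perturbation of the first.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))
z2 = z1 + 0.1 * rng.normal(size=(4, 8))

# Shrink both embeddings toward the origin, i.e. toward a collapsed representation.
scale = 1e-3
l2_full, l2_shrunk = l2_loss(z1, z2), l2_loss(scale * z1, scale * z2)
cos_full, cos_shrunk = cosine_loss(z1, z2), cosine_loss(scale * z1, scale * z2)

print("L2 loss ratio (shrunk / full):", l2_shrunk / l2_full)
print("cosine loss change:", np.abs(cos_shrunk - cos_full).max())
```

In this sketch the L2 loss scales with the square of the shrink factor, so it can be lowered simply by shrinking the features, while the cosine loss is unchanged up to floating-point error.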
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3290