Keywords: Self-Supervised Learning, Multi-View Self-Supervised Learning, Joint-Embedding Self-Supervised Learning, High Dimensional Probability, Information theory, Double Descent, Scaling Laws, Machine Learning
TL;DR: Understanding and using a new multiview SSL method called Maximum Manifold Capacity Representations
Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is interesting because it does not fit neatly into any of the commonplace MVSSL families, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. We seek to better understand and then better utilize MMCR. To better understand MMCR, we leverage tools from high dimensional probability to demonstrate that MMCR incentivizes alignment and uniformity of learned embeddings. We then leverage tools from information theory to show that such embeddings maximize a well-known lower bound on mutual information between views, thereby connecting the geometric perspective of MMCR to the information-theoretic perspective often discussed in MVSSL. To better utilize MMCR, we mathematically predict and experimentally confirm non-monotonic changes in the pretraining loss akin to double descent but with respect to atypical hyperparameters. We also discover compute scaling laws that enable predicting the pretraining loss as a function of gradients steps, batch size, embedding dimension and number of views. We then show that MMCR, originally applied to image data, is performant on multimodal image-text data. Broadly, by more deeply understanding the theoretical and empirical behavior of MMCR, our work reveals powerful insights on improving MVSSL methods.
Submission Number: 57
Loading