Variance-Covariance Regularization Improves Representation Learning

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Representation Learning, Transfer Learning, Regularization
TL;DR: By regularizing the variance and covariance of the hidden representation, the network can learn a robust representation with improved transfer learning performance.
Abstract: Transfer learning plays a key role in advancing machine learning models, yet conventional supervised pretraining often undermines feature transferability by prioritizing features that minimize the pretraining loss. Recent progress in self-supervised learning (SSL) has introduced regularization techniques that bolster feature transferability. In this work, we adapt an SSL regularization technique from the VICReg method to supervised learning contexts, introducing Variance-Covariance Regularization (VCReg). This adaptation encourages the network to learn a high-variance, low-covariance representation, promoting the learning of more diverse features. We outline best practices for integrating this regularization framework into various neural network architectures and present an optimized strategy for regularizing intermediate representations. Through extensive empirical evaluation, we demonstrate that our method significantly enhances transfer learning, achieving excellent performance across numerous tasks and datasets. VCReg also improves performance in scenarios such as long-tail learning and hierarchical classification. Additionally, we conduct analyses suggesting that its effectiveness may stem from its success in addressing challenges like gradient starvation and neural collapse. In summary, VCReg offers a universally applicable regularization framework that significantly advances the state of transfer learning, highlights the connection between gradient starvation, neural collapse, and feature transferability, and potentially opens a new avenue for regularization in this domain.
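To make the regularizer described above concrete, the following is a minimal sketch of a VICReg-style variance-covariance penalty applied to a batch of hidden representations. The function name `vcreg_penalty` and the hyperparameters `gamma`, `eps`, and the weighting `alpha` in the usage note are illustrative assumptions, not the authors' exact implementation; the paper itself describes additional details such as where and how intermediate representations are regularized.

```python
import torch


def vcreg_penalty(z: torch.Tensor, gamma: float = 1.0, eps: float = 1e-4) -> torch.Tensor:
    """Variance-covariance penalty on representations z of shape (N, D).

    The variance term pushes each feature dimension's standard deviation
    above `gamma`; the covariance term pushes off-diagonal covariance
    entries toward zero, encouraging high-variance, decorrelated features.
    """
    n, d = z.shape
    z = z - z.mean(dim=0)  # center each feature dimension

    # Variance term: hinge loss on the per-dimension standard deviation.
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(gamma - std).mean()

    # Covariance term: squared off-diagonal entries of the covariance matrix.
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = off_diag.pow(2).sum() / d

    return var_loss + cov_loss
```

In a supervised setting, such a penalty would simply be added to the task loss, e.g. `loss = task_loss + alpha * vcreg_penalty(hidden_features)`, where `alpha` trades off the regularization strength against the supervised objective.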
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6297