Abstract: Compared to supervised learning, self-supervised learning has progressed more empirically than theoretically. Many successful algorithms combine multiple techniques whose support is primarily experimental. Although some theoretical works exist, few explicitly formulate the connection between self-supervised and supervised learning. To address this gap, we take a principled approach: we theoretically formulate a self-supervised learning problem as an approximation of a supervised learning problem in the context of contrastive learning. From the formulated problem, we derive a loss that is closely related to existing contrastive losses, thereby providing a foundation for them. The concepts of prototype representation bias and balanced contrastive loss arise naturally in the derivation and provide insight into self-supervised learning. We discuss how the components of our framework align with the practices of self-supervised learning algorithms, focusing on SimCLR. We also investigate the impact of balancing the attracting force between positive pairs against the repelling force between negative pairs. Proofs of our theorems are provided in the appendix, and code to reproduce the experimental results is provided in the supplementary material.
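For readers unfamiliar with the setup the abstract refers to, the following is a minimal sketch of a SimCLR-style (NT-Xent) contrastive loss augmented with a hypothetical weight `beta` that rescales the repelling (negative-pair) term relative to the attracting (positive-pair) term. The decomposition into the two terms and the name `beta` are illustrative assumptions, not the balanced contrastive loss derived in the paper.

```python
# Sketch of an NT-Xent contrastive loss with a hypothetical balance weight.
# With beta = 1.0 this reduces to the standard SimCLR (NT-Xent) objective.
import torch
import torch.nn.functional as F

def balanced_nt_xent(z1, z2, temperature=0.5, beta=1.0):
    """z1, z2: (N, d) embeddings of two augmented views of the same N images."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                  # (2N, d) stacked views
    sim = z @ z.t() / temperature                   # scaled cosine similarities
    n = z1.shape[0]
    # Index of each sample's positive partner (the other view of the same image).
    pos_idx = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    pos = sim[torch.arange(2 * n), pos_idx]         # attracting (positive-pair) term
    # Exclude self-similarity from the denominator.
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    others = sim.masked_fill(self_mask, float('-inf'))
    # Attract positives, repel everything else; beta rescales the repelling term.
    loss = -pos + beta * torch.logsumexp(others, dim=1)
    return loss.mean()
```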
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Han_Bao2
Submission Number: 4313