Mini-Batch Optimization of Contrastive Loss

TMLR Paper2201 Authors

15 Feb 2024 (modified: 29 Feb 2024) · Under review for TMLR
Abstract: Contrastive learning has gained significant attention as a pre-training method for self-supervised learning due to its ability to leverage large amounts of unlabeled data. A contrastive loss function ensures that embeddings of positive sample pairs (e.g., from the same class or different views of the same data) are similar, while embeddings of negative pairs are dissimilar. However, practical constraints such as large memory requirements make it infeasible to consider all possible positive and negative pairs, leading to the use of mini-batches. In this paper, we investigate the theoretical aspects of mini-batch optimization in contrastive learning with the InfoNCE loss. We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches of size $B$ drawn from the $N$ samples are considered, while sub-optimality may arise when examining only a subset. We then demonstrate that using high-loss mini-batches can speed up SGD convergence and propose a spectral clustering-based approach for identifying these high-loss mini-batches. Our experimental results validate our theoretical findings and demonstrate that our proposed algorithm outperforms vanilla SGD, providing a better understanding of mini-batch optimization in contrastive learning.
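
For readers unfamiliar with the setup, below is a minimal sketch of a mini-batch InfoNCE loss and of a simple "pick the highest-loss mini-batches" heuristic. It assumes a symmetric, NT-Xent-style formulation with L2-normalized embeddings; the function names (info_nce_minibatch, select_high_loss_batches), the temperature value, and the encoder argument are illustrative and not taken from the paper, and the paper's exact loss and its spectral clustering-based selection procedure may differ from this sketch.

import torch
import torch.nn.functional as F

def info_nce_minibatch(z1, z2, temperature=0.1):
    # Symmetric InfoNCE over one mini-batch of B positive pairs.
    # z1, z2: (B, d) embeddings of two views of the same B samples.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature  # (B, B) scaled cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Diagonal entries are positives; off-diagonal entries act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# Illustrative high-loss mini-batch selection: score candidate mini-batches and
# keep the k with the largest loss before taking an SGD step. This is a naive
# stand-in; the paper's spectral clustering-based selection is not reproduced here.
def select_high_loss_batches(candidate_batches, encoder, k=4):
    with torch.no_grad():
        losses = [info_nce_minibatch(encoder(x1), encoder(x2))
                  for (x1, x2) in candidate_batches]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [candidate_batches[i] for i in order[:k]]

In this sketch, training on the selected batches rather than uniformly sampled ones is what the abstract refers to as utilizing high-loss mini-batches to speed up SGD convergence.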
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yannis_Kalantidis2
Submission Number: 2201