Mini-Batch Optimization of Contrastive Loss

TMLR Paper2201 Authors

15 Feb 2024 (modified: 29 Feb 2024) · Under review for TMLR
Abstract: Contrastive learning has gained significant attention as a pre-training method for self-supervised learning due to its ability to leverage large amounts of unlabeled data. A contrastive loss function ensures that embeddings of positive sample pairs (e.g., from the same class or different views of the same data) are similar, while embeddings of negative pairs are dissimilar. However, practical constraints such as large memory requirements make it infeasible to consider all possible positive and negative pairs, leading to the use of mini-batches. In this paper, we investigate the theoretical aspects of mini-batch optimization in contrastive learning with the InfoNCE loss. We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches of size $B$ drawn from the $N$ samples are considered, while sub-optimality may arise when examining only a subset. We then demonstrate that using high-loss mini-batches can speed up SGD convergence and propose a spectral clustering-based approach for identifying these high-loss mini-batches. Our experimental results validate our theoretical findings and demonstrate that our proposed algorithm outperforms vanilla SGD, providing a better understanding of mini-batch optimization in contrastive learning.
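
For readers unfamiliar with the setup, below is a minimal sketch of a mini-batch InfoNCE loss and of a simple "pick the highest-loss mini-batches" heuristic. It assumes a symmetric, NT-Xent-style formulation with L2-normalized embeddings; the function names (info_nce_minibatch, select_high_loss_batches), the temperature value, and the encoder argument are illustrative and not taken from the paper, and the paper's exact loss and its spectral clustering-based selection procedure may differ from this sketch.

import torch
import torch.nn.functional as F

def info_nce_minibatch(z1, z2, temperature=0.1):
    # Symmetric InfoNCE over one mini-batch of B positive pairs.
    # z1, z2: (B, d) embeddings of two views of the same B samples.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature  # (B, B) scaled cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Diagonal entries are positives; off-diagonal entries act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# Illustrative high-loss mini-batch selection: score candidate mini-batches and
# keep the k with the largest loss before taking an SGD step. This is a naive
# stand-in; the paper's spectral clustering-based selection is not reproduced here.
def select_high_loss_batches(candidate_batches, encoder, k=4):
    with torch.no_grad():
        losses = [info_nce_minibatch(encoder(x1), encoder(x2))
                  for (x1, x2) in candidate_batches]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [candidate_batches[i] for i in order[:k]]

In this sketch, training on the selected batches rather than uniformly sampled ones is what the abstract refers to as utilizing high-loss mini-batches to speed up SGD convergence.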
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yannis_Kalantidis2
Submission Number: 2201