HyperBatch: Scaling Contrastive Learning Batch Sizes by Two Orders of Magnitude

ICLR 2026 Conference Submission 22345 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Contrastive learning, Large-batch distributed training, Parameter-efficient Adapters
Abstract: Contrastive learning has emerged as a powerful method for learning unsupervised representations that maximize similarity between "related" pairs of data and minimize similarity between unrelated pairs. Many contrastive losses depend heavily on the batch size, as larger batch sizes provide more negatives per comparison and substantially improve representation quality. However, modern backbones are memory intensive, which limits the practical batch size one can train with. To alleviate this issue, we introduce a framework that scales contrastive batch sizes by two orders of magnitude and can be applied on top of any contrastive learner. Our training framework consists of three phases: Pretrain, Adapt, and Fuse. In the Pretrain phase, we train a standard contrastive learner with conventional batch sizes. In the Adapt phase, we freeze the backbone and train a small number of later layers with very large batches, exposing these late-stage parameters to far more negatives at much lower memory cost. Finally, in the Fuse phase, we transfer the large-batch adapter gradients back into the backbone with a modified version of backpropagation. We evaluate our method on audio-video contrastive learning with the AudioSet dataset and show that the multi-phase training pipeline significantly improves retrieval performance, outperforming baseline approaches in both speed and accuracy. By exposing the model to substantially more negatives, each contrastive comparison becomes orders of magnitude more challenging, encouraging the model to learn more discriminative representations.
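The abstract only outlines the three phases, so the following is a minimal PyTorch-style sketch of how such a pipeline could be organized. All module names, shapes, the InfoNCE loss, and the batch sizes are assumptions for illustration; the Fuse phase relies on a modified backpropagation that the abstract does not specify, so it is left as a stub.

```python
# Hypothetical sketch of the Pretrain / Adapt / Fuse pipeline; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(za, zb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings of shape (B, D)."""
    za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
    logits = za @ zb.t() / temperature           # (B, B) similarity matrix
    labels = torch.arange(za.size(0))            # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Placeholder audio/video towers and small late-stage adapters (assumed architecture).
backbone_a = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))
backbone_v = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 256))
adapter_a, adapter_v = nn.Linear(256, 128), nn.Linear(256, 128)

# Phase 1 -- Pretrain: full backbones, conventional (small) batch size.
opt = torch.optim.Adam(list(backbone_a.parameters()) + list(backbone_v.parameters()), lr=1e-4)
xa, xv = torch.randn(256, 128), torch.randn(256, 512)   # one small batch of paired features
loss = info_nce(backbone_a(xa), backbone_v(xv))
loss.backward(); opt.step(); opt.zero_grad()

# Phase 2 -- Adapt: freeze the backbones, train only the cheap adapters with a very large batch.
for p in list(backbone_a.parameters()) + list(backbone_v.parameters()):
    p.requires_grad_(False)
opt_adapt = torch.optim.Adam(list(adapter_a.parameters()) + list(adapter_v.parameters()), lr=1e-3)
with torch.no_grad():                                    # frozen features can be precomputed/cached
    ha = backbone_a(torch.randn(32768, 128))
    hv = backbone_v(torch.randn(32768, 512))
loss = info_nce(adapter_a(ha), adapter_v(hv))            # large-batch loss touches adapters only
loss.backward(); opt_adapt.step(); opt_adapt.zero_grad()

# Phase 3 -- Fuse: the paper transfers the large-batch adapter gradients back into the
# backbone via a modified backpropagation; that procedure is not described in the
# abstract, so it is omitted here.
```

The design point the sketch illustrates is that only the adapter parameters and the (possibly cached) frozen features participate in the large-batch step, which is what makes batches two orders of magnitude larger feasible in memory.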
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 22345