Keywords: Contrastive Learning, Simulated Annealing, Langevin Dynamics, Temperature Schedules, Stochastic Optimization, Representation Learning, Machine Learning Theory
TL;DR: We establish a formal link between temperature annealing in InfoNCE and classical simulated annealing, proving that a slow logarithmic inverse-temperature schedule guarantees convergence to globally optimal representations.
Abstract: The InfoNCE loss in contrastive learning depends critically on a temperature parameter, yet its dynamics under fixed versus annealed schedules remain poorly understood. We provide a theoretical analysis by modeling the evolution of embeddings under Langevin dynamics on a compact Riemannian manifold. Under mild smoothness and energy-barrier assumptions, we show that classical simulated annealing guarantees extend to this setting: slow logarithmic inverse-temperature schedules ensure convergence in probability to a set of globally optimal representations, while faster schedules risk trapping the dynamics in suboptimal local minima. Our results establish a link between contrastive learning and simulated annealing, providing a principled basis for understanding and tuning temperature schedules.
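For concreteness, a minimal sketch of the objects the abstract refers to (the exact normalizations, similarity function, and constants used in the paper may differ): the temperature-scaled InfoNCE loss for an anchor embedding $z_i$ with positive $z_i^+$ and negatives $\{z_j\}_{j \neq i}$ is

$$\mathcal{L}_{\mathrm{InfoNCE}}(\tau) \;=\; -\log \frac{\exp\!\big(\mathrm{sim}(z_i, z_i^+)/\tau\big)}{\sum_{j} \exp\!\big(\mathrm{sim}(z_i, z_j)/\tau\big)},$$

and a slow logarithmic inverse-temperature schedule is of the form

$$\beta(t) \;=\; \frac{1}{\tau(t)} \;\propto\; \log(1+t), \qquad \text{i.e.}\quad \tau(t) \;=\; \frac{c}{\log(1+t)},$$

where, as in classical simulated annealing theory, the constant $c$ must be large relative to the depth of the energy barriers for convergence in probability to the global minimizers to hold; faster (e.g. polynomial) cooling of $\tau(t)$ does not carry this guarantee.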
Submission Number: 15