Keywords: representation learning, contrastive learning, optimization
Abstract: Contrastive self-supervised learning has emerged as a powerful paradigm for extracting meaningful representations without labels. While effective at capturing broad categorical distinctions, current methods often struggle to preserve the fine-grained and hierarchical relationships inherent in real-world data. From the perspective of semantic alignment, conventional contrastive learning aligns representations to semantic structure only at a global level: it treats the entire embedding space uniformly and frequently overlooks rich local structural information. In this paper, we propose \emph{Adaptive Multi-scale Affinity alignment (AMA-alignment)}, a framework that introduces localized contrastive objectives and a dynamic multi-scale optimization strategy to adaptively identify and refine poorly aligned regions within the embedding space. Although our model is inherently more complex due to its \emph{multi-scale} and \emph{adaptive} design, we provide theoretical guarantees indicating that its convergence rate remains comparable to that of standard smooth non-convex optimization. Experiments on diverse benchmarks show that AMA-alignment effectively preserves hierarchical structure and outperforms existing contrastive methods on a range of downstream tasks.
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 20850