Keywords: Self-Supervised Learning, Local-Global Representation Learning, Bootstrapping Representation, Denoising Diffusion Model, Contrastive Learning, Masked Image Modeling
TL;DR: BYON: bootstrapping from noise to couple diffusion and contrastive alignment for transferable representations.
Abstract: We introduce Bootstrap Your Own Noise (BYON), a self-supervised pretraining framework that unifies denoising diffusion with uncertainty-guided contrastive learning to enhance both local and global feature representations. BYON forms a self-reinforcing loop: contrastive learning improves reconstruction quality, and improved reconstructions in turn refine feature alignment. A Semantic Uncertainty Estimation (SUE) module reweights contrastive updates according to reconstruction quality, while an Image-specific Adaptive Noise (IAN) module modulates the noise intensity at the image level based on token saliency, perturbing more informative images more strongly.
BYON consistently boosts performance on image classification, semantic segmentation, object detection, instance segmentation, and fine-grained visual classification (FGVC) tasks. To ensure reproducibility, the code is available in the Supplementary material.
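The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the min-max saliency normalization, the `exp(-error)` weighting, and the InfoNCE form are all assumptions made for illustration. In particular, the abstract does not state the direction of the SUE reweighting, so the sketch assumes that better reconstructions receive larger contrastive weights.

```python
import numpy as np

def image_noise_scale(saliency, sigma_min=0.1, sigma_max=1.0):
    """Hypothetical IAN-style rule: one noise scale per image.

    saliency: array of shape (B, N) with per-token saliency scores.
    Images with higher mean saliency receive stronger noise.
    """
    s = saliency.mean(axis=1)
    # Min-max normalize across the batch (assumed choice).
    s = (s - s.min()) / (s.max() - s.min() + 1e-8)
    return sigma_min + (sigma_max - sigma_min) * s

def uncertainty_weights(x, x_rec):
    """Hypothetical SUE-style weights from reconstruction error.

    Assumption: lower error means a more reliable sample, so it gets
    a larger weight. Weights are normalized to sum to 1.
    """
    err = ((x - x_rec) ** 2).reshape(len(x), -1).mean(axis=1)
    w = np.exp(-err)
    return w / w.sum()

def weighted_info_nce(z1, z2, weights, tau=0.2):
    """InfoNCE over two views, with per-sample weights on the loss."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_sample = -np.diag(log_prob)  # positives sit on the diagonal
    return float((weights * per_sample).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3, 8, 8))          # toy image batch
    saliency = rng.uniform(size=(4, 16))       # toy per-token saliency
    sigma = image_noise_scale(saliency)        # shape (4,)
    x_noisy = x + sigma[:, None, None, None] * rng.normal(size=x.shape)
    x_rec = x + 0.1 * rng.normal(size=x.shape)  # stand-in for a denoiser output
    w = uncertainty_weights(x, x_rec)
    z1, z2 = rng.normal(size=(4, 32)), rng.normal(size=(4, 32))
    print(sigma, w, weighted_info_nce(z1, z2, w))
```

In this sketch the noise scale is a single scalar per image, which matches the abstract's description of image-level modulation; how token saliency is actually computed and aggregated is left to the supplementary code.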
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 16281