Bootstrap Your Own Noise: Denoising Adaptive Noise in Diffusion Models for SSL

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Self-Supervised Learning, local-global representation learning, bootstrapping representation, Denoising Diffusion Model, Contrastive Learning, Masked Image Modeling
TL;DR: BYON: bootstrapping from noise to couple diffusion and contrastive alignment for transferable representations.
Abstract: We introduce Bootstrap Your Own Noise (BYON), a self-supervised pretraining framework that unifies denoising diffusion with uncertainty-guided contrastive learning to enhance both local and global feature representations. BYON forms a self-reinforcing loop: contrastive learning improves reconstruction quality, and improved reconstructions in turn refine feature alignment. A Semantic Uncertainty Estimation (SUE) module reweights contrastive updates according to reconstruction quality, while an Image-specific Adaptive Noise (IAN) module modulates noise intensity at the image level according to token saliency, perturbing more informative images more strongly. BYON consistently boosts performance on image classification, semantic segmentation, object detection, instance segmentation, and fine-grained visual classification (FGVC) tasks. To ensure reproducibility, the code is provided in the supplementary material.
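The two mechanisms named in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration only (the function names, the softmax reweighting, and the linear saliency-to-noise mapping are assumptions, not the authors' implementation): SUE-style reweighting gives lower-error (better-reconstructed) samples larger contrastive weights, and IAN-style scaling raises the per-image noise level with mean token saliency.

```python
import numpy as np

def sue_weights(recon_error, temperature=1.0):
    """Sketch of SUE-style reweighting (hypothetical): samples with lower
    reconstruction error get larger contrastive-update weights via a
    softmax over negative error, rescaled so the mean weight is 1."""
    e = np.exp(-np.asarray(recon_error, dtype=float) / temperature)
    w = e / e.sum()
    return w * len(e)

def ian_noise_scale(token_saliency, base_sigma=0.5):
    """Sketch of IAN-style noise modulation (hypothetical): each image's
    noise level grows with its mean token saliency, so more informative
    images are perturbed more strongly."""
    per_image = np.asarray(token_saliency, dtype=float).mean(axis=1)
    return base_sigma * (1.0 + per_image)

# Toy usage: three samples with increasing reconstruction error,
# three images of 196 tokens with nonnegative saliency scores.
w = sue_weights([0.1, 0.5, 0.9])
sigma = ian_noise_scale(np.random.default_rng(0).uniform(0, 1, (3, 196)))
```

Here `sue_weights` decreases monotonically with error, and `sigma` never drops below `base_sigma` for nonnegative saliency; any real implementation would plug these weights into the contrastive loss and the scales into the forward diffusion process.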
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 16281