Unsupervised Training of Vision Transformers with Synthetic Negatives

Published: 07 May 2025 · Last Modified: 29 May 2025 · VisCon 2025 Poster · CC BY 4.0
Keywords: Self-supervised learning, Vision transformers, Synthetic negatives, Contrastive learning, Representation learning, Hard negatives
TL;DR: We improve vision transformer representations by incorporating synthetic hard negatives into contrastive learning, boosting performance without additional overhead.
Abstract: This paper does not introduce a novel method per se. Instead, we address the neglected potential of hard negative samples in self-supervised learning. Previous works have explored synthetic hard negatives, but rarely in the context of vision transformers. We build on this observation and integrate synthetic hard negatives to improve vision transformer representation learning. This simple yet effective technique notably improves the discriminative power of the learned representations. Our experiments show performance improvements for both DeiT-S and Swin-T architectures.
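For concreteness, the sketch below illustrates one common way to realize the idea described in the abstract: synthesizing hard negatives by mixing the hardest existing negatives and adding them to an InfoNCE-style contrastive loss. This is a minimal illustration under those assumptions, not the paper's implementation; the function name, the memory-bank `queue`, and parameters such as `num_synthetic` are hypothetical placeholders.

```python
# Hedged sketch: synthetic hard negatives created by mixing the hardest
# existing negatives, appended to an InfoNCE contrastive loss.
# All names and hyperparameters here are illustrative, not from the paper.
import torch
import torch.nn.functional as F


def contrastive_loss_with_synthetic_negatives(
    queries,           # (B, D) embeddings of one augmented view
    keys,              # (B, D) embeddings of the positive view
    queue,             # (K, D) memory bank of negative embeddings
    temperature=0.2,
    num_synthetic=64,  # number of synthetic negatives per query (assumed)
):
    queries = F.normalize(queries, dim=1)
    keys = F.normalize(keys, dim=1)
    queue = F.normalize(queue, dim=1)

    # Similarity of each query to every real negative in the queue.
    neg_logits = queries @ queue.t()                            # (B, K)

    # Select the hardest negatives (highest similarity) per query.
    hard_idx = neg_logits.topk(num_synthetic, dim=1).indices    # (B, S)
    hard = queue[hard_idx]                                      # (B, S, D)

    # Synthesize new negatives by convex mixing of pairs of hard negatives.
    perm = torch.randperm(num_synthetic, device=queries.device)
    alpha = torch.rand(hard.size(0), num_synthetic, 1, device=queries.device)
    synthetic = alpha * hard + (1 - alpha) * hard[:, perm]
    synthetic = F.normalize(synthetic, dim=2)                   # (B, S, D)

    # Logits: positive pair, real negatives, synthetic negatives.
    pos = (queries * keys).sum(dim=1, keepdim=True)             # (B, 1)
    syn_logits = torch.bmm(synthetic, queries.unsqueeze(2)).squeeze(2)  # (B, S)
    logits = torch.cat([pos, neg_logits, syn_logits], dim=1) / temperature

    # The positive is always at index 0.
    labels = torch.zeros(queries.size(0), dtype=torch.long, device=queries.device)
    return F.cross_entropy(logits, labels)
```

Because the synthetic negatives are mixtures of the negatives already closest to the query, they lie near the decision boundary and tighten the contrastive objective without requiring extra forward passes through the backbone, which is consistent with the "no additional overhead" claim in the TL;DR.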
Submission Number: 16