Abstract: Self-supervised learning (SSL) aims to learn the intrinsic features of data without labels. Despite the diversity of SSL architectures, the projection head consistently
plays an important role in improving downstream
task performance. In this study, we systematically
investigate the role of the projection head in SSL.
We find that the projection head targets uniformity: it maps samples into a uniform distribution, which enables the encoder to focus on extracting semantic features. Drawing on this insight, we
propose a Representation Evaluation Design (RED)
for SSL models, which builds a shortcut connection between the representation and the projection vectors.
Our extensive experiments with different
architectures (including SimCLR, MoCo-V2, and
SimSiam) on various datasets demonstrate that
RED-SSL models consistently outperform their baseline
counterparts in downstream tasks. Furthermore, the
representations learned by RED-SSL exhibit superior
robustness to previously unseen augmentations and
out-of-distribution data.
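
To make the idea of a shortcut connection concrete, the following is a minimal sketch and not the RED design itself, whose exact form the abstract does not specify. It assumes one simple reading: the projection vector is the output of a two-layer MLP head plus the representation itself, followed by L2 normalisation. The names `init_head` and `project_with_shortcut`, the layer sizes, and the additive form of the shortcut are all hypothetical choices made for illustration.

```python
import numpy as np


def init_head(dim, hidden, rng):
    """Random weights for a hypothetical two-layer MLP projection head."""
    return {
        "w1": rng.normal(0.0, dim ** -0.5, size=(dim, hidden)),
        "w2": rng.normal(0.0, hidden ** -0.5, size=(hidden, dim)),
    }


def project_with_shortcut(h, params):
    """Map representations h (batch, dim) to unit-norm projection vectors.

    Assumed form: z = g(h) + h, where g is the MLP head. The additive
    shortcut is an illustrative assumption, not the paper's definition.
    """
    hidden = np.maximum(h @ params["w1"], 0.0)  # ReLU activation
    z = hidden @ params["w2"] + h               # shortcut: add h back
    return z / np.linalg.norm(z, axis=1, keepdims=True)


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                     # stand-in encoder outputs
params = init_head(dim=8, hidden=16, rng=rng)
z = project_with_shortcut(h, params)
```

Under this assumed form, the head and the representation must share a dimension; a learned linear map on the shortcut path would lift that restriction. When the head's output is zero, the projection vector reduces to the normalised representation.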