Keywords: self supervised learning, theory, origins, supervised learning, generalization
Abstract: Self Supervised Learning (SSL) produces versatile representations from unlabeled datasets, while supervised learning produces overly specialized representations from labeled datasets.
While this has been {\em empirically observed} many times, it remains to be {\em theoretically explained}. To that end, we bring forward a {\em supervised theory of SSL}: we prove that (i) the training objectives of supervised and self supervised learning are identical, but (ii) they operate on different labelings of the data. While supervised learning operates on {\em explicitly given} task labels, SSL operates on {\em implicitly defined} labels that maximize the worst-case downstream task performance. As such, the observed benefit of SSL for downstream task generalization stems from the labels used as targets rather than from its loss function. In other words, both SSL and supervised learning can be made specialized or versatile solely by varying the training labels.
Our proofs and findings rely only on minimal assumptions, thus providing numerous practical insights. For example, we demonstrate how different constraints on the supervised classifier head, as well as label imbalance, equate to different SSL objectives such as VICReg, opening new doors to actively modifying these objectives based on a priori knowledge of the data distribution.
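A minimal sketch of claims (i)-(ii), under illustrative assumptions (a linear head $W$ on top of a representation $f_\theta$, trained with a squared loss; the symbols $X$, $Y_{\text{task}}$, and $Y_{\text{SSL}}$ and the loss form are assumptions for exposition, not necessarily the paper's exact construction): both settings can be pictured as
\[
\min_{\theta,\, W}\; \big\| f_\theta(X)\, W - Y \big\|_F^2,
\qquad
Y =
\begin{cases}
Y_{\text{task}}, & \text{supervised: explicitly given task labels,}\\
Y_{\text{SSL}}, & \text{SSL: implicitly defined labels, e.g.\ shared across augmentations of a sample,}
\end{cases}
\]
so that, in this reading, constraints on $W$ and the structure of $Y_{\text{SSL}}$ (including label imbalance) would select specific SSL objectives such as VICReg.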
Submission Number: 72