Keywords: Self-Supervised Learning, representation learning, nonlinear learning dynamics
TL;DR: We empirically support a theoretical analysis of the nonlinear dynamics of self-supervised learning methods without contrastive pairs.
Abstract: Scope of Reproducibility
Tian et al. claim in "Understanding Self-Supervised Learning Dynamics without Contrastive Pairs" that a new method, DirectPred, can be derived from the underlying learning dynamics of BYOL and SimSiam. We investigate the assumptions made in this derivation and compare the quality of the resulting encoder representations through linear probing.
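As we read the original paper, DirectPred does not train the linear predictor W_p by gradient descent. Instead, it sets W_p directly from the eigendecomposition of a running estimate of the correlation matrix F = E[f f^T] of the online network's outputs f. The sketch below assumes three things: the moving-average update F <- rho*F + (1 - rho)*E_B[f f^T], the eigendecomposition F = U diag(s) U^T, and predictor eigenvalues p_j = sqrt(s_j) / max_j' sqrt(s_j') + eps. The values rho = 0.3 and eps = 0.1 and the helper names are illustrative assumptions, not taken from the original code.

```python
import numpy as np

def update_correlation(F, f_batch, rho=0.3):
    """Moving-average estimate of F = E[f f^T] from a batch f_batch (B x d).

    The EMA form and rho are illustrative assumptions.
    """
    batch_corr = f_batch.T @ f_batch / f_batch.shape[0]
    return rho * F + (1.0 - rho) * batch_corr

def direct_pred_weights(F, eps=0.1):
    """Hypothetical helper: build W_p from the eigendecomposition of F."""
    s, U = np.linalg.eigh(F)              # F is symmetric PSD, so eigh applies
    s = np.clip(s, 0.0, None)             # drop tiny negative eigenvalues from round-off
    root = np.sqrt(s)
    p = root / np.maximum(root.max(), 1e-12) + eps  # normalized sqrt-eigenvalues plus a floor
    return U @ np.diag(p) @ U.T           # W_p shares F's eigenbasis by construction

# Usage: recompute the predictor every `freq` steps (freq = 5 here, illustrative).
rng = np.random.default_rng(0)
d, freq = 8, 5
F = np.zeros((d, d))
for step in range(10):
    f = rng.standard_normal((32, d))      # stand-in for online-network outputs
    F = update_correlation(F, f)
    if step % freq == 0:
        W_p = direct_pred_weights(F)
```

Because W_p is built in F's eigenbasis, eigenspace alignment holds by construction here. With the gradient-trained predictor, alignment is instead something the dynamics must produce.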
Methodology
We reimplemented BYOL, SimSiam, and DirectPred, as well as their ablations, from scratch in TensorFlow, consulting the original PyTorch repository for some implementation details. In all experiments, we used the CIFAR-10 training set for training and the test set for evaluation. In total, we ran our experiments for more than 100 hours on a V100 GPU on Google Cloud Platform (GCP).
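Linear probing freezes the trained encoder and fits only a linear classifier on its features. The test accuracy of that classifier serves as the measure of representation quality. The sketch below assumes the features have already been extracted into arrays. It uses plain multinomial logistic regression trained by full-batch gradient descent. The function name, learning rate, and epoch count are hypothetical, and the protocol used in the original work (optimizer, schedule, augmentation during probing) may differ.

```python
import numpy as np

def linear_probe(train_x, train_y, test_x, test_y, num_classes, lr=0.1, epochs=200):
    """Fit softmax regression on frozen features and return test accuracy."""
    n, d = train_x.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[train_y]
    for _ in range(epochs):
        logits = train_x @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                     # gradient of mean cross-entropy w.r.t. logits
        W -= lr * train_x.T @ grad
        b -= lr * grad.sum(axis=0)
    preds = np.argmax(test_x @ W + b, axis=1)
    return float(np.mean(preds == test_y))

# Usage with synthetic "frozen features": two well-separated clusters.
rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [-3.0, 0.0]])
y_tr = rng.integers(0, 2, 200)
x_tr = centers[y_tr] + rng.standard_normal((200, 2))
y_te = rng.integers(0, 2, 100)
x_te = centers[y_te] + rng.standard_normal((100, 2))
acc = linear_probe(x_tr, y_tr, x_te, y_te, num_classes=2)
```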
Results
We show that the theoretical assumptions regarding eigenspace alignment and symmetry also hold on a dataset other than the one used in the original paper. In addition, we reproduce the ablations on learning rate, weight decay, and the exponential moving average (EMA).
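The two assumptions checked here are that the trained predictor W_p becomes approximately symmetric and that its eigenspaces align with those of F. Both can be quantified in several ways, and the metrics below may differ from those used in the original paper. The first is the relative asymmetry ||W_p - W_p^T||_F / ||W_p||_F. The second is computed for each eigenvector u_j of F: the absolute cosine similarity between W_p u_j and u_j, which equals 1 exactly when u_j is also an eigenvector of W_p. Both helper names are hypothetical.

```python
import numpy as np

def asymmetry(W):
    """Relative Frobenius norm of the antisymmetric part of W (0 if W is symmetric)."""
    return np.linalg.norm(W - W.T) / np.linalg.norm(W)

def eigenspace_alignment(W_p, F):
    """|cos| between W_p u_j and u_j for each eigenvector u_j of symmetric F."""
    _, U = np.linalg.eigh(F)
    WU = W_p @ U                          # column j is W_p u_j
    dots = np.sum(WU * U, axis=0)         # u_j^T W_p u_j (columns of U are unit-norm)
    return np.abs(dots) / np.linalg.norm(WU, axis=0)

# Usage: a predictor sharing F's eigenbasis versus an unstructured one.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
F = A @ A.T
_, U = np.linalg.eigh(F)
W_aligned = U @ np.diag(np.linspace(0.5, 1.5, 6)) @ U.T
W_random = rng.standard_normal((6, 6))
print(asymmetry(W_aligned), eigenspace_alignment(W_aligned, F).min())
print(asymmetry(W_random), eigenspace_alignment(W_random, F).min())
```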
Since we used CIFAR-10 in all experiments, we cannot directly compare most of our accuracies with those reported in the original paper. However, we observe the same relative behaviour of the different networks under hyperparameter changes. For one experiment a direct comparison is possible: our SGD baseline and DirectPred (with and without frequency=5) achieve accuracies comparable to the original, differing by at most 1%. We also confirm the claim that DirectPred outperforms its one-layer SGD counterpart. Our code is available at the following link: https://anonymous.4open.science/r/SelfSupervisedLearning-FD0F.
What was easy
The architecture of the Siamese network and training schemes were both straightforward to implement and easy to understand.
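The training scheme shared by BYOL and SimSiam has two main pieces. The first is a symmetrized negative-cosine loss between the online predictor output for one augmented view and the target projection of the other view, with a stop-gradient on the target side. The second, used by BYOL only, is an exponential moving average of the online weights into the target network: theta_t <- tau*theta_t + (1 - tau)*theta_o. The NumPy sketch below shows only the forward loss and the EMA step. Gradients and the stop-gradient itself would come from the autodiff framework (for example `tf.stop_gradient` in TensorFlow). The function names and tau = 0.996 are illustrative assumptions. BYOL writes its loss as 2 - 2*cos, which differs from the negative cosine only by an affine transform.

```python
import numpy as np

def neg_cosine(p, z):
    """Mean negative cosine similarity between rows of p and z.

    In a real training loop, z must be wrapped in a stop-gradient.
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -np.mean(np.sum(p * z, axis=1))

def symmetric_loss(p1, p2, z1, z2):
    """Two-view loss: predictor outputs p1, p2 versus target projections z2, z1."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

def ema_update(target_params, online_params, tau=0.996):
    """BYOL-style target update; SimSiam corresponds to tau = 0 (target = online)."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]

# Usage
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 16))
loss_same = symmetric_loss(z, z, z, z)   # identical views reach the minimum, -1 up to rounding
target = ema_update([np.zeros(3)], [np.ones(3)])
```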
What was difficult
We could not run our code on the STL-10 dataset due to time and resource constraints.
Because of differences between the PyTorch and TensorFlow libraries, we had to implement some parts by hand to keep our code as close to the original work as possible. In addition, the original repository is not easy to read and does not cover all experiments (e.g. the eigenspace alignment experiment). Applying data augmentation correctly was also difficult, since it required making assumptions about how the individual augmentation functions actually behave.
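Porting augmentations means matching behavioural details like the following. As we understand its `get_params` logic, torchvision's `RandomResizedCrop` samples an area fraction uniformly from `scale` and an aspect ratio log-uniformly from `ratio`. It retries up to 10 times and then falls back to a center crop with the aspect ratio clamped. TensorFlow's `tf.image.sample_distorted_bounding_box` is not guaranteed to sample the same way, so a hand-written port like the one below is one way to stay close to the PyTorch behaviour. This pure-Python sketch covers only the crop-box sampling; resizing the crop to the output resolution is omitted. The function name is hypothetical.

```python
import math
import random

def random_resized_crop_box(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Return (top, left, crop_h, crop_w) using torchvision-style sampling."""
    area = height * width
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):
        target_area = area * rng.uniform(scale[0], scale[1])
        aspect = math.exp(rng.uniform(log_lo, log_hi))   # log-uniform aspect ratio
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)            # randint is inclusive on both ends
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Fallback: central crop with the aspect ratio clamped to `ratio`.
    in_ratio = width / height
    if in_ratio < min(ratio):
        w, h = width, int(round(width / min(ratio)))
    elif in_ratio > max(ratio):
        h, w = height, int(round(height * max(ratio)))
    else:
        h, w = height, width
    return (height - h) // 2, (width - w) // 2, h, w

# Usage: sample crop boxes for a 32x32 CIFAR-10 image with a seeded generator.
gen = random.Random(0)
boxes = [random_resized_crop_box(32, 32, rng=gen) for _ in range(5)]
```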
Communication with original authors
We did not contact the authors of the paper since we did not encounter any major issues during the reproducibility study.
Paper Url: http://proceedings.mlr.press/v139/tian21a/tian21a.pdf
Paper Venue: ICML 2021
Supplementary Material: zip