Abstract: The dimensional collapse of representations is a persistent issue in self-supervised learning. One notable technique for preventing such collapse is to append a multi-layer perceptron (MLP) network, called the projector, to the encoder. Several works have found that the projector heavily influences the quality of representations learned during self-supervised pre-training. However, the question lingers: what role does the projector play? If it prevents the collapse of representations, why does the last layer of the encoder not take on that role when an MLP projector is absent? In this work, we study what happens inside the projector by examining the rank dynamics of both the projector and the encoder through empirical study and analysis. Through mathematical analysis, we observe that the effect of rank reduction predominantly occurs in the last layer. Furthermore, we show that applying weight regularization only to the last layer yields better performance than applying it to the whole network (WeRank), both with and without a projector. Empirical results support our interpretation of the role of the projector.
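To make the setup concrete, below is a minimal PyTorch-style sketch of an encoder with an MLP projector head and a weight-regularization term applied only to the last layer, contrasted with regularizing the whole network. The class and function names, layer sizes, and penalty coefficient are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: SSL encoder + MLP projector, with a weight penalty
# restricted to the final linear layer (the variant the abstract describes).
import torch
import torch.nn as nn


class ProjectedEncoder(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int = 512, proj_dim: int = 128):
        super().__init__()
        self.encoder = encoder                      # backbone, e.g. a ResNet trunk
        self.projector = nn.Sequential(             # MLP projector head
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.encoder(x)        # representation used for downstream tasks
        z = self.projector(h)      # embedding fed to the self-supervised loss
        return h, z


def last_layer_weight_penalty(model: ProjectedEncoder, coeff: float = 1e-4) -> torch.Tensor:
    """Frobenius-norm penalty on the final linear layer only (assumed form)."""
    last_linear = model.projector[-1]
    return coeff * last_linear.weight.pow(2).sum()


def whole_network_weight_penalty(model: nn.Module, coeff: float = 1e-4) -> torch.Tensor:
    """Same penalty applied to every weight matrix, for comparison (WeRank-style)."""
    return coeff * sum(p.pow(2).sum() for n, p in model.named_parameters() if "weight" in n)


# Usage (assuming ssl_loss and two augmented views z1, z2 exist):
#   total_loss = ssl_loss(z1, z2) + last_layer_weight_penalty(model)
```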
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
- Changed Proposition 2 and Key Takeaway 2 as per the suggestions from Reviewers ajvU and ZcmN
- Added empirical evidence for Proposition 1
- Improved Proposition 3 for clarity, as per the suggestion from Reviewer 5LgN
- Added results for ImageNet100
- Added the theoretical setup
- Added a table of notations
- Added a separate subsection defining several terms used in the paper for better understanding
- Added a few more papers to the literature review
Assigned Action Editor: ~Yuan_Cao1
Submission Number: 5321