Towards Understanding Masked Distillation

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: representation learning, computer vision
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We explain how Masked Distillation (a variant of Masked Image Modeling) improves model performance.
Abstract: In the realm of self-supervised learning, Masked Image Modeling (MIM) is a viable approach for mitigating the dependency on large-scale annotated data while demonstrating efficacy across a broad spectrum of downstream tasks. A recent variant of MIM, known as Masked Distillation (MD), uses semantic features from a teacher model, rather than low-level pixel features, as supervision. Although prior work has demonstrated its effectiveness on various downstream tasks, the mechanisms underlying its performance improvements remain unclear. Our investigation reveals that Masked Distillation mitigates multiple forms of overfitting present in the original models, including attention homogenization and representation folding in the higher layers. Furthermore, we find that Masked Distillation introduces beneficial inductive biases stemming from MIM, which we believe contribute positively to model performance. We also analyze the nuances of model architecture design and the decision-making tendencies of Masked Distillation models, revealing inconsistencies with previous research findings.
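For readers unfamiliar with the setup the abstract describes, below is a minimal, hypothetical sketch of a generic masked-distillation objective: a student reconstructs a frozen teacher's semantic features at masked patch positions, instead of reconstructing raw pixels as in standard MIM. All module names, shapes, the masking ratio, and the regression loss are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a generic masked-distillation loss (assumptions: toy MLP
# student/teacher, zeroed masked tokens, smooth-L1 feature regression).
import torch
import torch.nn as nn
import torch.nn.functional as F


def masked_distillation_loss(student, teacher, patches, mask_ratio=0.6):
    """Student predicts the frozen teacher's features at masked patch positions.

    patches: (B, N, D) patch embeddings; a random mask is sampled per sample.
    """
    B, N, _ = patches.shape
    num_masked = int(N * mask_ratio)

    # Randomly choose which patch positions are masked for each sample.
    noise = torch.rand(B, N, device=patches.device)
    masked_idx = noise.argsort(dim=1)[:, :num_masked]          # (B, num_masked)
    mask = torch.zeros(B, N, dtype=torch.bool, device=patches.device)
    mask.scatter_(1, masked_idx, True)

    # Teacher sees the full, unmasked input and provides semantic targets.
    with torch.no_grad():
        targets = teacher(patches)                             # (B, N, D)

    # Student sees the corrupted input (masked tokens zeroed for simplicity)
    # and regresses the teacher's features at the masked positions only.
    corrupted = patches.masked_fill(mask.unsqueeze(-1), 0.0)
    preds = student(corrupted)                                 # (B, N, D)
    return F.smooth_l1_loss(preds[mask], targets[mask])


if __name__ == "__main__":
    dim = 64
    student = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    teacher = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)).eval()
    x = torch.randn(2, 16, dim)                                # toy patch embeddings
    loss = masked_distillation_loss(student, teacher, x)
    loss.backward()
    print(loss.item())
```

In practice the teacher would be a pretrained vision model (e.g., CLIP or a self-supervised ViT) and the student a Vision Transformer, but the loss structure above captures the "semantic features as supervision" idea the abstract refers to.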
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1920