i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable?

22 Sept 2022 (modified: 12 Mar 2024) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Interpretability, Masked Autoencoders, Self-supervised Learning
Abstract: Masked image modeling (MIM) has become a strong and popular self-supervised pre-training approach in the vision domain. However, the mechanism behind such a scheme and the properties of the representations it learns remain largely unexplored. In this work, through comprehensive experiments and empirical studies on Masked Autoencoders (MAE), we address two critical questions about the learned representations: ${\bf (i)}$ Are the latent representations in Masked Autoencoders linearly separable when the input is a mixture of two images instead of a single one? If so, this would provide concrete evidence for why MAE-learned representations achieve superior performance on downstream tasks, as demonstrated extensively in prior work. ${\bf (ii)}$ What degree of semantics is encoded in the latent feature space by Masked Autoencoders? To explore these two questions, we propose a simple yet effective Interpretable MAE (${\bf i\text{-}MAE}$) framework with a two-way image reconstruction and a latent feature reconstruction with a distillation loss, which helps us understand the behaviors inside the MAE structure. Extensive experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet-1K verify the observations we discovered. Furthermore, in addition to qualitatively analyzing the characteristics of the latent representations, we examine the existence of linear separability and the degree of semantics in the latent space quantitatively by proposing two novel metrics. The consistent results across the qualitative and quantitative experiments demonstrate that i-MAE is a superior framework design for interpretability research on MAE frameworks, while also achieving better representational ability.
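The two questions above can be made concrete with a small sketch: blend two images into one input (the mixture i-MAE studies), and probe whether frozen latent features are linearly separable by fitting a simple linear classifier on them. The mixing coefficient, the synthetic features, and the least-squares probe below are illustrative assumptions, not the paper's actual architecture or metrics.

```python
import numpy as np

def mix_images(x1, x2, alpha=0.35):
    """Linearly blend two images (mixup-style). i-MAE feeds a mixture of
    two images instead of a single one; `alpha` here is a hypothetical
    mixing coefficient for the subordinate image."""
    return alpha * x1 + (1 - alpha) * x2

def linear_separability_score(features, labels):
    """Toy proxy for the linear-separability question: fit a least-squares
    linear classifier on frozen features and report its accuracy.
    (The paper's actual metric is not specified here; this stands in
    for the general idea of a linear probe on latent representations.)"""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # add bias term
    Y = np.eye(labels.max() + 1)[labels]                        # one-hot targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    preds = (X @ W).argmax(axis=1)
    return (preds == labels).mean()

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters standing in for MAE latents of two classes
feats = np.vstack([rng.normal(0.0, 1.0, (50, 16)) + 3.0,
                   rng.normal(0.0, 1.0, (50, 16)) - 3.0])
labels = np.array([0] * 50 + [1] * 50)
mixed = mix_images(np.zeros(4), np.ones(4), alpha=0.25)  # -> array of 0.75s
score = linear_separability_score(feats, labels)
print(round(score, 2))
```

If the latent clusters are cleanly separated, as in this toy setup, the probe reaches near-perfect accuracy; entangled representations would score lower, which is the intuition behind measuring separability quantitatively.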
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Unsupervised and Self-supervised learning
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/arxiv:2210.11470/code)