Keywords: masked autoencoder, multiple masking, masked image modeling
Abstract: The performance of masked autoencoders hinges significantly on masking, prompting considerable effort toward devising superior masking strategies. However, these strategies mask only once and apply masking directly to the input image. Subsequently, inspired by the flexibility of masking, later works introduced two rounds of masking. Unfortunately, these efforts focus primarily on enhancing model performance and lack an in-depth, systematic understanding of multiple masking for masked autoencoders. To bridge this gap, this work introduces a masked framework with multiple masking stages, termed Conditional MAE, in which subsequent maskings are conditioned on previous unmasked representations, enabling a more flexible masking process in masked image modeling. By doing so, our study sheds light on how multiple masking affects both training optimization and the performance of pretrained models, e.g., introducing more locality into models, and summarizes several takeaways from our findings. Finally, we empirically compare the performance of our best-performing model (Conditional-MAE) with that of MAE along three axes: transfer learning, robustness, and scalability, demonstrating the effectiveness of our multiple masking strategy. We also follow our takeaways and show generalizability to other heterogeneous networks, including SimMIM and ConvNeXt V2. We hope our findings will inspire further work in the field, and we release the code at https://anonymous.4open.science/r/conditional-mae-512C.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2789