Keywords: Masked Image Modeling, Masked Autoencoders, Representation Learning, Mutual Information, Retinal Imaging, Medical Imaging
Abstract: We propose a frequency-oriented perspective on retinal representation learning by analyzing masked autoencoders (MAE) through the lens of spatial frequency. Our analysis shows that MAE favors low-frequency content while under-encoding diagnostically critical high-frequency structures in retinal images. Because retinal pathology often manifests in high-frequency detail, this bias limits diagnostic performance and motivates frequency-balanced representations. Within a mutual-information (MI) formulation of MAE, we introduce the \emph{Frequency-Balanced Retinal Masked Autoencoder (RetMAE)}, which augments the reconstruction objective with an MI regularizer that suppresses low-frequency redundancy and accentuates clinically salient high-frequency information. Without altering the architecture, RetMAE learns frequency-balanced features that surpass those of MAE-based retinal encoders in both quantitative and qualitative evaluations. These results offer new insight into how MAE's reconstruction objective amplifies ViT's low-pass tendencies in spatially heterogeneous retinal images, and suggest that a frequency-oriented view, combined with a simple MI-based correction, provides a principled foundation for future advances in ophthalmic modeling.
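For readers who want a concrete starting point, the sketch below illustrates one way a frequency-balanced reconstruction objective of this kind could look. The abstract does not disclose RetMAE's MI estimator, so this is a hypothetical PyTorch stand-in that replaces the MI regularizer with a simple FFT-based high-frequency penalty on the reconstruction residual; `highpass_mask`, `frequency_balanced_loss`, `cutoff`, and `lam_high` are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a frequency-balanced MAE reconstruction loss.
# NOT the paper's RetMAE implementation: it substitutes an FFT-based
# high-frequency penalty for the MI regularizer described in the abstract,
# keeping only the general idea of up-weighting high-frequency error.

import torch
import torch.fft


def highpass_mask(h: int, w: int, cutoff: float = 0.25, device=None) -> torch.Tensor:
    """Boolean (h, w) mask selecting spatial frequencies whose radius exceeds
    `cutoff` (normalized frequency in [0, 0.5]; the value 0.25 is an
    illustrative assumption, not a reported hyperparameter)."""
    fy = torch.fft.fftfreq(h, device=device)  # (h,) frequencies along height
    fx = torch.fft.fftfreq(w, device=device)  # (w,) frequencies along width
    radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return radius > cutoff


def frequency_balanced_loss(pred: torch.Tensor,
                            target: torch.Tensor,
                            lam_high: float = 1.0) -> torch.Tensor:
    """Pixel-space MSE plus an extra penalty on the high-frequency residual.

    pred, target: (B, C, H, W) reconstructed and original images.
    lam_high: weight of the high-frequency term (hypothetical hyperparameter).
    """
    pixel_loss = torch.mean((pred - target) ** 2)

    # Residual spectrum via a 2-D FFT over the spatial dimensions.
    resid_f = torch.fft.fft2(pred - target, norm="ortho")
    mask = highpass_mask(pred.shape[-2], pred.shape[-1], device=pred.device)

    # Mean residual energy restricted to the high-frequency band
    # (low-frequency entries are zeroed out by the mask).
    high_loss = torch.mean(resid_f.abs() ** 2 * mask)

    return pixel_loss + lam_high * high_loss
```

In a standard MAE training loop this loss would be applied to the decoded image after unpatchifying, with `lam_high` tuned on a validation metric; any principled MI-based weighting, as proposed in the paper, would replace the fixed high-pass mask used here.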
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 5485