Keywords: Masked Image Modeling, Masked Autoencoders, Representation Learning, Mutual Information, Retinal Imaging, Medical Imaging
Abstract: We propose a frequency-oriented perspective on retinal representation learning by analyzing masked autoencoders (MAE) through the lens of spatial frequency. Our analysis shows that MAE favors low-frequency content while under-encoding diagnostically critical high-frequency structures in retinal images. Because retinal pathology often manifests in high-frequency detail, this bias limits diagnostic performance and motivates frequency-balanced representations. Within a mutual-information (MI) formulation of MAE, we introduce the \emph{Frequency-Balanced Retinal Masked Autoencoder (RetMAE)}, which augments the reconstruction objective with an MI regularizer that suppresses low-frequency redundancy and accentuates clinically salient high-frequency information. Without altering the architecture, RetMAE learns frequency-balanced features that surpass those of MAE-based retinal encoders in both quantitative and qualitative evaluations. These results offer new insight into how MAE's reconstruction objective amplifies ViT's low-pass tendencies in spatially heterogeneous retinal images, and suggest that a frequency-oriented view, combined with a simple MI-based correction, provides a principled foundation for future advances in ophthalmic modeling.
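For readers who want a concrete starting point, the sketch below illustrates one way a frequency-balanced reconstruction objective of this kind could look. The abstract does not disclose RetMAE's MI estimator, so this is a hypothetical PyTorch stand-in that replaces the MI regularizer with a simple FFT-based high-frequency penalty on the reconstruction residual; `highpass_mask`, `frequency_balanced_loss`, `cutoff`, and `lam_high` are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a frequency-balanced MAE reconstruction loss.
# NOT the paper's RetMAE implementation: it substitutes an FFT-based
# high-frequency penalty for the MI regularizer described in the abstract,
# keeping only the general idea of up-weighting high-frequency error.

import torch
import torch.fft


def highpass_mask(h: int, w: int, cutoff: float = 0.25, device=None) -> torch.Tensor:
    """Boolean (h, w) mask selecting spatial frequencies whose radius exceeds
    `cutoff` (normalized frequency in [0, 0.5]; the value 0.25 is an
    illustrative assumption, not a reported hyperparameter)."""
    fy = torch.fft.fftfreq(h, device=device)  # (h,) frequencies along height
    fx = torch.fft.fftfreq(w, device=device)  # (w,) frequencies along width
    radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return radius > cutoff


def frequency_balanced_loss(pred: torch.Tensor,
                            target: torch.Tensor,
                            lam_high: float = 1.0) -> torch.Tensor:
    """Pixel-space MSE plus an extra penalty on the high-frequency residual.

    pred, target: (B, C, H, W) reconstructed and original images.
    lam_high: weight of the high-frequency term (hypothetical hyperparameter).
    """
    pixel_loss = torch.mean((pred - target) ** 2)

    # Residual spectrum via a 2-D FFT over the spatial dimensions.
    resid_f = torch.fft.fft2(pred - target, norm="ortho")
    mask = highpass_mask(pred.shape[-2], pred.shape[-1], device=pred.device)

    # Mean residual energy restricted to the high-frequency band
    # (low-frequency entries are zeroed out by the mask).
    high_loss = torch.mean(resid_f.abs() ** 2 * mask)

    return pixel_loss + lam_high * high_loss
```

In a standard MAE training loop this loss would be applied to the decoded image after unpatchifying, with `lam_high` tuned on a validation metric; any principled MI-based weighting, as proposed in the paper, would replace the fixed high-pass mask used here.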
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 5485