Deep Hierarchies and Invariant Disease-Indicative Feature Learning for Computer Aided Diagnosis of Multiple Fundus Diseases
Abstract: With the advancement of computer vision, numerous models have been proposed for screening of fundus diseases. However, the recognition of multiple fundus diseases is often hampered by the simultaneous presence of multiple disease types and the confluence of lesion types in fundus images. This paper addresses these challenges by conceptualizing them as multi-level feature fusion and self-supervised disease-indicative feature learning problems. We decode fundus images at various levels of granularity to delineate scenarios wherein multiple diseases and lesions co-occur. To effectively integrate these features, we introduce a hierarchical vision transformer (HVT) that adeptly captures both inter-level and intra-level dependencies. A novel forward-attention module is proposed to enhance the integration of lower-level semantic information into higher semantic layers, thereby enriching the representation of complex features. Additionally, we introduce a novel self-supervised mask-consistent feature learner (MCFL). Unlike traditional mask-autoencoders that reconstruct original images using encoder-decoder structures, MCFL utilizes a teacher-student framework to reconstruct mask-consistent feature maps. In this setup, exponential moving averaging is employed to derive classification-guided features, serving as labels for reconstruction rather than merely reconstructing the original images. This innovative approach facilitates the extraction of disease-indicative features. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art models.
Loading