Hemisphere-based Local Feature Fusion from Multimodal Imaging for Interpretable AD Diagnosis

Published: 05 May 2026, Last Modified: 24 Apr 2026, IEEE International Conference on Multimedia and Expo 2026, CC BY 4.0
Abstract: Structural magnetic resonance imaging (sMRI) and positron emission tomography (PET), the imaging modalities most commonly used for the clinical diagnosis of Alzheimer’s disease (AD), provide structural and functional information about the brain, respectively. However, multimodal methods still face challenges in AD prediction due to pronounced inter-individual heterogeneity and the subtlety of pathological changes in imaging manifestations. To address this issue, this work proposes an end-to-end deep learning framework grounded in the neuroanatomical characteristics of bilateral brain symmetry and the asymmetric distribution of AD pathology. First, brain images are divided into 3D regional patches according to the left and right hemispheres, and each patch is encoded by a patch CNN. Next, multi-head attention is employed to refine the representation of these local features across the patch regions within each hemisphere. Finally, we develop a Hemisphere-aware Cross Transformer that performs hierarchical feature fusion at both the inter-modal and inter-hemispheric levels. Compared with several deep learning baselines, our proposed network achieves significant improvements in both AD diagnosis and early AD prediction on the ADNI dataset. More importantly, our approach achieves a breakthrough in interpretability, providing critical insights for exploring AD patterns in multimodal brain imaging.
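For concreteness, the following is a minimal PyTorch sketch of the three-stage pipeline the abstract outlines: patch-CNN encoding of hemispheric 3D patches, intra-hemisphere multi-head self-attention, and two-level cross-attention fusion (inter-modal, then inter-hemispheric). All module names, layer sizes, patch dimensions, and the exact fusion ordering are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the described pipeline, assuming conventional
# PyTorch building blocks. Dimensions and module structure are guesses.
import torch
import torch.nn as nn


class PatchCNN(nn.Module):
    """Encodes one 3D regional patch into a feature vector (assumed design)."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, x):                       # x: (B*P, 1, D, H, W)
        return self.proj(self.conv(x).flatten(1))


class HemisphereFusionNet(nn.Module):
    """Hypothetical hemisphere-aware fusion network mirroring the abstract."""
    def __init__(self, dim=128, heads=4, n_classes=2):
        super().__init__()
        self.enc = PatchCNN(dim)
        # Intra-hemisphere self-attention over patch tokens.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cross-attention reused at both fusion levels for brevity.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls = nn.Linear(dim, n_classes)

    def encode(self, patches):                  # patches: (B, P, 1, D, H, W)
        B, P = patches.shape[:2]
        f = self.enc(patches.flatten(0, 1)).view(B, P, -1)
        out, _ = self.self_attn(f, f, f)        # refine local patch features
        return out

    def fuse(self, a, b):                       # queries a attend over b
        out, _ = self.cross_attn(a, b, b)
        return out

    def forward(self, mri_l, mri_r, pet_l, pet_r):
        # Per-hemisphere, per-modality patch tokens.
        tok = {k: self.encode(v) for k, v in
               dict(ml=mri_l, mr=mri_r, pl=pet_l, pr=pet_r).items()}
        # Level 1: inter-modal fusion within each hemisphere.
        left = self.fuse(tok["ml"], tok["pl"])
        right = self.fuse(tok["mr"], tok["pr"])
        # Level 2: inter-hemispheric fusion, then pooled classification.
        fused = self.fuse(left, right)
        return self.cls(fused.mean(dim=1))


if __name__ == "__main__":
    # Example: batch of 2, 8 patches per hemisphere, 16^3-voxel patches.
    x = lambda: torch.randn(2, 8, 1, 16, 16, 16)
    model = HemisphereFusionNet()
    print(model(x(), x(), x(), x()).shape)      # torch.Size([2, 2])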