A Multi-information Dual-Layer Cross-Attention Model for Esophageal Fistula Prognosis

Published: 2024, Last Modified: 19 May 2025MICCAI (5) 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Esophageal fistula (EF) is a critical and life-threatening complication following radiotherapy treatment for esophageal cancer (EC). Albeit tabular clinical data contains other clinically valuable information, it is inherently different from CT images and the heterogeneity among them may impede the effective fusion of multi-modal data and thus degrade the performance of deep learning methods. However, current methodologies do not explicitly address this limitation. To tackle this gap, we present an adaptive multi-information dual-layer cross-attention (MDC) model using both CT images and tabular clinical data for early-stage EF detection before radiotherapy. Our MDC model comprises a clinical data encoder, an adaptive 3D Trans-CNN image encoder, and a dual-layer cross-attention (DualCrossAtt) module. The Image Encoder utilizes both CNN and transformer to extract multi-level local and global features, followed by global depth-wise convolution to remove the redundancy from these features for robust adaptive fusion. To mitigate the heterogeneity among multi-modal features and enhance fusion effectiveness, our DualCrossAtt applies the first layer of a cross-attention mechanism to perform alignment between the features of clinical data and images, generating commonly attended features to the second-layer cross-attention that models the global relationship among multi-modal features for prediction. Furthermore, we introduce a contrastive learning-enhanced hybrid loss function to further boost performance. Comparative evaluations against eight state-of-the-art multi-modality predictive models demonstrate the superiority of our method in EF prediction, with potential to assist personalized stratification and precision EC treatment planning.
Loading