Abstract: Fusing multimodal medical data is crucial for helping doctors make accurate treatment decisions. For example, combining Computed Tomography Pulmonary Angiography (CTPA) with Electronic Health Records (EHR) can significantly improve the accuracy of Pulmonary Embolism (PE) detection and thereby increase patient survival rates. Although multimodal learning benefits PE diagnosis, the heterogeneity of multimodal data poses a significant challenge: the inherent semantic and structural differences between modalities make it difficult to integrate their information effectively. Moreover, redundant and irrelevant information within a single modality introduces unnecessary variability, complicating the data and making stable diagnosis difficult. To address these issues, we propose CCLNet, a framework that pairs a contrastive learning component for inter-modality heterogeneity with a causal learning component for intra-modality heterogeneity. Specifically, we align the visual and tabular modalities precisely by using global-level information to soften labels during contrastive learning. In addition, we apply causal intervention to remove the influence of heterogeneous factors within each modality, which reveals the causal relationship between features and targets and improves the accuracy and stability of the model. Experimental results demonstrate that our method outperforms the compared approaches. Our code is available at https://github.com/LeavingStarW/CLPE.
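The soft-label cross-modal alignment mentioned above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the encoder outputs, the blending weight `alpha`, and the choice of tabular self-similarity as the global-level prior are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def soft_label_contrastive_loss(z_img, z_tab, tau=0.1, alpha=0.2):
    """Cross-modal contrastive loss with softened targets.

    Instead of one-hot targets (each image matched only to its own
    record), the targets are blended with intra-modal similarities of
    the tabular (global-level) features, so semantically similar pairs
    are not pushed apart as hard negatives. `alpha` and the use of
    tabular self-similarity are illustrative assumptions.
    """
    # L2-normalize both sets of embeddings.
    z_img = z_img / np.linalg.norm(z_img, axis=1, keepdims=True)
    z_tab = z_tab / np.linalg.norm(z_tab, axis=1, keepdims=True)

    # Cross-modal similarity logits, temperature-scaled.
    logits = z_img @ z_tab.T / tau

    # Global-level similarity prior from the tabular modality,
    # blended with the identity to soften the one-hot labels.
    prior = softmax(z_tab @ z_tab.T / tau, axis=1)
    targets = (1.0 - alpha) * np.eye(len(z_img)) + alpha * prior

    # Soft-target cross-entropy over each image's row of logits.
    log_probs = np.log(softmax(logits, axis=1) + 1e-12)
    return float(-(targets * log_probs).sum(axis=1).mean())
```

With perfectly aligned embeddings the loss is small, since the logit matrix is dominated by its diagonal and the softened targets remain close to the identity.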