Hybrid cross-modality fusion network for medical image segmentation with contrastive learning

Published: 01 Jan 2025 · Last Modified: 08 Apr 2025 · Eng. Appl. Artif. Intell. 2025 · CC BY-SA 4.0
Abstract: Medical image segmentation has been widely adopted in artificial intelligence-based clinical applications, and integrating medical text into segmentation models has significantly improved segmentation performance. Designing an effective fusion strategy for paired image and text features is therefore crucial. Existing multi-modal medical image segmentation methods fuse paired image and text features through a non-local attention mechanism, which lacks local interaction. Moreover, they lack a mechanism to strengthen the correspondence of paired features while preserving the discriminability of unpaired features during training, which limits segmentation performance. To address these problems, we propose a hybrid cross-modality fusion network (HCFNet) based on contrastive learning for medical image segmentation. The key components of the proposed method are a multi-stage cross-modality contrastive loss and a hybrid cross-modality feature decoder. The multi-stage cross-modality contrastive loss enhances the discriminability of paired features and separates unpaired features. The hybrid cross-modality feature decoder performs local and non-local cross-modality feature interaction through a local cross-modality fusion module and a non-local cross-modality fusion module, respectively. Experimental results show that our method achieves state-of-the-art performance on two public medical image segmentation datasets.
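The abstract does not specify the exact form of the multi-stage cross-modality contrastive loss, so the following is a minimal sketch assuming a symmetric InfoNCE-style objective: the image and text features of the same sample form the positive pair and all other samples in the batch act as negatives, summed over features from several decoder stages. The function names, the temperature value, and the multi-stage summation are illustrative assumptions, not the paper's verified formulation.

```python
import torch
import torch.nn.functional as F

def cross_modality_contrastive_loss(img_feat: torch.Tensor,
                                    txt_feat: torch.Tensor,
                                    temperature: float = 0.07) -> torch.Tensor:
    """img_feat, txt_feat: (B, D) pooled features from one decoder stage."""
    img = F.normalize(img_feat, dim=-1)
    txt = F.normalize(txt_feat, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric InfoNCE: pull paired image/text features together and
    # push unpaired features apart, in both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def multi_stage_contrastive_loss(img_feats, txt_feats, temperature=0.07):
    # Assumed multi-stage variant: sum the per-stage losses.
    return sum(cross_modality_contrastive_loss(i, t, temperature)
               for i, t in zip(img_feats, txt_feats))
```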
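Similarly, the internals of the hybrid cross-modality feature decoder are not given here. The sketch below assumes one plausible realization: a local branch that fuses concatenated image and (spatially broadcast) text features with a small convolution, and a non-local branch in which image tokens cross-attend to text tokens. Module names and internals are hypothetical and may differ from the paper's local and non-local fusion modules.

```python
import torch
import torch.nn as nn

class HybridCrossModalityFusion(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Local interaction: a 3x3 conv mixes each pixel with its
        # spatial neighbourhood after concatenating both modalities.
        self.local_fuse = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )
        # Non-local interaction: image tokens attend to text tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # img: (B, C, H, W) image features; txt: (B, L, C) text token features.
        B, C, H, W = img.shape
        tokens = img.flatten(2).transpose(1, 2)           # (B, HW, C)
        attn, _ = self.cross_attn(tokens, txt, txt)       # non-local fusion
        non_local = self.norm(tokens + attn).transpose(1, 2).reshape(B, C, H, W)
        # Broadcast a pooled text vector spatially for the local branch.
        txt_map = txt.mean(dim=1)[:, :, None, None].expand(-1, -1, H, W)
        local = self.local_fuse(torch.cat([img, txt_map], dim=1))
        return local + non_local                          # combine both paths
```

Combining the two branches additively is one simple choice; a gating or concatenation scheme would be an equally plausible reading of "hybrid" fusion.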