Exploring Text-Enhanced Mixture-of-Experts for Semi-supervised Medical Image Segmentation with Composite Data

Published: 01 Jan 2025, Last Modified: 16 Oct 2025, MICCAI (6) 2025, CC BY-SA 4.0
Abstract: Semi-supervised learning (SSL) has emerged as an effective approach to reducing reliance on expensive labeled data by leveraging large amounts of unlabeled data. However, existing SSL methods predominantly treat visual data in isolation, and even text-enhanced SSL approaches that integrate supplementary textual information still process each image-text pair independently. In this paper, we explore the potential of jointly learning from related image-text datasets to further advance SSL. To this end, we introduce TextMoE, a text-enhanced Mixture-of-Experts (MoE) model for semi-supervised medical image segmentation. TextMoE combines a universal vision encoder with a text-assisted MoE (TMoE) decoder, allowing it to process CT-text and X-ray-text data simultaneously within a unified framework. To integrate knowledge from heterogeneous unlabeled data, we design a content regularization based on frequency-space exchange, which guides TextMoE to learn modality-invariant representations. The TMoE decoder is further conditioned on modality indicators to ensure effective fusion of visual and textual features. Finally, a differential loss diversifies the semantic understanding of the visual experts, so that each contributes complementary information to the overall interpretation. Experiments on two public datasets show that TextMoE outperforms both SSL and text-assisted SSL methods. Code is available at: https://github.com/jgfiuuuu/TextMoE.
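The abstract does not spell out how the frequency-space exchange is implemented. The sketch below shows one common realization of such a mechanism: a low-frequency amplitude swap between a CT batch and an X-ray batch, which exchanges appearance statistics while each image keeps its own phase (structure). The function name, the `beta` band-size parameter, and the PyTorch framing are illustrative assumptions, not the paper's code.

```python
import torch


def frequency_space_exchange(x_a, x_b, beta=0.1):
    """Hypothetical sketch: swap low-frequency amplitude spectra of two batches.

    x_a, x_b: real tensors of shape (N, C, H, W), e.g. CT and X-ray images.
    beta: fraction of the spectrum (per side) treated as the low-frequency band.
    Returns appearance-swapped versions of both batches; phase is preserved,
    so content/structure stays with the original image.
    """
    fft_a = torch.fft.fft2(x_a, dim=(-2, -1))
    fft_b = torch.fft.fft2(x_b, dim=(-2, -1))
    amp_a, pha_a = fft_a.abs(), fft_a.angle()
    amp_b, pha_b = fft_b.abs(), fft_b.angle()

    # Centre the spectra so low frequencies sit in the middle of the map.
    amp_a = torch.fft.fftshift(amp_a, dim=(-2, -1))
    amp_b = torch.fft.fftshift(amp_b, dim=(-2, -1))

    _, _, h, w = x_a.shape
    bh, bw = max(1, int(h * beta)), max(1, int(w * beta))
    cy, cx = h // 2, w // 2
    sl = (..., slice(cy - bh, cy + bh), slice(cx - bw, cx + bw))

    # Exchange the central (low-frequency) amplitude block between modalities.
    tmp = amp_a[sl].clone()
    amp_a[sl] = amp_b[sl]
    amp_b[sl] = tmp

    amp_a = torch.fft.ifftshift(amp_a, dim=(-2, -1))
    amp_b = torch.fft.ifftshift(amp_b, dim=(-2, -1))

    # Recombine the swapped amplitude with each image's original phase.
    x_a_swapped = torch.fft.ifft2(torch.polar(amp_a, pha_a), dim=(-2, -1)).real
    x_b_swapped = torch.fft.ifft2(torch.polar(amp_b, pha_b), dim=(-2, -1)).real
    return x_a_swapped, x_b_swapped
```

A content regularization in this spirit would then enforce consistent segmentation predictions between each original image and its amplitude-swapped counterpart, encouraging modality-invariant representations.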
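The differential loss is likewise only described at a high level. One plausible minimal form penalizes pairwise similarity between expert representations, pushing the visual experts toward complementary semantics. The function below is a hypothetical sketch under that assumption; the name and the cosine-similarity formulation are not taken from the paper.

```python
import torch
import torch.nn.functional as F


def differential_loss(expert_feats):
    """Hypothetical sketch: mean absolute pairwise cosine similarity.

    expert_feats: list of (N, D) tensors, one per visual expert.
    Minimizing this loss decorrelates the experts' representations,
    encouraging each expert to capture distinct semantics.
    """
    loss = expert_feats[0].new_zeros(())
    pairs = 0
    for i in range(len(expert_feats)):
        for j in range(i + 1, len(expert_feats)):
            sim = F.cosine_similarity(expert_feats[i], expert_feats[j], dim=-1)
            loss = loss + sim.abs().mean()
            pairs += 1
    return loss / max(pairs, 1)
```

In training, a term like this would be added to the segmentation and consistency objectives with a small weight, trading off expert diversity against task accuracy.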