Abstract: Medical images acquired by different imaging instruments often belong to different modalities. Architectures that can bridge this instrument gap and support unified training across image modalities are an important direction for medical image segmentation. However, existing methods seldom address the mutual interference caused by differences in feature distributions across modalities, which is a key factor when training on mixed-modality datasets. Differences in diseases and imaging methods lead to variations in lesion size and in the distribution of frequency information; when data from different modalities are mixed during training, these differences typically degrade network performance. To address this, we propose a mixed-modality segmentation network (MMSNet), which consists of three key components: multiscale frequency guidance (MSFG), a modality feature adaptor (MFA), and a frequency enhancement prompt (FEP). MSFG refines spatial feature extraction by incorporating multiscale frequency features. MFA adapts pre-trained patch embeddings and transformer layers, reducing the cost of acquiring features for unknown modalities. Finally, FEP captures multiresolution frequency features of the image and fuses them with the image's frequency feature map, strengthening the network's ability to extract frequency information across multiple spatial domains. In addition, we introduce two mixed-modality datasets, Mix1 and Mix2, composed of medical images from four different modalities, to evaluate the segmentation performance of MMSNet. Our experiments demonstrate that MMSNet effectively alleviates the interference caused by differences in image frequency distributions and improves the segmentation quality of medical images. Our code will be made public at https://github.com/linzijin1238/MMSNet.
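The abstract does not give the concrete form of the frequency-guidance modules. As a rough, illustrative sketch only (the class name, scale choices, and gating design below are our assumptions, not the authors' implementation), the general idea of guiding spatial features with multiscale frequency information could look like this in PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleFrequencyGuidanceSketch(nn.Module):
    """Hypothetical sketch: extract frequency features at several spatial
    scales via 2-D FFT and use them to modulate the spatial feature map.
    Not the authors' MSFG; an illustration of the described idea."""

    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # One 1x1 projection per scale, applied to the amplitude spectrum.
        self.proj = nn.ModuleList(nn.Conv2d(channels, channels, 1) for _ in scales)
        self.fuse = nn.Conv2d(channels * len(scales), channels, 1)

    def forward(self, x):  # x: (B, C, H, W) spatial features
        freq_feats = []
        for scale, proj in zip(self.scales, self.proj):
            xs = F.avg_pool2d(x, scale) if scale > 1 else x
            # Amplitude spectrum of the downsampled feature map.
            amp = torch.abs(torch.fft.fft2(xs, norm="ortho"))
            amp = proj(amp)
            amp = F.interpolate(amp, size=x.shape[-2:], mode="bilinear",
                                align_corners=False)
            freq_feats.append(amp)
        # Fuse the multiscale frequency features into a gating map.
        guidance = torch.sigmoid(self.fuse(torch.cat(freq_feats, dim=1)))
        return x + x * guidance  # frequency-guided spatial features

if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    out = MultiscaleFrequencyGuidanceSketch(64)(feats)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```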