SEMI-CONTRANS: Semi-Supervised Medical Image Segmentation via Multi-Scale Feature Fusion and Cross Teaching of CNN and Transformer

Published: 22 Aug 2024, Last Modified: 30 Sept 2024. IEEE International Symposium on Biomedical Imaging (ISBI 2024). License: CC BY 4.0
Abstract: Convolutional Neural Networks (CNNs) and Transformers have achieved promising results in fully supervised medical image segmentation. However, acquiring high-quality annotations for medical images is prohibitively expensive, making semi-supervised learning a promising way to reduce the annotation cost by leveraging both labeled and unlabeled images for training. In this work, we propose a novel model named Semi-ConTrans that unifies the advantages of CNNs and Transformers through multi-scale feature fusion and cross teaching for semi-supervised segmentation. Specifically, to combine the localization capability of CNNs with the global context modeling of self-attention in Transformers in a unified framework, we adaptively fuse their features at multiple scales in the encoder. Furthermore, we use a CNN decoder and a Transformer decoder with different decision boundaries for cross teaching, obtaining more holistic pseudo labels for supervising unlabeled images. Experiments on the ACDC cardiac image dataset demonstrate that by exploiting unlabeled images, our approach greatly improves segmentation performance when only 10% or 20% of the images are labeled, outperforming eight state-of-the-art semi-supervised segmentation methods.
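The cross-teaching idea in the abstract, where each decoder is supervised by hard pseudo labels from the other, can be sketched as a loss over the two branches' logits. This is a minimal illustrative version in NumPy, not the paper's actual implementation; the function name, the flattened `(pixels, classes)` logit layout, and the plain unweighted sum of the two cross-entropy terms are all assumptions for the sketch.

```python
import numpy as np

def cross_teaching_loss(logits_cnn, logits_trans):
    """Cross teaching between a CNN decoder and a Transformer decoder:
    each branch is trained against the hard pseudo labels (argmax)
    produced by the other branch on unlabeled images.
    logits_*: (num_pixels, num_classes) arrays of raw class scores."""
    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def cross_entropy(probs, labels):
        # Mean negative log-likelihood of the hard labels.
        return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

    pseudo_cnn = np.argmax(logits_cnn, axis=1)      # pseudo labels from the CNN branch
    pseudo_trans = np.argmax(logits_trans, axis=1)  # pseudo labels from the Transformer branch

    # Each decoder learns from the other's pseudo labels (no gradient would
    # flow through the argmax in a real training setup).
    return (cross_entropy(softmax(logits_cnn), pseudo_trans)
            + cross_entropy(softmax(logits_trans), pseudo_cnn))
```

When the two decoders agree and are confident, the loss is near zero; when they disagree, each branch is pulled toward the other's prediction, which is how unlabeled images contribute a training signal.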