Double U-Net: semi-supervised ultrasound image segmentation combining CNN and transformer's U-shaped network
Abstract: Ultrasound image segmentation remains challenging because of blurred boundaries and morphological heterogeneity, and existing deep learning methods rely heavily on costly expert annotations. To address these issues, this study proposes a semi-supervised learning algorithm called Double U-Net (W-Net), built on consistency regularization and a cross-teaching framework. Specifically, we introduce a Deeper Dual-output Fusion U-Net (DDFU-Net) designed to tackle ultrasound-specific challenges. This architecture enhances multi-scale feature extraction by improving the backbone network, integrating a dual-output refinement (DOR) module, and incorporating a spatial feature calibration (SFC) module to optimize multi-scale feature fusion. Furthermore, the proposed network combines DDFU-Net with a lightweight Transformer, so that the CNN and the Transformer complement each other in local and global feature extraction. Through mutual end-to-end supervision, the method effectively leverages unlabeled data. Our method achieves competitive performance: (1) compared with other semi-supervised methods, it outperforms the second-best method by 7.96% (BUSI, 20% labels), 17.52% (BUSI, 10% labels), 5.47% (GCUI, 20% labels), and 6.08% (GCUI, 10% labels); and (2) compared with a fully supervised U-Net, it improves Dice by 6.09%/3.86% on BUSI and 3.89%/4.42% on GCUI under the 10%/20% label conditions. These results demonstrate that the method leverages unlabeled data effectively, extracting rich feature information that enhances the model's ability to interpret complex medical images, particularly in low-data scenarios.
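The abstract names two ingredients that a short sketch can make concrete: the Dice metric used for evaluation, and mutual supervision between a CNN branch and a Transformer branch on unlabeled images. The following is a minimal pure-Python illustration, not the authors' implementation. The abstract does not state the exact loss, so the use of hard (argmax) pseudo-labels with cross-entropy is an assumption borrowed from the general cross-teaching idea, and all function names are hypothetical.

```python
import math


def dice_score(pred, target, eps=1e-7):
    """Standard Dice coefficient between two flat binary masks (0/1 values)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)


def argmax_labels(probs):
    """Hard pseudo-labels: index of the most probable class for each pixel."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of `labels` under per-pixel class probabilities."""
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)


def cross_teaching_loss(cnn_probs, trans_probs):
    """Assumed unsupervised term: each branch learns from the other's pseudo-labels.

    In a real training loop the pseudo-labels would be treated as constants
    (no gradient flows through them); that detail is outside this sketch.
    """
    loss_cnn = cross_entropy(cnn_probs, argmax_labels(trans_probs))
    loss_trans = cross_entropy(trans_probs, argmax_labels(cnn_probs))
    return loss_cnn + loss_trans
```

In this sketch each entry of `cnn_probs` and `trans_probs` is one pixel's class-probability vector, so the loss is lowest when the two branches agree confidently. A full objective would typically add a supervised term on the labeled subset, with a weighting that the abstract does not specify.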
External IDs: dblp:journals/tjs/ZhouLGCGLZLLS25