Double U-Net: semi-supervised ultrasound image segmentation combining CNN and transformer's U-shaped network

Huabiao Zhou, Yanmin Luo, Jingjing Guo, Zhikui Chen, Wanyuan Gong, Zhongwei Lin, Minling Zhuo, Youjia Lin, Weiwei Lin, Qingling Shen

Published: 2025, Last Modified: 25 Jan 2026, J. Supercomput. 2025, CC BY-SA 4.0

Abstract: Ultrasound image segmentation remains challenging due to blurred boundaries and morphological heterogeneity, and existing deep learning methods rely heavily on costly expert annotations. To address these issues, this study proposes a semi-supervised learning algorithm called Double U-Net (W-Net), built on consistency regularization and a cross-teaching framework. Specifically, we introduce a Deeper Dual-output Fusion U-Net (DDFU-Net) designed to tackle ultrasound-specific challenges. This architecture enhances multi-scale feature extraction by improving the backbone network, integrating a dual-output refinement (DOR) module, and incorporating a spatial feature calibration (SFC) module to optimize multi-scale feature fusion. Furthermore, the proposed network pairs DDFU-Net with a lightweight Transformer, so that the CNN and the Transformer complement each other in local and global feature extraction. Through mutual end-to-end supervision, the two networks effectively leverage unlabeled data. Our method achieves competitive performance: (1) compared to other semi-supervised methods, it outperforms the second-best method by 7.96% and 17.52% on BUSI (20% and 10% labels) and by 5.47% and 6.08% on GCUI (20% and 10% labels); and (2) compared to a fully supervised U-Net, it raises Dice by 6.09%/3.86% on BUSI and 3.89%/4.42% on GCUI under the 10%/20% label conditions. These results demonstrate that the method can effectively leverage unlabeled data and extract rich feature information to improve the model's interpretation of complex medical images, particularly in low-data scenarios.
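The cross-teaching idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the form below is an assumption: on unlabeled images, each network's hard (argmax) prediction is used as a pseudo-label for the other network, and the two cross-entropy terms are summed. The function names (`cross_teaching_loss`, `dice`) are hypothetical, and NumPy is used only to show the arithmetic; a real training loop would use an autodiff framework and stop gradients through the pseudo-labels.

```python
import numpy as np

def softmax(logits, axis=1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(probs, labels, eps=1e-8):
    # probs: (N, C, H, W) class probabilities; labels: (N, H, W) integer class ids.
    picked = np.take_along_axis(probs, labels[:, None, :, :], axis=1)
    return float(-np.log(np.clip(picked, eps, 1.0)).mean())

def cross_teaching_loss(cnn_logits, trans_logits):
    # Assumed form: each branch is supervised by the other branch's hard
    # pseudo-labels on unlabeled data. Pseudo-labels are treated as constants.
    p_cnn = softmax(cnn_logits)
    p_trans = softmax(trans_logits)
    pseudo_from_cnn = p_cnn.argmax(axis=1)
    pseudo_from_trans = p_trans.argmax(axis=1)
    return (cross_entropy(p_cnn, pseudo_from_trans)
            + cross_entropy(p_trans, pseudo_from_cnn))

def dice(pred, target, eps=1e-8):
    # Dice coefficient between two boolean masks of the same shape.
    inter = np.logical_and(pred, target).sum()
    return float(2.0 * inter / (pred.sum() + target.sum() + eps))
```

When both branches agree confidently, the cross-teaching term is close to zero; when they disagree, it grows, which pushes the two networks toward consistent predictions on unlabeled images. The `dice` helper corresponds to the Dice metric the abstract reports against the fully supervised U-Net.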

External IDs: dblp:journals/tjs/ZhouLGCGLZLLS25