A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos

Zhengzheng Tu, Zigang Zhu, Yayang Duan, Bo Jiang, Qishun Wang, Chaoxue Zhang

Published: 2025, Last Modified: 01 Apr 2026. IEEE Trans. Multim. 2025. CC BY-SA 4.0

Abstract: Ultrasound video-based breast lesion segmentation provides valuable assistance in early breast lesion detection and discrimination. However, this field faces two key challenges: first, how to exploit intra-frame and inter-frame lesion cues simultaneously to segment breast lesions accurately; and second, the limited availability of breast ultrasound video datasets. In this paper, we propose a novel Spatial-Temporal Progressive Fusion Network (STPFNet) for the video-based breast lesion segmentation problem. The proposed STPFNet comprises three main components. First, we adopt a unified network architecture that jointly captures spatial dependencies within each ultrasound frame and temporal correlations between frames for the feature representation of ultrasound video. Second, we propose a new fusion module, Multi-Granularity Feature Fusion (MGFF), which fuses the extracted information at different granularities for lesion segmentation and helps alleviate lesion boundary blurring. Third, we take the segmentation result of the previous frame as prior knowledge to suppress the noisy background and learn a more robust representation. To further promote research in this field, we construct a new ultrasound video breast lesion segmentation dataset, UVBLS200, comprising 200 videos (80 benign and 120 malignant lesions). Experiments on the proposed dataset demonstrate that STPFNet achieves better breast lesion detection performance than state-of-the-art methods.
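
The third component, using the previous frame's segmentation as a prior, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the prior is injected, so the multiplicative gating below is only one plausible reading, and the names `apply_previous_mask_prior`, `segment_video`, `predict_mask`, and the `floor` parameter are all hypothetical.

```python
import numpy as np

def apply_previous_mask_prior(features, prev_mask, floor=0.1):
    """Reweight current-frame features with the previous frame's mask.

    features:  (C, H, W) feature map for the current frame.
    prev_mask: (H, W) soft mask in [0, 1] predicted for the previous frame.
    floor:     hypothetical minimum weight, so background is damped, not erased.
    """
    if features.shape[1:] != prev_mask.shape:
        raise ValueError("mask and feature map must share spatial size")
    # Weight is 1.0 where the previous mask is confident, `floor` elsewhere.
    weights = floor + (1.0 - floor) * np.clip(prev_mask, 0.0, 1.0)
    return features * weights[None, :, :]

def segment_video(frame_features, predict_mask):
    """Run a per-frame predictor over a video, feeding each mask forward.

    predict_mask is a stand-in for the segmentation head (an assumption):
    it maps a (C, H, W) feature map to an (H, W) mask.
    """
    masks = []
    # The first frame has no predecessor, so it uses an all-ones (neutral) prior.
    prev = np.ones(frame_features[0].shape[1:])
    for feats in frame_features:
        gated = apply_previous_mask_prior(feats, prev)
        prev = predict_mask(gated)
        masks.append(prev)
    return masks
```

With this gating, responses outside the previously predicted lesion region are scaled down to `floor` of their original value, which is one simple way to suppress background noise while still allowing the lesion region to shift between frames.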

External IDs: dblp:journals/tmm/TuZDJWZ25