Abstract: Lesion segmentation in breast ultrasound videos plays a crucial role in the early detection and intervention of breast cancer. However, it remains a challenging task due to blurred lesion boundaries, substantial background noise, and significant scale variations of lesions across frames. Existing methods typically rely on selecting preceding frames for rudimentary temporal integration but fail to achieve satisfactory segmentation performance. In this paper, we propose STMFSAM, a novel Spatio-Temporal Memory Filtering SAM network, designed to leverage the powerful feature representation and modeling capabilities of SAM for lesion segmentation in breast ultrasound videos. Specifically, we introduce a memory mechanism that stores and propagates essential spatio-temporal features across frames. To enhance segmentation accuracy, we select three relevant reference frames from the memory bank as dense prompts for SAM, enabling it to retain long-term contextual information and effectively guide the segmentation of subsequent frames. To further mitigate the impact of background noise, we present the Spatio-Temporal Memory Filtering module, which selectively refines the memory content by filtering out irrelevant or noisy information. This ensures that only meaningful and informative features are retained for segmentation. We conduct extensive experiments on the UVBSL200 breast ultrasound video dataset, demonstrating that STMFSAM outperforms existing methods. Additionally, to highlight our model’s generalization capability, we achieve competitive results on two video polyp segmentation datasets. The code is available at https://github.com/tzz-ahu/STMFSAM.
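The abstract describes two key ideas: selecting the most relevant reference frames from a memory bank to serve as dense prompts, and filtering the memory to suppress noisy entries. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch; the function names, the cosine-similarity selection criterion, and the threshold-based filtering rule are all assumptions, not the authors' implementation.

```python
import numpy as np

def select_reference_frames(memory_feats, query_feat, k=3):
    """Illustrative sketch: pick the k memory frames most similar to the
    current (query) frame by cosine similarity, mimicking the abstract's
    'three relevant reference frames as dense prompts'. Selection metric
    is an assumption, not the paper's actual mechanism."""
    mem = memory_feats / np.linalg.norm(memory_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    sims = mem @ q                          # similarity of each stored frame to the query
    top = np.argsort(sims)[::-1][:k]        # indices of the k most similar frames
    return top, sims[top]

def filter_memory(memory_feats, query_feat, tau=0.2):
    """Illustrative sketch of memory filtering: drop stored features whose
    similarity to the query falls below a threshold tau, standing in for
    the Spatio-Temporal Memory Filtering module's noise suppression."""
    mem = memory_feats / np.linalg.norm(memory_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    sims = mem @ q
    keep = sims >= tau                      # boolean mask over memory entries
    return memory_feats[keep], keep
```

In this toy setup, each row of `memory_feats` stands for one stored frame embedding; the selected rows would then be fed to the promptable segmenter as dense prompts for the current frame.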
External IDs: dblp:conf/miccai/TuZJWWZ25