Abstract: Breast cancer is the leading cause of cancer-related death among women worldwide, and prompt, accurate detection of breast lesions in ultrasound videos plays a crucial role in early diagnosis. However, existing ultrasound video lesion detectors often rely on multiple adjacent frames or non-local temporal fusion strategies to improve accuracy, which reduces their detection speed. This study presents a simple yet effective network, the Space Time Feature Aggregation Network (STA-Net), designed to detect lesions in ultrasound videos efficiently. Using a temporal shift-based space-time aggregation module, STA-Net runs in real time at 54 frames per second on a single GeForce RTX 3090 GPU while achieving 38.7 mean average precision (mAP). In extensive experiments on the BUV dataset, STA-Net outperforms existing state-of-the-art methods both quantitatively and qualitatively, demonstrating its effectiveness for ultrasound video lesion detection.
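To make the idea of temporal shifting concrete, the following is a minimal sketch of a generic channel-wise temporal shift over per-frame feature maps. It is not STA-Net's actual aggregation module, whose details are not given in the abstract. The function name `temporal_shift`, the `(T, C, H, W)` layout, and the `shift_fraction` parameter are assumptions made for illustration. The operation only moves data between neighbouring frames and adds no learnable parameters or multiplications, which is one plausible reason shift-based temporal fusion can be fast.

```python
import numpy as np


def temporal_shift(features: np.ndarray, shift_fraction: float = 0.25) -> np.ndarray:
    """Exchange a fraction of channels between neighbouring frames.

    features: array of shape (T, C, H, W), one feature map per video frame
              (layout assumed for this sketch).
    shift_fraction: hypothetical hyperparameter; share of channels that are
                    shifted, split evenly between the two time directions.
    """
    _, channels, _, _ = features.shape
    fold = int(channels * shift_fraction) // 2  # channels moved per direction

    out = np.zeros_like(features)
    # First `fold` channels: frame t receives the features of frame t + 1.
    out[:-1, :fold] = features[1:, :fold]
    # Next `fold` channels: frame t receives the features of frame t - 1.
    out[1:, fold:2 * fold] = features[:-1, fold:2 * fold]
    # Remaining channels stay with their own frame.
    out[:, 2 * fold:] = features[:, 2 * fold:]
    return out


# Toy usage: 3 frames, 4 channels, 1x1 spatial size.
x = np.arange(12, dtype=float).reshape(3, 4, 1, 1)
y = temporal_shift(x, shift_fraction=0.5)
```

In this sketch, positions with no neighbouring frame (the last frame for the forward-looking channels, the first frame for the backward-looking ones) are filled with zeros. A following spatial convolution would then mix the shifted channels with the unshifted ones, so each frame sees some information from its neighbours.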