Real-time Breast Lesion Detection in Videos via Spatial-temporal Feature Aggregation

Published: 27 Mar 2025, Last Modified: 01 May 2025
Venue: MIDL 2025 Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Breast lesion, Ultrasound video, Real-time detection, Transformer
Abstract: Recently, transformer-based detectors have shown impressive performance for breast lesion detection in ultrasound videos. However, these methods often require substantial computational resources and exhibit low inference speed, which poses challenges for real-time applicability. To address this issue, we introduce a fast yet accurate spatial-temporal transformer, named FA-DETR, to efficiently aggregate multi-scale spatial-temporal features for breast lesion detection in ultrasound videos. Our FA-DETR is based on a lightweight spatial-temporal self-attention module, which seamlessly fuses spatial and temporal features extracted from each video frame. In the decoding phase, we employ IoU-aware query selection to generate independent queries for each frame. These queries gain access to rich spatial-temporal information through the encoder embeddings’ cross-attention and frame-aware cross-attention mechanisms. Experiments conducted on a public breast lesion ultrasound video dataset demonstrate that our FA-DETR achieves state-of-the-art performance, with an absolute gain of 3.8% in overall AP while being 2.5 times faster than the best existing approach in the literature. Our code and models will be publicly released.
Primary Subject Area: Detection and Diagnosis
Secondary Subject Area: Application: Radiology
Paper Type: Both
Registration Requirement: Yes
Submission Number: 60
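
Illustrative note: the abstract states that IoU-aware query selection generates independent queries for each frame, but gives no implementation details. The sketch below is one possible reading, not the authors' code: it assumes each encoder token already carries a scalar score (hypothetically, a classification score trained to reflect IoU) and simply keeps the top-k tokens separately for every frame. The function name `select_queries_per_frame`, the score values, and `k` are all assumptions made for illustration.

```python
from typing import List


def select_queries_per_frame(scores: List[List[float]], k: int) -> List[List[int]]:
    """For each frame, return the indices of the k highest-scoring encoder tokens.

    scores[t][i] is an assumed per-token score for token i in frame t.
    Frames are handled independently, so each frame gets its own query set.
    """
    selected = []
    for frame_scores in scores:
        # Sort token indices by descending score; ties keep the lower index first.
        order = sorted(range(len(frame_scores)), key=lambda i: (-frame_scores[i], i))
        selected.append(order[:k])
    return selected


# Hypothetical scores for 2 frames with 4 encoder tokens each.
scores = [
    [0.1, 0.9, 0.4, 0.7],
    [0.8, 0.2, 0.6, 0.3],
]
picked = select_queries_per_frame(scores, k=2)
print(picked)  # [[1, 3], [0, 2]]
```

In a full detector the selected indices would be used to gather the corresponding encoder embeddings as initial decoder queries; that step is omitted here because the abstract does not describe it.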