Keywords: Breast lesion, Ultrasound video, Real-time detection, Transformer
Abstract: Recently, transformer-based detectors have shown impressive performance for breast lesion detection in ultrasound videos. However, these methods often require substantial computational resources and exhibit low inference speed, which poses challenges for real-time applicability. To address this issue, we introduce a fast yet accurate spatial-temporal transformer, named FA-DETR, to efficiently aggregate multi-scale spatial-temporal features for breast lesion detection in ultrasound videos. Our FA-DETR is based on a lightweight spatial-temporal self-attention module, which seamlessly fuses spatial and temporal features extracted from each video frame. In the decoding phase, we employ IoU-aware query selection to generate independent queries for each frame. These queries gain access to rich spatial-temporal information through cross-attention to the encoder embeddings and frame-aware cross-attention mechanisms. Experiments conducted on a public breast lesion ultrasound video dataset demonstrate that our FA-DETR achieves state-of-the-art performance, with an absolute gain of 3.8% in overall AP while being 2.5 times faster than the best existing approach in the literature. Our code and models will be publicly released.
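As a rough illustration of the per-frame query selection mentioned in the abstract, the sketch below picks, for each frame independently, the indices of the k highest-scoring encoder tokens to serve as decoder queries. It is a minimal sketch under stated assumptions, not the FA-DETR implementation: the abstract does not specify how scores are computed, so the function name `select_queries_per_frame`, the score values, and the choice of top-k selection are all hypothetical. In IoU-aware schemes of this kind, the scores are typically trained to reflect localization quality as well as class confidence.

```python
from typing import List


def select_queries_per_frame(scores: List[List[float]], k: int) -> List[List[int]]:
    """For each frame, return the indices of the k highest-scoring encoder tokens.

    scores[f][i] is an assumed (hypothetical) quality score for encoder
    token i of frame f. Frames are handled independently, so every frame
    gets its own set of queries.
    """
    selected = []
    for frame_scores in scores:
        # Sort token indices by descending score and keep the first k.
        order = sorted(
            range(len(frame_scores)),
            key=lambda i: frame_scores[i],
            reverse=True,
        )
        selected.append(order[:k])
    return selected


# Hypothetical scores: 2 frames, 5 encoder tokens each.
scores = [
    [0.10, 0.80, 0.30, 0.95, 0.20],
    [0.60, 0.05, 0.70, 0.15, 0.40],
]
queries = select_queries_per_frame(scores, k=2)
print(queries)  # [[3, 1], [2, 0]]
```

Selecting queries per frame, rather than once for the whole clip, keeps each frame's queries tied to its own content; temporal context would then come from the attention layers rather than from shared queries.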
Primary Subject Area: Detection and Diagnosis
Secondary Subject Area: Application: Radiology
Paper Type: Both
Registration Requirement: Yes
Submission Number: 60