Abstract: Feature selection, feature representation, and spatial localization are crucial to the accuracy of infrared small target detection (ISTD). However, existing methods struggle to detect small targets accurately against complex backgrounds. In this article, we propose a semantic token patch spatial attention network (STPSA-Net) that approaches small target detection from a novel perspective. The framework uses a semantic token transformer (STT) module to represent images as compact semantic tokens and to model spatiotemporal context, which refines the original features and enhances the representation of small target features. A patch spatial attention module (PSAM) divides the extracted features into patches and integrates spatial and semantic information to restore spatial information and achieve precise localization. Extensive experiments on the SIRST, MFIRST, and NUDT-SIRST datasets show that the proposed method detects infrared small targets accurately and outperforms state-of-the-art approaches.
External IDs:dblp:journals/tgrs/LiuQLWD25