Spatial-temporal-channel collaborative feature learning with transformers for infrared small target detection

Published: 01 Jan 2025, Last Modified: 20 Jul 2025, Venue: Image Vis. Comput. 2025, License: CC BY-SA 4.0
Abstract: Highlights
• Design three transformer encoders to capture features from three domains: spatial, temporal, and channel.
• Propose a transformer-based feature fusion method inspired by human vision.
• Propose a semantic-aware positional encoding method for videos.
• Achieve state-of-the-art results on two public infrared datasets.
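The highlights do not describe how the three encoders operate. The following is a rough illustration, not the paper's architecture. It shows one way a clip's feature tensor can be read as tokens along each of the three domains and passed through self-attention. It rests on several assumptions that do not come from the paper. Features are stored as nested lists x[t][c][p], with T frames, C channels, and P = H*W flattened positions. Attention is single-head and parameter-free, with no learned projections, positional encoding, or feed-forward layers. The fusion step is a hypothetical element-wise mean that stands in for the paper's transformer-based fusion, which is not reproduced here. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # N tokens x D features


def self_attention(tokens: Matrix) -> Matrix:
    """Parameter-free scaled dot-product self-attention (Q = K = V = tokens)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        # Each output token is a softmax-weighted average of all input tokens.
        out.append([sum(w[j] * tokens[j][f] for j in range(len(tokens))) / z
                    for f in range(d)])
    return out


def zeros(T: int, C: int, P: int):
    return [[[0.0] * P for _ in range(C)] for _ in range(T)]


def three_domain_attention(x):
    """Hypothetical sketch: attend along spatial, temporal and channel domains.

    x[t][c][p]: T frames, C channels, P flattened spatial positions.
    Returns a tensor of the same shape.
    """
    T, C, P = len(x), len(x[0]), len(x[0][0])
    spatial, temporal, channel = zeros(T, C, P), zeros(T, C, P), zeros(T, C, P)
    for t in range(T):
        # Spatial domain: within frame t, each position p is a token with C features.
        y = self_attention([[x[t][c][p] for c in range(C)] for p in range(P)])
        for p in range(P):
            for c in range(C):
                spatial[t][c][p] = y[p][c]
        # Channel domain: within frame t, each channel c is a token with P features.
        y = self_attention([list(x[t][c]) for c in range(C)])
        for c in range(C):
            channel[t][c] = y[c]
    for p in range(P):
        # Temporal domain: at position p, each frame t is a token with C features.
        y = self_attention([[x[t][c][p] for c in range(C)] for t in range(T)])
        for t in range(T):
            for c in range(C):
                temporal[t][c][p] = y[t][c]
    # Hypothetical placeholder fusion: element-wise mean of the three views.
    return [[[(spatial[t][c][p] + temporal[t][c][p] + channel[t][c][p]) / 3.0
              for p in range(P)] for c in range(C)] for t in range(T)]


# Usage: a toy clip with 2 frames, 3 channels and 4 spatial positions.
clip = [[[0.1 * (t + 2 * c - p) for p in range(4)] for c in range(3)] for t in range(2)]
fused = three_domain_attention(clip)
```

Each domain reshapes the same data into a different token set. Spatial tokens relate positions within a frame, temporal tokens relate frames at a fixed position, and channel tokens relate feature maps to one another. This split is one plausible reading of "three domains" in the highlights.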