Spatial-temporal-channel collaborative feature learning with transformers for infrared small target detection
Abstract: Highlights
• Three transformer encoders are designed to capture features from three domains (spatial, temporal, and channel).
• Propose a transformer-based feature fusion method that is akin to human vision.
• Propose a semantic-aware positional encoding method for videos.
• Achieve state-of-the-art results on two public infrared datasets.