Gaussian-Prior Pinwheel Convolution and Region-Energy Loss for Robust Infrared Small Target Detection

17 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Gaussian-Prior Pinwheel Convolution (GPConv), Infrared Small Target Detection, Region Energy-Based Loss (IR-SOIoU Loss), Neuron-Level 3D Attention
TL;DR: We propose GPConv, a Gaussian-prior pinwheel convolution, and combine it with a region energy-based loss and neuron-level 3D attention to achieve state-of-the-art infrared small target detection on the IRSTD-1K and SIRST-UAVB benchmarks.
Abstract: In recent years, convolutional neural network (CNN)-based approaches have achieved notable progress in infrared small target detection. However, most existing methods rely on standard convolution operations, which fail to capture the unique spatial distribution characteristics of infrared small targets. To overcome this limitation, we propose Gaussian-Prior Pinwheel Convolution (GPConv), a novel module that replaces standard convolutions in the lower layers of the backbone to better model the Gaussian-like spatial distribution of dim targets while enlarging the receptive field with only marginal parameter overhead. Furthermore, conventional loss functions that combine scale and localization terms often overlook the varying sensitivity across different target sizes. To address this issue, we design a Region Energy-Based Loss that incorporates a dynamic small object-aware weighting factor r(A) and a center distance penalty to enhance robustness across scales. In addition, we introduce a neuron-level 3D attention mechanism that jointly considers channel, spatial, and depth dimensions to refine feature representations more effectively than channel-only or spatial-only modules. Extensive experiments on the IRSTD-1K and SIRST-UAVB datasets demonstrate that integrating GPConv, Region Energy-Based Loss, and 3D attention into modern detection frameworks (YOLOv8n and RetinaNet) yields consistent and significant improvements, validating the effectiveness and generalization of the proposed approach.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8355
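
Illustrative sketch: The abstract describes a box loss that combines a size-aware weighting factor r(A) with a center distance penalty, but it does not give the formula. The Python sketch below only illustrates the general idea and is not the authors' loss. The exponential form of `small_object_weight`, the `ref_area` parameter, the DIoU-style normalisation of the center distance, and the choice to apply the weight to the penalty term are all assumptions made for illustration.

```python
import math


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_distance_penalty(a, b):
    """Squared center distance divided by the squared diagonal of the
    smallest enclosing box (a DIoU-style normalisation, assumed here)."""
    cax, cay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cbx, cby = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    dist2 = (cax - cbx) ** 2 + (cay - cby) ** 2
    return dist2 / diag2 if diag2 > 0 else 0.0


def small_object_weight(area, ref_area=81.0):
    """Hypothetical stand-in for r(A): close to 1 for tiny targets and
    decaying toward 0 as the target area grows. Not the paper's formula."""
    return math.exp(-area / ref_area)


def weighted_box_loss(pred, target, ref_area=81.0):
    """(1 - IoU) plus a center penalty scaled by the size-aware weight."""
    target_area = (target[2] - target[0]) * (target[3] - target[1])
    r = small_object_weight(target_area, ref_area)
    return (1.0 - box_iou(pred, target)) + r * center_distance_penalty(pred, target)
```

With this placeholder weight, a localization error on a 2x2-pixel target is penalised more heavily than the same relative error on a large target, which is the qualitative behaviour the abstract attributes to r(A).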