Diff-MedNet: differential convolution and median-enhanced attention multiscale fusion for infrared small target detection
Abstract: Infrared small target detection (ISTD) is critical for applications such as military reconnaissance. Because targets are tiny and lack prominent textures, identification relies primarily on geometric features. Existing unidirectional differential convolution methods capture gradients in a single direction, which limits their ability to fully characterize target geometry. Moreover, low-level features, which are crucial for geometric feature extraction, are easily corrupted by sensor noise, leading to unstable representations. To address these challenges, this study proposes Diff-MedNet, a method built on the U-Net framework. Diff-MedNet introduces a detail-enhanced convolution module (DEConv) that combines multidirectional differential convolutions with vanilla convolution. DEConv simultaneously captures multidirectional gradient features and global background information, enhancing the representation of geometric features in images. To suppress noise interference and enhance feature expression, we design two attention mechanisms. For low-level features, the median-enhanced channel-space attention mechanism (MECS), based on median pooling, reduces noise interference by computing local median values. For deep-level features, we propose a multiscale depthwise separable convolutional fusion attention mechanism (MDCF). Using multiscale branches and attention-weighted processing, MDCF captures both fine-grained details and contextual background information, balancing detail sensitivity against background dependence to enhance feature representation. Additionally, to minimize the loss of small target details during downsampling, we introduce a feature-adaptive fusion module (FAF) that selects the best features for fusion based on the target size and feature adaptation.
Experimental results indicate that Diff-MedNet performs strongly on the SIRST, IRSTD, and NUDT-SIRST public datasets, reaching detection probabilities of 99.21%, 93.44%, and 99.30%, respectively.
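The abstract does not give the internals of DEConv or MECS, so the following is only a toy sketch of the two underlying ideas, not the authors' implementation. It assumes a single-channel image stored as a list of rows, uses fixed horizontal and vertical central differences as a stand-in for learned multidirectional differential convolutions, and uses a plain 3x3 median filter as a stand-in for median pooling. The function names `median_pool3x3` and `directional_differences` are hypothetical.

```python
from statistics import median


def _clamp(i, n):
    """Clamp index i into [0, n - 1] (edge replication)."""
    return min(max(i, 0), n - 1)


def median_pool3x3(img):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Assumption: stride 1 with replicated edges, so the output has the
    same size as the input. The paper's pooling setup may differ.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                img[_clamp(r + dr, h)][_clamp(c + dc, w)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = median(window)
    return out


def directional_differences(img):
    """Return horizontal and vertical central differences (gx, gy).

    Assumption: two fixed directions only; a learned differential
    convolution would also weight these differences with trained kernels.
    """
    h, w = len(img), len(img[0])
    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx[r][c] = img[r][_clamp(c + 1, w)] - img[r][_clamp(c - 1, w)]
            gy[r][c] = img[_clamp(r + 1, h)][c] - img[_clamp(r - 1, h)][c]
    return gx, gy
```

In this sketch, an isolated one-pixel spike on a flat background is removed by the median filter, whereas the difference maps respond on both sides of the spike with opposite signs. This illustrates why a single direction gives only a partial view of a target's outline, and why a local median can stabilise noisy low-level features.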
External IDs:dblp:journals/mms/LiuQS26