Multimodal Discriminative Feature Learning for SAR ATR: A Fusion Framework of Phase History, Scattering Topology, and Image

Zaidao Wen, Youlan Yu, Qian Wu

Published: 2024, Last Modified: 21 Mar 2026. IEEE Trans. Geosci. Remote. Sens. 2024. License: CC BY-SA 4.0
Abstract: Deep learning has emerged as the dominant paradigm for synthetic aperture radar (SAR) automatic target recognition (ATR), inducing discriminative visual features from target images. Due to the imaging mechanism, a SAR image is a reconstruction of the interaction between the electromagnetic wave and the target, and it intrinsically entangles radar characteristics, target scattering properties, and visual signatures. However, current learning algorithms focus primarily on the visual signatures, without explicitly specifying or effectively integrating the full range of SAR features in the learning process, which leads to weak generalization across different radar systems and imaging conditions. To address this issue, we propose a novel multimodal feature fusion learning framework that captures a comprehensive set of target features from different domains for enhanced complementarity. First, we encode the target response and radar characteristics into the phase-history data, and a cross-direction sequence learning module is designed to extract their range and azimuth dependence. Next, a hierarchical graph node-aggregating neural network is developed to learn scattering topology features from scattering points according to the part-to-whole learning bias. Finally, the learned features from the phase-history and scattering domains are fused with image features obtained from an off-the-shelf deep feature extractor for final target recognition. Experiments on the moving and stationary target acquisition and recognition (MSTAR) benchmark demonstrate the effectiveness of the framework. Compared with other SAR ATR algorithms, our approach achieves state-of-the-art recognition accuracy without any data augmentation, especially when training samples are limited.
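As a rough illustration of the final fusion step described in the abstract, the sketch below combines three per-target feature vectors (phase-history, scattering topology, image) and classifies the result. The abstract does not state the fusion operator or the classifier, so the concatenation, the single linear layer, the feature sizes, and every name here (`fuse_and_classify`, `phase_feat`, `scatter_feat`, `image_feat`) are assumptions made for illustration, not the authors' implementation. The random vectors stand in for the outputs of the three feature extractors.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(phase_feat, scatter_feat, image_feat, weight, bias):
    """Concatenate three feature vectors and apply one linear layer.

    Concatenation is an assumed fusion operator; the abstract only says
    that the three feature sets are fused for recognition.
    weight holds one row of coefficients per class.
    """
    fused = phase_feat + scatter_feat + image_feat  # list concatenation
    logits = [
        sum(x * w for x, w in zip(fused, row)) + b
        for row, b in zip(weight, bias)
    ]
    return softmax(logits)


# Hypothetical sizes, chosen only so the example runs.
rng = random.Random(0)
d_phase, d_scatter, d_image, n_classes = 16, 8, 32, 10

# Stand-ins for the outputs of the three (unspecified) feature extractors.
phase_feat = [rng.gauss(0, 1) for _ in range(d_phase)]
scatter_feat = [rng.gauss(0, 1) for _ in range(d_scatter)]
image_feat = [rng.gauss(0, 1) for _ in range(d_image)]

# Untrained classifier parameters, randomly initialised.
fused_dim = d_phase + d_scatter + d_image
weight = [[rng.gauss(0, 0.1) for _ in range(fused_dim)] for _ in range(n_classes)]
bias = [0.0] * n_classes

probs = fuse_and_classify(phase_feat, scatter_feat, image_feat, weight, bias)
predicted_class = max(range(n_classes), key=lambda k: probs[k])
print(len(probs), predicted_class)
```

Concatenation keeps each modality's features intact and leaves the weighting between them to the classifier, which is one simple way to exploit complementary features; other fusion schemes (for example attention or gating) would fit the same interface.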