Abstract: Recently, auxiliary cross-scene information has been widely utilized to improve hyperspectral image classification performance through knowledge transfer. However, recognizing different objects that share the same semantic category is difficult when the object types in similar scenes differ or only limited similarity knowledge is available. In this article, a multimodal feature interactive (MMFI) learning method is proposed that exploits both the hyperspectral image (HSI) modality and the textual modality to distinguish similar objects, enhancing transfer capability through the semantic prior provided by the text. First, the adversarial domain mapping (ADM) module is designed to realize cross-domain knowledge transfer across different scenes in an adversarial learning manner. Specifically, noise is mapped to the data distributions of different domains and aggregated with the source- and target-domain data; the aggregated features are then reconstructed and optimized to learn discriminative information that is conducive to transfer. Then, the adaptive interaction learning (AIL) module acts on the latent features of the encoder to mine latent associations among the aggregated features and facilitate the expression of consistent features. In addition, few-shot learning (FSL) with textual embeddings provides stronger semantic priors for few-shot prototypes, compensating for the limited recognition capability of the HSI modality alone. Experimental results on three datasets demonstrate the superiority of our method.
DOI: 10.1109/TGRS.2025.3624605
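To make the abstract's description of the three components more concrete, the sketch below shows one way the ADM, AIL, and text-enhanced FSL ideas could be wired together in PyTorch. This is not the authors' implementation: all class names, layer choices, dimensions, the attention-based interaction, and the prototype/text blending coefficient are assumptions made purely for illustration.

```python
# Minimal, hypothetical sketch of the modules described in the abstract (PyTorch).
import torch
import torch.nn as nn


class AdversarialDomainMapping(nn.Module):
    """ADM sketch: map noise toward a domain-like distribution, aggregate it with
    source/target data, reconstruct the aggregate, and expose a discriminator
    output for the adversarial alignment signal."""
    def __init__(self, dim=64):
        super().__init__()
        self.mapper = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.discriminator = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x_src, x_tgt):
        noise = torch.randn_like(x_src)
        x_noise = self.mapper(noise)                     # noise mapped to a pseudo-domain
        agg = torch.cat([x_src, x_tgt, x_noise], dim=0)  # aggregate source/target/noise
        z = self.encoder(agg)                            # latent features passed to AIL
        recon = self.decoder(z)                          # reconstruction objective
        domain_logit = self.discriminator(z)             # adversarial domain alignment
        return z, recon, domain_logit


class AdaptiveInteractionLearning(nn.Module):
    """AIL sketch: self-attention over the aggregated latent features to mine
    associations among them and emphasize consistent components."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, z):
        z = z.unsqueeze(0)                    # treat the whole batch as one interaction set
        out, _ = self.attn(z, z, z)
        return self.norm(z + out).squeeze(0)  # residual, interaction-refined features


def few_shot_logits(query_feat, support_feat, support_label, text_emb, n_class, alpha=0.5):
    """Text-enhanced FSL sketch: class prototypes blend the visual support means
    with textual embeddings; queries are classified by nearest prototype."""
    proto = torch.stack([support_feat[support_label == c].mean(0) for c in range(n_class)])
    proto = alpha * proto + (1 - alpha) * text_emb   # inject semantic prior from text
    return -torch.cdist(query_feat, proto)           # negative distance as class logits
```

In this reading, ADM produces aggregated latent features and the adversarial/reconstruction signals, AIL refines those features through interaction, and the FSL head classifies queries against prototypes strengthened by textual semantics; the exact losses and fusion used in the paper may differ.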