Abstract: Few-shot image classification is a challenging task that aims to recognize image classes from only a few training images. However, existing methods face two main challenges: (1) they ignore frequency-domain information during image feature extraction, and (2) they do not account for the semantic gap between modalities, which limits classification performance. To overcome these limitations, we propose a novel method named Spatial-Frequency Integration Network with Dual Prompt Learning for few-shot image classification. First, we introduce a spatial-frequency integration module that combines spatial-domain and low-frequency information to extract discriminative features from the image modality. Second, we design a dual prompting module that integrates learnable and hand-crafted prompts to improve generalization to new classes. Third, we propose an image-text interaction module to enhance inter-modal complementarity and consistency. Both theoretical analysis and experimental validation confirm the effectiveness of the proposed method in few-shot image classification.