Channel-Spatial Support-Query Cross-Attention for Fine-Grained Few-Shot Image Classification

Shicheng Yang; Xiaoxu Li; Dongliang Chang; Zhanyu Ma; Jing-Hao Xue

Published: 01 Jan 2024, Last Modified: 16 May 2025, ACM Multimedia 2024, License: CC BY-SA 4.0

Abstract: Few-shot fine-grained image classification aims to recognize subtle sub-classes within the same parent class using only a few labelled samples. The task is extremely challenging because large inter-class similarity, low intra-class similarity, and scarce labelled samples occur together. To address these challenges, we propose a new Channel-Spatial Cross-Attention Module (CSCAM), which can effectively drive a model to extract discriminative fine-grained feature representations from only a few shots. CSCAM integrates a channel cross-attention module and a spatial cross-attention module that work together to attend across support and query samples. In addition, to suit the characteristics of fine-grained images, CSCAM includes a support averaging method that reduces the intra-class distance and increases the inter-class distance. Extensive experiments on four few-shot fine-grained classification datasets validate the effectiveness of CSCAM. Furthermore, CSCAM is a plug-and-play module that can be conveniently inserted into state-of-the-art few-shot fine-grained image classification methods to improve their performance.
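The abstract does not give CSCAM's equations, so the following is only a minimal, hypothetical sketch of the two ideas it names, written in plain Python so it runs without dependencies. It assumes that support averaging means an element-wise mean of the K support feature maps of one class, that channel cross-attention uses a C x C softmax affinity and spatial cross-attention an N x N one (scaled dot products, as in standard attention), and that the outputs are fused by a residual sum. All function names (`support_average`, `channel_cross_attention`, `spatial_cross_attention`, `cross_enhance`) are illustrative and are not taken from the authors' code.

```python
# Hypothetical sketch of support averaging plus channel/spatial cross-attention.
# NOT the authors' CSCAM implementation; shapes, scaling and fusion are assumptions.
import math
from typing import List

Matrix = List[List[float]]  # a C x N feature map: C channels, N = H*W spatial positions


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def support_average(shots: List[Matrix]) -> Matrix:
    """Assumed support averaging: element-wise mean of the K shot feature maps of one class."""
    k = len(shots)
    return [[sum(vals) / k for vals in zip(*rows)] for rows in zip(*shots)]


def channel_cross_attention(x: Matrix, y: Matrix) -> Matrix:
    """Each channel of x attends over the channels of y via a C x C affinity."""
    n = len(x[0])
    scores = [[v / math.sqrt(n) for v in row] for row in matmul(x, transpose(y))]
    return matmul(softmax_rows(scores), y)  # C x N


def spatial_cross_attention(x: Matrix, y: Matrix) -> Matrix:
    """Each spatial position of x attends over the positions of y via an N x N affinity."""
    c = len(x)
    scores = [[v / math.sqrt(c) for v in row] for row in matmul(transpose(x), y)]
    return transpose(matmul(softmax_rows(scores), transpose(y)))  # back to C x N


def cross_enhance(x: Matrix, y: Matrix) -> Matrix:
    """Assumed fusion: residual sum of x and both cross-attended views of y."""
    ch = channel_cross_attention(x, y)
    sp = spatial_cross_attention(x, y)
    return [[a + b + d for a, b, d in zip(r0, r1, r2)] for r0, r1, r2 in zip(x, ch, sp)]


# Toy 2-shot episode with C = 2 channels and N = 2 spatial positions.
shots = [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [0.0, 3.0]]]
prototype = support_average(shots)  # [[2.0, 0.0], [0.0, 2.0]]
query = [[1.0, 0.5], [0.2, 1.0]]
query_enhanced = cross_enhance(query, prototype)  # query attends to the support prototype
support_enhanced = cross_enhance(prototype, query)  # support prototype attends to the query
print(prototype, query_enhanced, support_enhanced)
```

Applying `cross_enhance` in both directions mirrors the abstract's description of attention across support and query samples. A real module would operate on batched CNN feature tensors and would likely include learned projections, which this sketch deliberately omits.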
