Abstract: Few-shot fine-grained image classification aims to use only few labelled samples to successfully recognize subtle sub-classes within the same parent class. This task is extremely challenging, due to the co-occurrence of large inter-class similarity, low intra-class similarity, and only few labelled samples. In this paper, to address these challenges, we propose a new Channel-Spatial Cross-Attention Module (CSCAM), which can effectively drive a model to extract discriminative fine-grained feature representations with only few shots. CSCAM collaboratively integrates a channel cross-attention module and a spatial cross-attention module, for the attentions across support and query samples. In addition, to fit for the characteristics of fine-grained images, a support averaging method is proposed in CSCAM to reduce the intra-class distance and increase the inter-class distance. Extensive experiments on four few-shot fine-grained classification datasets validate the effectiveness of CSCAM. Furthermore, CSCAM is a plug-and-play module, conveniently enabling effective improvement of state-of-the-art methods for few-shot fine-grained image classification.
Loading