Graph-in-graph discriminative feature enhancement network for fine-grained visual classification

Published: 01 Jan 2025 · Last Modified: 15 May 2025 · Appl. Intell. 2025 · License: CC BY-SA 4.0
Abstract: Fine-grained visual classification (FGVC) seeks to identify sub-classes within the same meta-class. Prior efforts mainly mine the features of discriminative parts to enhance classification performance. However, we argue that most of these works ignore the spatial details within each part and the spatial correlations between parts when extracting local features and fusing global features, which inhibits further improvement of feature quality, especially for irregular discriminative parts. To alleviate this issue, we rethink the feature generation route from pixels to parts to objects, and propose a novel graph-in-graph discriminative feature enhancement network (G\(^{2}\)DFE-Net). Specifically, G\(^{2}\)DFE-Net consists of two nested graph convolutional networks: an internal graph is first constructed based on a spatial attention strategy to highlight details of the irregular discriminative regions; then, a KNN-based external graph is introduced to capture the spatial context correlations among independent discriminative parts. Through the collaboration of the internal and external graphs, G\(^{2}\)DFE-Net boosts the class separability and compactness of the global feature representation, thereby benefiting accurate FGVC. We conduct thorough experiments on five benchmark datasets, and both quantitative and qualitative results confirm the superior accuracy of our G\(^{2}\)DFE-Net compared to previous state-of-the-art algorithms. The code is available at https://github.com/WangYuPeng1/G2DFE-Net.
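The KNN-based external graph described above can be illustrated with a minimal sketch: given one feature vector per discriminative part, connect each part to its k most similar neighbors to form the adjacency used by a graph convolution. This is a hypothetical illustration only; the function name, the choice of cosine similarity, and the value of k are assumptions, not details taken from the paper.

```python
import numpy as np

def knn_adjacency(part_feats, k=1):
    """Build a binary KNN adjacency over part features.

    part_feats: (P, D) array, one row per discriminative part.
    Returns a (P, P) adjacency linking each part to its k most
    cosine-similar parts (self-loops included).
    """
    # L2-normalize rows so dot products are cosine similarities
    f = part_feats / (np.linalg.norm(part_feats, axis=1, keepdims=True) + 1e-8)
    sim = f @ f.T  # (P, P) pairwise similarity
    adj = np.zeros_like(sim)
    # keep self plus the k nearest neighbors of each part
    idx = np.argsort(-sim, axis=1)[:, : k + 1]
    rows = np.repeat(np.arange(sim.shape[0]), k + 1)
    adj[rows, idx.ravel()] = 1.0
    return adj
```

For example, with four 2-D part features where parts 0/1 and parts 2/3 are near-duplicates, `knn_adjacency(..., k=1)` links each part to itself and its closest sibling, yielding rows that each sum to 2.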
