Two-stage fine-grained image classification model based on multi-granularity feature fusion

Yang Xu, Shanshan Wu, Biqi Wang, Ming Yang, Zebin Wu, Yazhou Yao, Zhihui Wei

Published: 2024, Last Modified: 07 Nov 2024. Pattern Recognit. 2024. CC BY-SA 4.0

Abstract: Highlights
• A multi-granularity feature fusion module is proposed to address the limitations of single-scale features.
• A two-stage classification scheme based on the Vision Transformer (ViT) is proposed to reduce background interference on predictions. By leveraging the ViT model, the object can be separated from the background and its details can be enlarged.
• Extensive experiments demonstrate the superiority of our model. The visualization results illustrate that our two-stage classification can accurately localize objects and facilitate correct predictions.
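The idea of separating the object from the background and then enlarging its details can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a per-patch saliency map (for example, attention weights from a ViT class token reshaped to the patch grid) is already available, and it uses a simple mean threshold and nearest-neighbor resizing as stand-ins. The function names `attention_to_bbox` and `crop_and_enlarge` and the parameter `threshold_scale` are hypothetical.

```python
import numpy as np


def attention_to_bbox(attn, patch_size, threshold_scale=1.0):
    """Turn a per-patch saliency map into a pixel bounding box.

    attn: (H_p, W_p) array of non-negative scores, one per image patch.
          Assumed to come from a ViT attention map (an assumption of this sketch).
    Returns (top, left, bottom, right) in pixels; bottom and right are exclusive.
    """
    # Keep patches whose score exceeds a multiple of the mean score.
    mask = attn > threshold_scale * attn.mean()
    if not mask.any():
        # No patch stands out: fall back to the full image.
        h_p, w_p = attn.shape
        return 0, 0, h_p * patch_size, w_p * patch_size
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    return (int(top * patch_size), int(left * patch_size),
            int(bottom * patch_size), int(right * patch_size))


def crop_and_enlarge(image, bbox):
    """Crop the box and resize it back to the original image size.

    Uses nearest-neighbor sampling to stay dependency-free; a real pipeline
    would more likely use bilinear interpolation.
    """
    top, left, bottom, right = bbox
    crop = image[top:bottom, left:right]
    out_h, out_w = image.shape[:2]
    # Map each output row/column to its source row/column in the crop.
    row_idx = np.arange(out_h) * crop.shape[0] // out_h
    col_idx = np.arange(out_w) * crop.shape[1] // out_w
    return crop[row_idx][:, col_idx]


# Toy example: a 16x16 RGB image split into 4x4-pixel patches,
# with the four central patches marked as salient.
image = np.arange(16 * 16 * 3, dtype=np.float32).reshape(16, 16, 3)
attn = np.zeros((4, 4))
attn[1:3, 1:3] = 1.0

bbox = attention_to_bbox(attn, patch_size=4)
enlarged = crop_and_enlarge(image, bbox)
```

In a two-stage setup of this kind, the first pass would produce the saliency map and a coarse prediction, and the enlarged crop would be fed to the classifier a second time so that the final prediction relies less on background pixels.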