Abstract: Fine-grained visual categorization (FGVC) is challenging because of high intra-class variance and low inter-class variance. In multi-stage FGVC methods, the attention region selected in the initial stage strongly influences all subsequent stages. Prior approaches concentrated heavily on locating discriminative regions and overlooked the need to focus on the object itself in the initial stage. In this paper, we propose the Attention Binary Navigation Tree (ABNT) model, which strengthens the ability of the initial leaf node to distinguish object information from background information. This directs the model’s attention to the object and provides more effective guidance for subsequent stages. Moreover, multiple branch routing modules are integrated in a decision-tree structure to weight the contribution of each leaf node. The predictions of the leaf nodes are then aggregated to obtain the final decision. Extensive experiments on two fine-grained benchmark datasets validate the effectiveness of the proposed model. The experimental results show that ABNT outperforms other state-of-the-art FGVC methods.
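The routing-and-aggregation idea in the abstract can be illustrated with a minimal soft binary decision tree. This is only a sketch under stated assumptions, not the ABNT implementation: the abstract does not specify how routing probabilities are computed or combined, so the sigmoid gating, the breadth-first node ordering, and the names `leaf_weights` and `aggregate` are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a routing logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def leaf_weights(routing_logits, depth):
    """Return the path probability of each leaf in a complete binary tree.

    Assumption (not from the paper): one logit per internal node, listed in
    breadth-first order, where sigmoid(logit) is the probability of routing
    to the right child.
    """
    assert len(routing_logits) == 2 ** depth - 1
    weights = [1.0]  # all probability mass starts at the root
    node = 0
    for _ in range(depth):
        next_weights = []
        for w in weights:
            p_right = sigmoid(routing_logits[node])
            # Split this node's mass between its left and right children.
            next_weights.extend([w * (1.0 - p_right), w * p_right])
            node += 1
        weights = next_weights
    return weights


def aggregate(leaf_predictions, weights):
    """Combine per-leaf class distributions, weighted by path probability."""
    num_classes = len(leaf_predictions[0])
    return [
        sum(w * pred[c] for w, pred in zip(weights, leaf_predictions))
        for c in range(num_classes)
    ]


# Hypothetical example: a depth-2 tree with 4 leaves and 2 classes.
example_weights = leaf_weights([0.0, 0.0, 0.0], depth=2)
example_leaves = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
example_decision = aggregate(example_leaves, example_weights)
```

Because each internal node splits its probability mass between two children, the leaf weights always sum to one, so the aggregated output remains a valid class distribution whenever each leaf prediction is one.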