Abstract: Fine-grained Visual Recognition (FGVR) aims to distinguish objects belonging to similar subcategories. Humans adeptly perform this challenging task by leveraging both intra-category distinctiveness and inter-category similarity. However, previous methods fail to combine these two complementary dimensions or to mine the intrinsic relations among various semantic features. To address these limitations, we propose HI2R, a Hypergraph-guided Intra- and Inter-category Relation Modeling approach, which simultaneously extracts intra-category structural information and inter-category relations for more precise reasoning. Specifically, we introduce a Hypergraph-guided Structure Learning (HSL) module, which employs hypergraphs to capture high-order structural relations, going beyond traditional graph-based methods that are limited to pairwise connections. This allows the model to adapt to significant intra-category variations. Additionally, we propose an Inter-category Relation Perception (IRP) module that improves feature discrimination across categories by extracting and analyzing the semantic relations among them, with the aim of alleviating the robustness issue associated with exclusive reliance on intra-category discriminative features. Furthermore, a Random Semantic Consistency (RSC) loss is introduced to direct the model's attention to commonly overlooked yet distinctive regions, indirectly enhancing the representation ability of both the HSL and IRP modules. Both qualitative and quantitative results demonstrate the effectiveness of HI2R.
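
To make the contrast between hyperedges and pairwise edges concrete, the following is a minimal sketch of one round of generic hypergraph message passing (node to hyperedge to node, using plain averaging). It is an illustration under stated assumptions, not the HSL module itself: the function name `hypergraph_propagate`, the toy part features, and the hyperedge memberships are all hypothetical and do not come from the paper.

```python
from typing import List

# Generic illustration only; this is NOT the paper's HSL module.
def hypergraph_propagate(features: List[List[float]],
                         hyperedges: List[List[int]]) -> List[List[float]]:
    """One round of node -> hyperedge -> node mean aggregation."""
    dim = len(features[0])

    # Step 1: each hyperedge averages the features of all its member nodes.
    edge_feats = [
        [sum(features[n][d] for n in members) / len(members) for d in range(dim)]
        for members in hyperedges
    ]

    # Step 2: each node averages the features of the hyperedges it belongs to.
    updated = []
    for node in range(len(features)):
        incident = [e for e, members in enumerate(hyperedges) if node in members]
        if not incident:
            # A node that belongs to no hyperedge keeps its original feature.
            updated.append(list(features[node]))
            continue
        updated.append(
            [sum(edge_feats[e][d] for e in incident) / len(incident) for d in range(dim)]
        )
    return updated


# Four hypothetical part features with 2-D embeddings.
parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [4.0, 4.0]]
# The first hyperedge joins three parts at once; a pairwise edge could join only two.
edges = [[0, 1, 2], [2, 3]]
updated = hypergraph_propagate(parts, edges)
```

In this toy setup, node 2 belongs to both hyperedges, so its updated feature mixes information from all four parts in a single step, which a graph restricted to pairwise connections would need several edges or rounds to achieve.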