Keywords: Explainable AI, Hierarchical reasoning, Fine-grained recognition
Abstract: Fine-grained visual recognition (FGVR) requires not only high accuracy but also human-aligned interpretability, particularly in safety-critical applications. While human cognition naturally follows a coarse-to-fine reasoning process—rapid holistic categorization at the coarse-grained level followed by attention to local details at the fine-grained level—existing post-hoc and ante-hoc interpretability methods fall short in capturing this hierarchy automatically. To address this gap, we propose Bi-HiR, a novel Bidirectional Hierarchical Reasoning framework that emulates human-like cognition by integrating top-down semantic reasoning with bottom-up prototype-based explanations. Specifically, Bi-HiR: (1) leverages large language model (LLM)-derived semantic priors to construct coarse-to-fine hierarchies without manual annotations; (2) introduces a joint optimization strategy in which top-down priors guide bottom-up prototype learning across semantic levels; and (3) produces interpretable, step-wise visual and semantic explanations. Experiments on six FGVR benchmarks demonstrate that Bi-HiR achieves performance competitive with the state of the art and exhibits superior zero-shot generalization. The results also show that Bi-HiR's interpretability improves human trust and facilitates model error diagnosis. Code is publicly available at https://github.com/duduududamax/Bi-HiR.
Primary Area: interpretability and explainable AI
Submission Number: 11535