Robust Fine-Grained Visual Categorization via Cyclical Attention

Bin Kang, Dong Liang, Daoyuan Chen, Tianyu Ding, Mingqiang Wei

Published: 01 Jan 2025, Last Modified: 11 Nov 2025, IEEE Transactions on Neural Networks and Learning Systems, CC BY-SA 4.0

Abstract: Fine-grained visual categorization (FGVC) in open-world settings frequently encounters samples with heavy occlusion (HO), whose discriminative features are compromised. Effectively handling heavy occlusion remains an open challenge. Existing methods either discard the occluded parts or exploit them through additional techniques such as image inpainting or multimodel strategies, each with its own advantages and limitations. In this article, we propose a novel approach inspired by human self-regulated learning (SRL): cyclical attention, which leverages occluded regions by recalibrating attention in a feedback loop. In particular, we introduce a new multi-instance model in which occluded parts play an essential role through a feedback structure built on a cooperative game mechanism. This structure mimics SRL by re-evaluating the previous attention-based image patch selection. We then embed the proposed multi-instance model into a transformer architecture, creating the SRL-FGVC transformer. The key innovation of this design is cyclical attention, in which forward and feedback self-attention form a cooperative union that mitigates attention bias. Extensive experiments on six public datasets and an additional dataset that we constructed demonstrate that the SRL-FGVC transformer consistently outperforms existing approaches in HO scenarios. This work presents a promising new direction for robust FGVC under challenging real-world conditions.
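The abstract refers to an attention-based image patch selection step that the feedback loop re-evaluates. The following is a minimal, generic sketch of that kind of selection, not the SRL-FGVC method itself: it assumes a single attention head, scores each patch token by the attention it receives from a CLS token, and keeps the top-k patches. The function names (`cls_attention`, `select_patches`) and the random projection weights are hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cls_attention(tokens, w_q, w_k):
    """Single-head attention weights from the CLS token (row 0) to every patch token.

    Hypothetical helper: tokens has shape (1 + n_patches, d); w_q and w_k are (d, d)
    query/key projections. Returns a length-n_patches vector that sums to 1.
    """
    q = tokens[0] @ w_q                      # (d,) query of the CLS token
    k = tokens[1:] @ w_k                     # (n_patches, d) keys of the patch tokens
    scores = k @ q / np.sqrt(q.shape[0])     # scaled dot-product scores
    return softmax(scores)


def select_patches(attn, k):
    """Indices of the k most-attended patches, highest attention first."""
    return np.argsort(attn)[::-1][:k]


# Usage on hand-written attention weights: patch 1 and patch 2 get the most attention.
top = select_patches(np.array([0.1, 0.4, 0.3, 0.2]), k=2)  # -> array([1, 2])
```

Patches outside the selected set are the ones a discard-style strategy would ignore; the abstract's cyclical attention instead keeps occluded regions in play through a feedback pass, whose exact form is defined in the paper and is not reproduced here.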

External IDs: doi:10.1109/tnnls.2025.3608560