A benchmark dataset and approach for fine-grained visual categorization in complex scenes

Xiang Zhang, Keran Zhang, Wanqing Zhao, Hangzai Luo, Sheng Zhong, Lei Tang, Jinye Peng, Jianping Fan

Published: 2023, Last Modified: 21 Jan 2026 | Digit. Signal Process. 2023 | CC BY-SA 4.0

Abstract: With the rapid development of deep learning, many deep learning-based approaches have demonstrated outstanding performance on fine-grained visual categorization (FGVC). However, existing fine-grained datasets mainly consist of simple images, in which the object occupies a large portion of the image and appears against a relatively clean background. This severely limits the application of FGVC in real-world scenarios. In this paper, we construct a fine-grained dataset named AIBD-Cars, which contains 28,471 car images with complex backgrounds belonging to 196 fine-grained classes. Furthermore, we propose a Location-Aware Channel-Spatial Attention Network (LCSANet), which both locates object regions and mines discriminative information to achieve better fine-grained visual categorization in complex scenes. We evaluate popular fine-grained visual categorization algorithms to build a benchmark. Extensive experiments show that our proposed method achieves new state-of-the-art results on AIBD-Cars and FGVC-Aircraft, and competitive results on CUB-200-2011 and Stanford Cars.
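The abstract names LCSANet's two goals, locating the object region and mining discriminative channel and spatial cues, without describing the architecture. As a rough illustration of why attention helps when a car fills only part of a cluttered image, the sketch below weights each channel by the sigmoid of its global average, sums the weighted channels into a spatial attention map, and draws a box around the cells whose attention exceeds the map mean. This is a hypothetical, parameter-free simplification for intuition only, not the LCSANet method: the functions `channel_weights`, `spatial_map`, and `locate_object`, the mean threshold, and the assumption of non-negative (post-ReLU) features are all illustrative choices.

```python
import math
from typing import List, Tuple

# Hypothetical layout: feature map indexed as fmap[channel][row][col],
# assumed non-negative (e.g. the output of a ReLU backbone layer).
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_weights(fmap: FeatureMap) -> List[float]:
    """Parameter-free channel attention: sigmoid of each channel's global average."""
    weights = []
    for channel in fmap:
        cells = [v for row in channel for v in row]
        weights.append(sigmoid(sum(cells) / len(cells)))
    return weights


def spatial_map(fmap: FeatureMap, weights: List[float]) -> List[List[float]]:
    """Spatial attention map: channel-weighted sum of activations at each position."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [sum(w * channel[i][j] for w, channel in zip(weights, fmap)) for j in range(width)]
        for i in range(height)
    ]


def locate_object(fmap: FeatureMap) -> Tuple[int, int, int, int]:
    """Inclusive (top, left, bottom, right) box around cells whose attention exceeds the map mean."""
    amap = spatial_map(fmap, channel_weights(fmap))
    height, width = len(amap), len(amap[0])
    mean = sum(v for row in amap for v in row) / (height * width)
    hits = [(i, j) for i, row in enumerate(amap) for j, v in enumerate(row) if v > mean]
    if not hits:  # flat map: nothing stands out, so keep the whole frame
        return 0, 0, height - 1, width - 1
    rows = [i for i, _ in hits]
    cols = [j for _, j in hits]
    return min(rows), min(cols), max(rows), max(cols)


if __name__ == "__main__":
    # Toy 2-channel 4x4 map: channel 0 responds to the "car" at rows 1-2,
    # cols 2-3; channel 1 is a uniform response to background clutter.
    object_channel = [[0, 0, 0, 0], [0, 0, 5, 4], [0, 0, 6, 5], [0, 0, 0, 0]]
    clutter_channel = [[1] * 4 for _ in range(4)]
    print(locate_object([object_channel, clutter_channel]))  # (1, 2, 2, 3)
```

In a trained network these weights would be learned rather than fixed; CBAM, for example, uses a shared MLP over pooled channel descriptors and a convolution over channel-pooled maps. A box like the one returned here is commonly used to crop the object and re-examine it at higher resolution.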