Abstract: Fine-grained recognition focuses on the challenging task of automatically identifying subtle differences between similar categories. Current state-of-the-art approaches require elaborate feature learning procedures that involve tuning several hyperparameters, or rely on expensive human annotations such as object or part locations. In this paper, we propose a simple method for fine-grained recognition that exploits a nearly cost-free attention-based focus operation to construct an ensemble of increasingly specialized Convolutional Neural Networks. Our method achieves state-of-the-art results on three of the most popular datasets for fine-grained classification, namely CUB Birds 200-2011, FGVC-Aircraft, and Stanford Cars, while requiring minimal hyperparameter tuning and no annotations.
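The abstract does not say how the focus operation is computed, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes that attention is the channel-wise sum of a convolutional feature map, that the focus region is the bounding box of locations above a fraction of the peak attention, and that ensemble predictions are averaged class probabilities. The names `attention_crop_box`, `focus`, `ensemble_predict`, and the parameter `threshold_ratio` are hypothetical.

```python
import numpy as np


def attention_crop_box(feature_map, threshold_ratio=0.5):
    """Return (top, left, bottom, right) of the attended region.

    feature_map: array of shape (C, H, W) with non-negative activations.
    Coordinates are in feature-map cells; bottom and right are exclusive.
    ASSUMPTION: attention = sum over channels, thresholded at a fraction
    of its peak. The paper may define the operation differently.
    """
    attention = feature_map.sum(axis=0)
    height, width = attention.shape
    peak = attention.max()
    if peak <= 0:
        # No activation anywhere: fall back to the full map.
        return 0, 0, height, width
    mask = attention >= threshold_ratio * peak
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1


def focus(image, feature_map, threshold_ratio=0.5):
    """Crop an (H_img, W_img, channels) image to the attended region."""
    top, left, bottom, right = attention_crop_box(feature_map, threshold_ratio)
    fm_h, fm_w = feature_map.shape[1:]
    img_h, img_w = image.shape[:2]
    # Scale feature-map cells to pixels: floor the start, ceil the end.
    y0 = top * img_h // fm_h
    y1 = (bottom * img_h + fm_h - 1) // fm_h
    x0 = left * img_w // fm_w
    x1 = (right * img_w + fm_w - 1) // fm_w
    return image[y0:y1, x0:x1]


def ensemble_predict(probabilities):
    """Average per-model class probabilities (ASSUMED combination rule)."""
    return np.mean(np.asarray(probabilities), axis=0)
```

Under these assumptions, each successive network would be trained on the crops produced from the previous network's feature maps, so later ensemble members see progressively tighter views of the object. In practice the crop would be resized to the network's input resolution before the next forward pass.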