Abstract: We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. The proposed mechanism reuses CNN feature activations to find the most informative parts of the image at different depths, with the help of gating mechanisms and without part annotations. Thus, it can augment any layer of a CNN to extract low- and high-level local information, making the network more discriminative.
Unlike other approaches, the proposed mechanism needs only a single pass through the input and can be trained end-to-end with SGD. As a consequence, it is modular, architecture-independent, easy to implement, and faster than iterative approaches.
Experiments show that, when augmented with our approach, Wide Residual Networks systematically achieve superior performance on each of five different fine-grained recognition datasets: the Adience age and gender recognition benchmark, Caltech-UCSD Birds-200-2011, Stanford Dogs, Stanford Cars, and UEC Food-100, obtaining competitive and state-of-the-art scores.
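As a rough illustration of the idea described above, the following minimal sketch shows one way an attention module could reuse a layer's activations in a single pass and fuse its prediction with the network's own output through a gate. It is not the authors' implementation: the function names (`attention_pool`, `gated_fusion`), the random weights, the tensor sizes, and the fixed gate scores are all assumptions made for illustration. In a trained model the weights and gate scores would be learned end-to-end.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(features, w_att, w_out):
    """Pool per-position class scores with a spatial attention mask.

    features: list of P spatial positions, each a list of C channel activations.
    w_att:    C weights scoring each position (acts like a 1x1 convolution).
    w_out:    K rows of C weights giving per-position class scores.
    """
    # Softmax over spatial positions: the mask sums to 1 across the feature map.
    mask = softmax([dot(f, w_att) for f in features])
    # Attention-weighted sum of the per-position class scores.
    logits = [sum(m * dot(f, w_k) for m, f in zip(mask, features)) for w_k in w_out]
    return mask, logits


def gated_fusion(candidates, gate_scores):
    """Combine several class-score vectors with softmax-normalized gates."""
    gates = softmax(gate_scores)
    n_classes = len(candidates[0])
    fused = [sum(g * c[k] for g, c in zip(gates, candidates)) for k in range(n_classes)]
    return gates, fused


rng = random.Random(0)
C, K, P = 4, 3, 6  # channels, classes, spatial positions (hypothetical sizes)
features = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(P)]
w_att = [rng.gauss(0, 1) for _ in range(C)]  # assumption: random, untrained
w_out = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(K)]
net_logits = [rng.gauss(0, 1) for _ in range(K)]  # stand-in for the backbone's output

mask, att_logits = attention_pool(features, w_att, w_out)
# Assumption: fixed, equal gate scores; a real model would learn these.
gates, fused = gated_fusion([net_logits, att_logits], gate_scores=[0.0, 0.0])
```

Because the module only reads activations that the backbone has already computed, attaching it at several depths adds one extra candidate prediction per layer to `gated_fusion` without requiring a second pass over the image.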
TL;DR: We enhance CNNs with a novel attention mechanism for fine-grained recognition. Superior performance is obtained on 5 datasets.
Keywords: computer vision, deep learning, convolutional neural networks, attention
Data: [Adience](https://paperswithcode.com/dataset/adience), [CUB-200-2011](https://paperswithcode.com/dataset/cub-200-2011), [MNIST](https://paperswithcode.com/dataset/mnist), [Stanford Cars](https://paperswithcode.com/dataset/stanford-cars)