Keywords: class attention, global average pooling, visual recognition, over-concentration, feature variance
Abstract: Global average pooling has recently been blamed for discarding local information and thereby saturating the performance of neural networks. For this lossy pooling operation, we propose a new interpretation, termed over-concentration, to explain the real reason why it degrades network performance. We argue that the problem with global average pooling is that it disregards local patterns by relying solely on overly concentrated activations. Because global average pooling forces the network to learn objects regardless of their location, features tend to be activated only in specific regions. To support this claim, we provide a novel analysis, with extensive experiments, of the problems that over-concentration causes in the network. We analyze over-concentration through the problems it creates in feature variance and in dead neurons that are never activated. Based on our analysis, we introduce a multi-token, multi-scale class attention pooling layer that alleviates the over-concentration problem. The proposed attention pooling captures rich, localized patterns with an efficient network design that uses multiple scales and tokens. Our method is readily applicable to downstream tasks and to network architectures such as CNNs, ViTs, and MLP-Mixers. In our experiments, the proposed method improves MLP-Mixer, ViT, and CNN architectures with little additional resource cost, and a network equipped with our pooling method performs well even compared with state-of-the-art networks. We will open-source the proposed pooling method.
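The abstract does not spell out the layer's internals, but a minimal sketch of what a multi-token, multi-scale class attention pooling layer could look like is given below, assuming a PyTorch backbone that produces a spatial feature map. The class name, token count, head count, and scale set are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a multi-token, multi-scale class-attention pooling
# layer (PyTorch). Names and design choices are assumptions for illustration,
# not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTokenClassAttentionPooling(nn.Module):
    """Pools a spatial feature map with attention from several learnable
    class tokens over multiple spatial scales, instead of a single
    global average."""

    def __init__(self, dim, num_tokens=4, num_heads=8, scales=(1, 2)):
        super().__init__()
        self.scales = scales
        self.cls_tokens = nn.Parameter(torch.zeros(1, num_tokens, dim))
        nn.init.trunc_normal_(self.cls_tokens, std=0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, C, H, W) feature map from a CNN / ViT / MLP-Mixer backbone.
        B, C, H, W = x.shape
        # Build a multi-scale key/value set by average-pooling the map
        # to several resolutions and flattening the results.
        kv = []
        for s in self.scales:
            pooled = F.adaptive_avg_pool2d(x, output_size=(H // s, W // s))
            kv.append(pooled.flatten(2).transpose(1, 2))  # (B, Hs*Ws, C)
        kv = torch.cat(kv, dim=1)

        q = self.cls_tokens.expand(B, -1, -1)  # (B, num_tokens, C)
        out, _ = self.attn(q, kv, kv)          # class attention over the map
        out = self.norm(out)
        # Average the class tokens into a single image representation.
        return out.mean(dim=1)                 # (B, C)


if __name__ == "__main__":
    pool = MultiTokenClassAttentionPooling(dim=256)
    feats = torch.randn(2, 256, 14, 14)
    print(pool(feats).shape)  # torch.Size([2, 256])
```

In this sketch, the learnable class tokens query the feature map at several resolutions, so the pooled representation can attend to localized patterns rather than collapsing everything into one spatial average; whether the paper aggregates tokens by averaging or otherwise is not stated in the abstract.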
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning