Abstract: Widely used convolutional attention blocks (e.g., Squeeze-and-Excitation attention and Coordinate attention) model inter-channel relationships indirectly through convolution layers and therefore lack direct channel interaction. In this article, we present a lightweight channel attention mechanism, partial channel pooling attention (PPA), for object detection and segmentation tasks. PPA captures complementary information through direct interaction between channels while minimizing channel and pixel redundancy. Specifically, we pool the feature maps into pooling blocks of size K×K, and each pooled pixel is treated as a feature of its channel. As a result, PPA is lightweight and can be embedded directly into existing networks. Our channel attention module selectively emphasizes interdependent channel maps through direct exchange of information between channels, rather than indirectly through convolutional layers. Extensive experiments on object detection and semantic segmentation show that PPA improves performance across a variety of base models. For instance, adding PPA yields training throughput comparable to adding Coordinate attention while delivering roughly twice its improvement on COCO detection and semantic segmentation.
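To make the described mechanism concrete, the following is a minimal sketch based only on the abstract: each channel is pooled to a K×K block, the pooled pixels serve as that channel's descriptor, and channels interact with one another directly (without an intermediate convolution). The class name, the choice of adaptive average pooling, and the affinity-based interaction are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class PartialPoolingAttention(nn.Module):
    """Sketch of a PPA-style channel attention block (details assumed)."""

    def __init__(self, k: int = 4):
        super().__init__()
        # Pool each channel down to a K x K block; the K*K pooled pixels
        # act as that channel's descriptor (per the abstract).
        self.pool = nn.AdaptiveAvgPool2d(k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        desc = self.pool(x).flatten(2)                    # (B, C, K*K) channel descriptors
        # Direct channel-to-channel interaction: pairwise affinity between
        # descriptors, with no convolution in between (this interaction form
        # is an assumption made for this sketch).
        affinity = torch.bmm(desc, desc.transpose(1, 2))  # (B, C, C)
        weights = torch.sigmoid(affinity.mean(dim=-1))    # (B, C) channel weights
        return x * weights.view(b, c, 1, 1)


# Usage: re-weight a feature map from a backbone stage.
feats = torch.randn(2, 64, 32, 32)
out = PartialPoolingAttention(k=4)(feats)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```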