Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting

Published: 01 Jan 2023, Last Modified: 09 Apr 2025. Venue: ICONIP (11) 2023. License: CC BY-SA 4.0
Abstract: Most existing crowd counting methods rely on convolutional neural networks (CNNs) to address crowd scale variation and background noise. These methods extract local features effectively, but their limited convolutional kernel sizes make it hard to capture global information, which is also crucial for scale awareness and noise discrimination. In this paper, we propose a Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting (MELANet), a CNN-based model that extracts both global and local information. MELANet consists of three parts: a feature extraction module (FEM) for initial feature extraction, a multiscale equivalent attention module (MEAM) for combining global and local information, and a fusion module (FM) for multiscale feature fusion. In MEAM, large convolution kernels are decomposed into equivalent combinations of small convolution kernels, so the model obtains receptive fields equivalent to those of the large kernels with lower complexity and fewer parameters. This enables both local and global correlation in a CNN-based attention mechanism, which makes the model focus more on crowd head regions and resist background noise. In addition, we use a multiscale structure with different convolution kernel sizes to encode contextual information at different scales into the feature maps, which handles variations in head scale. Furthermore, we add gate channel attention units to MEAM to enhance the channel adaptivity of the model. Extensive experiments demonstrate that MELANet achieves excellent counting performance on three popular crowd counting datasets.
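The claim that small kernels can match the receptive field of a large kernel with fewer parameters can be checked with simple arithmetic. The sketch below is illustrative only: the abstract does not state which decomposition MEAM uses, so a plain stack of three 3x3 convolutions stands in for it, and the helper names (`receptive_field`, `dense_params`, `stack_params`) and the channel count of 64 are assumptions made for this example.

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs, applied in order.
    """
    rf = 1
    for kernel, dilation in layers:
        # Each stride-1 layer widens the field by (kernel - 1) * dilation.
        rf += (kernel - 1) * dilation
    return rf


def dense_params(kernel, channels):
    """Weights of one standard conv with `channels` in and out, no bias."""
    return kernel * kernel * channels * channels


def stack_params(layers, channels):
    """Total weights of a stack of standard convs, no bias."""
    return sum(dense_params(kernel, channels) for kernel, _ in layers)


CHANNELS = 64  # assumed width, chosen only for illustration
small_stack = [(3, 1), (3, 1), (3, 1)]  # three 3x3 convs, no dilation

rf = receptive_field(small_stack)            # matches a single 7x7 kernel
large = dense_params(rf, CHANNELS)           # one 7x7 conv
small = stack_params(small_stack, CHANNELS)  # three 3x3 convs

print(f"receptive field: {rf}x{rf}")
print(f"single large kernel: {large} weights")
print(f"stack of small kernels: {small} weights")
```

Under these assumptions the stack covers the same 7x7 field with 27C² weights instead of 49C². Dilated or depthwise layers, if the model uses them, would widen the gap further, and the same `receptive_field` helper accepts a dilation value per layer.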