Abstract: With the growth of the urban population, crowd analysis has become an important task in computer vision. Crowd counting, a subfield of crowd analysis, aims to count the number of people in an image or in a region of an image. Because of problems such as heavy occlusion and variations in perspective and luminous intensity, crowd counting remains extremely challenging. Recent state-of-the-art approaches mainly use convolutional neural networks to generate density maps. In this work, the Multi-Dilation Network (MDNet) is proposed to address crowd counting in congested scenes. MDNet consists of two parts: a VGG-16 based front end for feature extraction and a back end containing multi-dilation blocks that generate density maps. In particular, each multi-dilation block has four branches that collect features at different sizes. By using dilated convolutions, the multi-dilation block captures features at multiple scales while keeping the maximum kernel size at 3 x 3. Experiments on two challenging crowd counting datasets, UCF_CC_50 and ShanghaiTech, show that the proposed MDNet outperforms other state-of-the-art methods, achieving lower mean absolute error and mean squared error. Compared with a network built from multi-scale blocks, which adopt larger kernels to extract features, MDNet remains competitive while using fewer model parameters.
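The parameter argument above can be illustrated with a small calculation. The sketch below is not taken from MDNet itself: the dilation rates (1, 2, 3, 4) and the channel width (64) are assumptions chosen only for illustration, and the helper names are hypothetical. It relies on the standard relation that a kernel of size k with dilation d covers k + (k - 1)(d - 1) pixels per side, while its weight count depends only on k.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Side length of the area covered by a dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights plus biases of a 2D convolution; dilation adds no parameters."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


# Assumed values for illustration only; the abstract does not state them.
CHANNELS = 64
DILATION_RATES = (1, 2, 3, 4)

for d in DILATION_RATES:
    coverage = effective_kernel_size(3, d)
    dilated = conv_params(3, CHANNELS, CHANNELS)
    # A dense kernel covering the same area needs coverage x coverage weights.
    dense = conv_params(coverage, CHANNELS, CHANNELS)
    print(f"dilation={d}: covers {coverage}x{coverage}, "
          f"dilated 3x3 params={dilated}, dense params={dense}")
```

Under these assumed settings, every dilated branch keeps the parameter count of a plain 3 x 3 convolution, whereas a dense kernel with the same coverage grows quadratically with its side length.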