Abstract: Crowd counting remains a challenging vision task because of severe occlusions, perspective distortions, and scale variations in the target scene. Designing an accurate and robust crowd counting estimator has attracted intensive research interest over the past few decades. It is well known that learning rich feature representations is crucial for crowd counting. However, existing neural-network-based methods employ only the CNN features extracted from the last convolutional layer and overlook the useful hierarchical information contained in the CNN features. To address this problem, we propose a CNN architecture based on the fully convolutional network, which builds an end-to-end density map estimation system by combining several of the meaningful convolutional features. This combination effectively captures both multi-scale and multi-level information in complex scenes. Extensive experiments on most existing crowd counting datasets, including ShanghaiTech Part A, ShanghaiTech Part B, and UCF_CC_50, demonstrate the effectiveness and reliability of our approach.
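To make the idea of combining convolutional features from several levels concrete, the following is a minimal sketch in plain Python and is not the architecture proposed in the paper. It assumes three single-channel feature maps at decreasing resolutions, nearest-neighbour upsampling, and a fixed per-level weight standing in for a learned 1x1 convolution; the names `upsample_nearest`, `fuse_levels`, and `count_from_density` are hypothetical. The only property it illustrates is that coarse and fine maps can be brought to one resolution, merged into a density map, and summed to obtain a count.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, H x W


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbour upsampling of a single-channel map by an integer factor."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]  # repeat each column
        out.extend(list(wide) for _ in range(factor))   # repeat each row
    return out


def fuse_levels(levels: List[FeatureMap], weights: List[float]) -> FeatureMap:
    """Fuse maps from several levels into one map.

    Assumption: levels[0] is the finest map and every other map's height
    divides it exactly. The per-pixel weighted sum plays the role of a 1x1
    convolution over the concatenated channels; the weights here are fixed
    by hand, not learned.
    """
    target_h, target_w = len(levels[0]), len(levels[0][0])
    fused = [[0.0] * target_w for _ in range(target_h)]
    for fmap, weight in zip(levels, weights):
        up = upsample_nearest(fmap, target_h // len(fmap))
        for i in range(target_h):
            for j in range(target_w):
                fused[i][j] += weight * up[i][j]
    return fused


def count_from_density(density: FeatureMap) -> float:
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


# Toy inputs: a 4x4 shallow map, a 2x2 middle map, and a 1x1 deep map.
shallow = [[0.25] * 4 for _ in range(4)]
middle = [[1.0] * 2 for _ in range(2)]
deep = [[2.0]]

density = fuse_levels([shallow, middle, deep], weights=[1.0, 0.5, 0.25])
print(count_from_density(density))  # 20.0
```

In a trained network the weights would be learned and each level would carry many channels; the sketch keeps one channel per level so the arithmetic can be checked by hand.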