Abstract: We propose MDBNet, a network for static crowd scene analysis built on a multi-branch dilated convolution block. It addresses the joint task of estimating the crowd count and a high-quality density map from a single static image. MDBNet follows a one-stage object detection framework and consists of two parts: pre-trained convolutional layers as the front end for high-level feature extraction, and a cascaded multi-branch dilated convolution block as the back end for aggregating context information over different ranges. Pixel-wise objectness probabilities are predicted and regressed to generate the density map. MDBNet is easy to train and has strong learning ability. We evaluate it on two public datasets, ShanghaiTech and UCF_CC_50. It achieves superior performance on almost all evaluation criteria. On structure quality criteria in particular, including our newly introduced spatial adjusted mutual information measure, MDBNet sets a new state of the art. The source code will be released upon publication of this work.
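To make the back-end idea concrete, the following is a minimal pure-Python sketch of a multi-branch dilated convolution applied to a single-channel feature map, followed by reading a count off a density map by summation. It is an illustration under stated assumptions, not the MDBNet implementation: the abstract does not specify the dilation rates, the number of branches, or how branches are fused, so the rates (1, 2, 3), the shared kernel, and the averaging used here are hypothetical choices, as are all function names.

```python
def dilated_conv2d(feature_map, kernel, dilation):
    """Zero-padded, same-size 2D dilated convolution (cross-correlation form).

    Assumes a square kernel with odd side length.
    """
    h, w = len(feature_map), len(feature_map[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    yy = y + (i - half) * dilation
                    xx = x + (j - half) * dilation
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * feature_map[yy][xx]
            out[y][x] = acc
    return out


def multi_branch_block(feature_map, kernel, dilations=(1, 2, 3)):
    """Run one branch per dilation rate and average the branch outputs.

    Averaging is an assumed fusion rule chosen for simplicity.
    """
    branches = [dilated_conv2d(feature_map, kernel, d) for d in dilations]
    h, w = len(feature_map), len(feature_map[0])
    return [
        [sum(b[y][x] for b in branches) / len(branches) for x in range(w)]
        for y in range(h)
    ]


def crowd_count(density_map):
    """Estimated count is the sum of all density values."""
    return sum(sum(row) for row in density_map)


# Toy input: a 7x7 map with a single unit response at the centre.
toy_map = [[0.0] * 7 for _ in range(7)]
toy_map[3][3] = 1.0
box_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

fused = multi_branch_block(toy_map, box_kernel)
estimated_count = crowd_count(fused)
```

Each branch spreads the centre response over a differently sized neighbourhood, which is the sense in which different dilation rates aggregate context over different ranges. Because the toy kernel is normalized and every tap stays inside the 7x7 map, the summed density is preserved across branches.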