Abstract: Pedestrian detection is a fundamental task in a wide range of computer vision applications. Detecting the head-shoulder appearance is an attractive way for pedestrian detection, especially in scenes with crowd, heavy occlusion or large camera tilt angles. However, the head-shoulder part contains less information than the full human body, which requires better feature extraction to ensure the effectiveness of the detection. This paper proposes a head-shoulder detection method based on the convolutional neural network (CNN). According to the characteristics of the head and shoulders, our method integrates a structure-sensitive ROI pooling layer into the CNN architecture. The proposed CNN is trained in a multi-task scheme with classification and localization outputs. Furthermore, the convolutional layers of the network are pre-trained using a triplet loss to capture better features of the head-shoulder appearance. Extensive experimental results demonstrate that the average accuracy of the proposed method is 89.6% when the IoU threshold is 0.5. Our method obtains close results to the state-of-the-art method Faster R-CNN while outperforming it in speed. Even when the number of extracted candidate regions increases, the increased detection time is negligible. In addition, when the IoU threshold is greater than 0.6, the average accuracy of our method is higher than that of Faster R-CNN, which indicates that our results have higher IoU with ground truth.
0 Replies
Loading