Abstract: This paper presents an anchor-pair network for crowded human detection that addresses the difficulties caused by occlusion in crowded scenes. Specifically, we use a function-aware network structure to extract more discriminative features for the head and the full body respectively, and a CNN module then fuses these features by learning the correlations between head and full body to reduce crowd errors. Meanwhile, a novel paired form of anchors, denoted anchor-pair, is proposed to estimate head regions and full-body regions simultaneously. Furthermore, a new Joint-NMS is applied to the detected head and full-body box pairs, which significantly improves performance in heavily occluded scenarios at a small computational cost. Our anchor-pair network achieves a state-of-the-art result on the CrowdHuman dataset, reducing the log-average miss rate (MR^-2) to 55.43%, an 11.59% relative improvement over our baseline on this dataset.
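The abstract does not give the exact suppression rule of Joint-NMS, so the following is only a minimal sketch under one stated assumption: a lower-scoring head/full-body pair is suppressed only when its full-body box AND its head box both overlap a kept pair above their thresholds. The intuition is that in crowds full-body boxes overlap heavily while heads usually stay separate. The function name `joint_nms` and the thresholds `head_thr` and `body_thr` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def joint_nms(heads: Sequence[Box], bodies: Sequence[Box], scores: Sequence[float],
              head_thr: float = 0.5, body_thr: float = 0.5) -> List[int]:
    """Hypothetical paired NMS: returns indices of kept (head, body) pairs.

    ASSUMPTION: a pair is suppressed only if BOTH its body IoU and its head IoU
    with an already-kept pair exceed the thresholds.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        suppressed = any(
            iou(bodies[i], bodies[k]) > body_thr and iou(heads[i], heads[k]) > head_thr
            for k in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


# Toy example: pairs 0 and 1 are two occluding people (body IoU ~0.67, heads disjoint);
# pair 2 is a near-duplicate of pair 0 (body IoU ~0.94, head IoU 0.6).
bodies = [(0, 0, 10, 30), (2, 0, 12, 30), (0, 1, 10, 31)]
heads = [(3, 0, 7, 4), (8, 0, 12, 4), (3, 1, 7, 5)]
scores = [0.9, 0.8, 0.7]
print(joint_nms(heads, bodies, scores))  # [0, 1]
```

Under this assumed rule both occluding people survive, whereas standard NMS on the full-body boxes alone with a 0.5 threshold would discard pair 1; the near-duplicate pair 2 is still removed.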