Abstract: Visual identification of gunmen in a crowd is a challenging problem that requires resolving the association of a person with an object (a firearm). We present a novel approach to this problem based on defining human-object interaction (and non-interaction) bounding boxes. In a given image, humans and firearms are detected separately. Each detected human is paired with each detected firearm, allowing us to create a paired bounding box that contains both the firearm and the human. A network is trained to classify these paired bounding boxes according to whether or not the human is carrying the identified firearm. Extensive experiments were performed to evaluate the effectiveness of the algorithm, including exploiting the full pose of the human, hand keypoints, and their association with the firearm. Knowledge of spatially localized features, exploited through multi-size proposals with adaptive average pooling, is key to the success of our method. We have also extended an existing firearm detection dataset by adding more images and annotating the human-firearm pairs in the extended dataset (including bounding boxes for firearms and gunmen). The experimental results $(78.5\ AP_{hold})$ demonstrate the effectiveness of the proposed method.
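The pairing step described in the abstract can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and takes the paired bounding box to be the smallest box enclosing both detections; the abstract says only that the paired box contains both the human and the firearm, so this particular construction is an assumption. The names `union_box` and `paired_boxes` are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def union_box(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box that encloses both input boxes (assumed construction)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def paired_boxes(humans: List[Box], firearms: List[Box]) -> List[Tuple[Box, Box, Box]]:
    """Pair every detected human with every detected firearm.

    Returns one (human_box, firearm_box, paired_box) triple per pair, so
    H humans and F firearms yield H * F candidates for the classifier.
    """
    return [(h, f, union_box(h, f)) for h, f in product(humans, firearms)]


if __name__ == "__main__":
    # Made-up detections, for illustration only.
    humans = [(10, 20, 50, 120), (200, 30, 240, 130)]
    firearms = [(40, 60, 70, 80)]
    for human, firearm, paired in paired_boxes(humans, firearms):
        print(human, firearm, paired)
```

Each resulting triple would then be passed to the classifier that decides whether the human is carrying that firearm; the classifier itself is outside the scope of this sketch.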