Abstract: In this paper, we propose a novel Visual Relation of Interest Detection (VROID) task, which aims to detect visual relations that are important for conveying the main content of an image. It is motivated by the intuition that not all correctly detected relations are semantically "interesting", and only a fraction of them truly represent the image's main content. We name such relations Visual Relations of Interest (VROIs). VROID can be regarded as an evolution of the traditional Visual Relation Detection (VRD) task, which tries to discover all visual relations in an image. To facilitate research on this new task, we construct a new dataset, named ViROI, which contains 30,120 images, each annotated with VROIs. Furthermore, we develop an Interest Propagation Network (IPNet) to solve VROID. IPNet consists of a Panoptic Object Detection (POD) module, a Pair Interest Prediction (PaIP) module, and a Predicate Interest Prediction (PrIP) module. The POD module extracts instances from the input image and generates the corresponding instance features and union features. The PaIP module then predicts the interest score of each instance pair, while the PrIP module predicts that of each predicate for each instance pair. The interest scores of instance pairs are combined with those of the corresponding predicates to obtain the final interest scores. All VROI candidates are sorted by their final interest scores, and the highest-scoring ones are taken as the final results. We conduct extensive experiments to test the effectiveness of our method; the results show that IPNet achieves the best performance compared with baselines from visual relation detection, scene graph generation, and image captioning.
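The final scoring and ranking step described above can be sketched as follows. Note that the abstract does not specify how pair and predicate interest scores are combined; this minimal sketch assumes a multiplicative combination, and all instance names, predicates, and scores are hypothetical illustrations rather than the paper's actual implementation.

```python
# Illustrative sketch of the final VROI ranking step. The multiplicative
# combination and all scores below are assumptions for illustration only.

def rank_vroi_candidates(pair_scores, predicate_scores, top_k=3):
    """Combine pair and predicate interest scores and return the top_k
    (subject, predicate, object, score) candidates.

    pair_scores: dict mapping (subject, object) -> pair interest score
    predicate_scores: dict mapping (subject, object) -> {predicate: score}
    """
    candidates = []
    for pair, pair_score in pair_scores.items():
        for predicate, pred_score in predicate_scores.get(pair, {}).items():
            # Assumed combination: product of the two interest scores.
            final = pair_score * pred_score
            candidates.append((pair[0], predicate, pair[1], final))
    # Sort all VROI candidates by final interest score, highest first.
    candidates.sort(key=lambda c: c[3], reverse=True)
    return candidates[:top_k]

# Hypothetical interest scores for two instance pairs.
pairs = {("person", "horse"): 0.9, ("horse", "grass"): 0.4}
preds = {
    ("person", "horse"): {"riding": 0.8, "next to": 0.3},
    ("horse", "grass"): {"standing on": 0.7},
}
top = rank_vroi_candidates(pairs, preds, top_k=2)
```

Here the candidate ("person", "riding", "horse") outranks ("person", "next to", "horse") even though both share the same pair score, because the predicate interest differs.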