Abstract: The explosive growth of video data has brought great challenges to video retrieval, which aims to find out related videos from a video collection. Most users are usually not interested in all the content of retrieved videos but have a more fine-grained need. In the meantime, most existing methods can only return a ranked list of retrieved videos lacking a proper way to present the video content. In this paper, we introduce a distinctively new task, namely One-Stop Video Delivery (OSVD) aiming to realize a comprehensive retrieval system with the following merits: it not only retrieves the relevant videos but also filters out irrelevant information and presents compact video content to users, given a natural language query and video collection. To solve this task, we propose an end-to-end Hierarchical Video Graph Reasoning framework (HVGR), which considers relations of different video levels and jointly accomplishes the one-stop delivery task. Specifically, we decompose the video into three levels, namely the video-level, moment-level, and the clip-level in a coarse-to-fine manner, and apply Graph Neural Networks (GNNs) on the hierarchical graph to model the relations. Furthermore, a pairwise ranking loss named Progressively Refined Loss is proposed based on prior knowledge that there is a relative order of the similarity of query-video, query-moment, and query-clip due to the different granularity of matched information. Extensive experimental results on benchmark datasets demonstrate that the proposed method achieves superior performance compared with baseline methods.
0 Replies
Loading