Abstract: Advances in deep learning and computer vision have enabled sophisticated information extraction from images and video frames. Recent research aims to make objects, their types, and their relative locations as a video evolves first-class citizens for query processing. In this paper, we initiate research into a declarative style of querying over real-time video streams involving objects and their interactions. We seek to efficiently identify frames in a streaming video in which one object is interacting with another in a specific way, such as a human kicking a ball. We first propose an algorithm called progressive filters (PF) that deploys a sequence of inexpensive, less accurate models (filters) to detect the presence of the query-specified objects in frames. We demonstrate that PF derives a least-cost sequence of filters given the current selectivities of the query objects. Since selectivities may vary as the video evolves, we present a dynamic statistical test to determine when to trigger re-optimization of the filter sequence. Finally, we present a filtering approach called Interaction Sheave (IS) that utilizes learned spatial information about objects and their interactions to prune frames that are unlikely to involve the query-specified action, further improving the frame processing rate. We present the results of a thorough experimental evaluation on real data sets, demonstrating the performance benefits of each of our proposals. In particular, we show experimentally that our techniques can improve query performance substantially (by up to an order of magnitude in our experiments) while maintaining essentially the same F1-score as the alternatives.
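To make the progressive-filter idea concrete, below is a minimal sketch, assuming hypothetical `Filter` objects that expose a per-frame cost, an estimated pass rate (selectivity), and a cheap predictor. The ordering rule shown (increasing cost over rejection probability) is a standard least-expected-cost heuristic for conjunctive filters, not necessarily the exact optimization PF performs.

```python
# Illustrative sketch only; names (Filter, order_filters, frame_passes) are hypothetical.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Filter:
    """A cheap per-object detector with an estimated per-frame cost and pass rate."""
    name: str
    cost: float                       # estimated time to run on one frame (e.g., ms)
    pass_rate: float                  # estimated probability the filter reports the object present
    predict: Callable[[object], bool] # predict(frame) -> True if the object appears present


def order_filters(filters: Sequence[Filter]) -> List[Filter]:
    """Order filters to minimize expected cost for a conjunctive object query.

    Classical rank rule: run filters in increasing order of cost / (1 - pass_rate),
    so cheap filters that reject many frames run first.
    """
    return sorted(filters, key=lambda f: f.cost / max(1e-9, 1.0 - f.pass_rate))


def frame_passes(frame, ordered_filters: Sequence[Filter]) -> bool:
    """Short-circuit: discard the frame as soon as any required object is judged absent."""
    for f in ordered_filters:
        if not f.predict(frame):
            return False
    return True


# Usage sketch: only frames that survive all cheap filters are handed to the
# expensive detector and interaction check downstream.
# cascade = order_filters([human_filter, ball_filter])
# candidates = (frame for frame in video_stream if frame_passes(frame, cascade))
```

When the observed pass rates drift away from the estimates used to order the cascade, the sequence can be re-derived; deciding when that drift is significant is the role of the paper's dynamic statistical test.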