Keywords: program synthesis, deep learning, object tracking
TL;DR: We develop a language for expressing queries over object trajectory data and an algorithm for synthesizing such queries from data.
Abstract: We propose a novel framework called Quivr for synthesizing queries to identify events of interest in video data. For instance, Quivr can be used to identify instances of human driving behaviors such as lane changes or left turns, which are important for designing planning algorithms for autonomous cars. Our queries operate over object trajectories predicted by a deep object tracking model. Then, a query consists of regular expression operators used to compose underlying predicates (e.g., whether a car is in a lane), and selects a subset of trajectories. A key challenge is that queries are difficult for end users to develop: queries must reason about complex spatial and temporal patterns in object trajectories in order to select trajectories of interest, and predicates often include real-valued parameters (e.g., whether two cars are within a certain distance) that can be tedious to manually tune. Thus, Quivr automatically synthesizes queries given examples of trajectories that the query should match. To make the synthesis procedure efficient, we use overapproximations to prune invalid branches of the query search space, including using a quantitative variant of our query semantics to efficiently prune the search space over parameter values. We also propose two optimizations for speeding up the execution of our queries. Finally, we leverage an active learning strategy to disambiguate between multiple consistent candidate queries by collecting additional labels from the user. We evaluate Quivr on a benchmark of 11 tasks, and demonstrate that it can synthesize accurate queries for each task given just a few examples, and that our pruning strategy and optimizations substantially reduce synthesis time.