Recognizing human-vehicle interactions from aerial video without training

Jong Taek Lee, Chia-Chih Chen, Jake K. Aggarwal

2011 (modified: 10 Nov 2022), CVPR Workshops 2011

Abstract: We propose a novel framework to recognize human-vehicle interactions from aerial video. In this scenario, the object resolution is low, the visual cues are vague, and the detection and tracking of objects are less reliable as a consequence. Methods that require accurate tracking of objects or exact matching of event definitions are therefore better avoided. To address these issues, we present a temporal-logic-based approach that does not require training from event examples. At the low level, we employ dynamic programming to perform fast model fitting between the tracked vehicle and rendered 3-D vehicle models. At the semantic level, given the localized event region of interest (ROI), we verify the time series of human-vehicle relationships against the pre-specified event definitions in a piecewise fashion. With special interest in recognizing a person getting into and out of a vehicle, we have tested our method on a subset of the VIRAT Aerial Video dataset [ ] and achieved superior results. Our framework can be easily extended to recognize other types of human-vehicle interactions.
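
To make the idea of piecewise verification against pre-specified event definitions more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the relationship labels ("approaching", "at_door", "not_visible", "leaving"), the stage lists in EVENT_DEFINITIONS, the min_length noise threshold, and all function names are hypothetical. The sketch collapses a per-frame label sequence into segments, drops very short segments as likely tracking noise, and accepts an event when its stages occur in order, without requiring an exact match.

```python
from itertools import groupby

# Hypothetical event definitions: ordered stages of the person-vehicle
# relationship. These labels are assumptions made for illustration only.
EVENT_DEFINITIONS = {
    "getting_into_vehicle": ["approaching", "at_door", "not_visible"],
    "getting_out_of_vehicle": ["not_visible", "at_door", "leaving"],
}


def to_segments(labels, min_length=2):
    """Collapse per-frame labels into runs, dropping runs shorter than min_length."""
    runs = [(label, len(list(group))) for label, group in groupby(labels)]
    kept = [label for label, length in runs if length >= min_length]
    # Merge neighbours that became adjacent once the short runs were dropped.
    return [label for label, _ in groupby(kept)]


def matches(segments, stages):
    """True if the stages occur in order; other segments may lie in between."""
    remaining = iter(segments)
    # Each `in` test consumes the iterator up to the match, enforcing order.
    return all(stage in remaining for stage in stages)


def recognize(labels, min_length=2):
    """Return the names of all event definitions satisfied by the label sequence."""
    segments = to_segments(labels, min_length)
    return [
        name
        for name, stages in EVENT_DEFINITIONS.items()
        if matches(segments, stages)
    ]


if __name__ == "__main__":
    # A noisy sequence: one spurious "approaching" frame interrupts "at_door".
    frames = (
        ["approaching"] * 4
        + ["at_door"] * 3
        + ["approaching"]
        + ["at_door"] * 2
        + ["not_visible"] * 5
    )
    print(recognize(frames))
```

Matching stages as an ordered subsequence rather than an exact sequence is one simple way to tolerate unreliable detection and tracking; the temporal-logic formulation described in the abstract may differ.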
