Pyramid Coding for Functional Scene Element Recognition in Video Scenes

Eran Swears

31 Dec 2019OpenReview Archive Direct UploadReaders: Everyone

Abstract: Recognizing functional scene elements in video scenes based on the behaviors of moving objects that interact with them is an emerging problem of interest. Existing approaches have a limited ability to characterize elements such as cross-walks, intersections, and buildings that have low activity, are multi-modal, or have indirect evidence. Our approach recognizes the low activity and multi-model elements (crosswalks/intersections) by introducing a hierarchy of descriptive clusters to form a pyramid of codebooks that is sparse in the number of clusters and dense in content. The incorporation of local behavioral context such as person-enter-building and vehicle-parking nearby enables the detection of elements that do not have direct motion-based evidence, e.g. buildings. These two contributions significantly improve scene element recognition when compared against three state-of-the-art approaches. Results are shown on typical ground level surveillance video and for the first time on the more complex Wide Area Motion Imagery.

0 Replies