Unified Recurrence Modeling for Video Action Anticipation

Published: 28 Jan 2022, Last Modified: 13 Feb 2023
ICLR 2022 Submitted
Abstract: Forecasting future events based on evidence of current conditions is an innate skill of human beings, and key to predicting the outcome of any decision making. In artificial vision, for example, we would like to predict the next human action before it is actually performed, without observing the future video frames associated with it. Computer vision models for action anticipation are therefore expected to collect the subtle evidence in the preamble of the target actions. In prior studies, recurrence modeling often leads to better performance, and its strong temporal inference is assumed to be a key element of reasonable prediction. To this end, we propose a unified recurrence modeling for video action anticipation that generalizes the recurrence mechanism from sequences to graph representations via message passing. The information flow in space-time is described by the interaction between vertices and edges, and the change of the vertices for each incoming frame reflects the underlying dynamics. Our model leverages self-attention for all building blocks in the graph modeling, and we introduce different edge learning strategies that can be optimized end-to-end while updating the vertices. Our experimental results demonstrate that our modeling method is lightweight, efficient, and outperforms all previous works on the large-scale EPIC-Kitchens dataset.
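To make the abstract's core idea concrete, below is a minimal, hypothetical sketch (not the authors' released code) of one message-passing step in which vertex features are updated by self-attention and the attention matrix serves as a soft, end-to-end learned edge representation. The class name `AttentionMessagePassing`, the tensor shapes, and the use of PyTorch are all assumptions for illustration.

```python
# Hypothetical sketch: self-attention as graph message passing, where the
# attention weights play the role of learned edges between vertices.
import torch
import torch.nn as nn


class AttentionMessagePassing(nn.Module):
    """Updates vertex features with scaled dot-product self-attention;
    the attention matrix acts as soft edges optimized end-to-end."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, vertices: torch.Tensor):
        # vertices: (batch, num_vertices, dim), e.g. one vertex per frame token
        q, k, v = self.query(vertices), self.key(vertices), self.value(vertices)
        edges = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        messages = edges @ v                  # aggregate information from neighbors
        return vertices + messages, edges     # residual vertex update, soft edges


# Toy usage: 8 frame-level vertices with 64-dimensional features.
block = AttentionMessagePassing(dim=64)
new_vertices, learned_edges = block(torch.randn(2, 8, 64))
print(new_vertices.shape, learned_edges.shape)  # (2, 8, 64) and (2, 8, 8)
```

In this reading, stacking such blocks over incoming frames would let the vertex updates carry the recurrence, while the per-step edge matrices expose the space-time interactions the abstract describes; the paper's actual edge learning strategies may differ.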
Supplementary Material: zip