Abstract: Anticipating future actions is a key component of intelligence, especially for real-time systems such as robots or autonomous cars.
While recent works have addressed prediction of
raw RGB pixel values, we focus on anticipating the
motion evolution in future video frames. To this
end, we construct dynamic images (DIs) by summarising moving pixels through a sequence of future frames. We train a convolutional LSTM to predict the next DIs via an unsupervised learning process, and then recognise the activity associated with the predicted DI. We demonstrate the
effectiveness of our approach on three benchmark action datasets, showing that even on videos with complex activities, our method anticipates the next human action with high accuracy and outperforms state-of-the-art methods.
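
To make the DI construction concrete, below is a minimal sketch of how a clip can be collapsed into a single dynamic image via approximate rank pooling (Bilen et al., 2016). The exact weighting scheme, the function name `dynamic_image`, and the (T, H, W, C) clip layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W, C) into one (H, W, C) dynamic image."""
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    # Harmonic numbers H_t = sum_{i<=t} 1/i, with H_0 = 0 prepended.
    H = np.concatenate(([0.0], np.cumsum(1.0 / t)))
    # Approximate rank-pooling coefficients (assumed form, after Bilen et al.):
    # early frames receive negative weight and later frames positive weight,
    # so the weighted sum encodes how appearance evolves over the clip.
    alpha = 2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])
    di = np.tensordot(alpha, frames.astype(np.float64), axes=1)  # -> (H, W, C)
    # Rescale to [0, 255] so the DI can be fed to a standard image CNN.
    di -= di.min()
    di /= max(di.max(), 1e-8)
    return (255.0 * di).astype(np.uint8)

# Example: a toy 10-frame 64x64 RGB clip -> one dynamic image.
clip = np.random.rand(10, 64, 64, 3).astype(np.float32)
print(dynamic_image(clip).shape)  # (64, 64, 3)
```

Since the resulting DI is a single RGB-like image, a predicted DI can be classified with any standard image CNN, which matches the recognise-the-predicted-DI step described above.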