Learning Object-Centric Transformation for Video Prediction

Xiongtao Chen, Wenmin Wang, Jinzhuo Wang, Weimian Li

Published: 2017, Last Modified: 07 Oct 2023ACM Multimedia 2017Readers: Everyone

Abstract: Future frame prediction for video sequences is a challenging task and worth exploring problem in computer vision. Existing methods often learn motion information for the entire image to predict next frames. However, different objects in the same scene often move and deform in different ways intuitively. Considering the human visual system, one often pays attention to the key objects that contain crucial motion signals, rather than compress an entire image into a static representation. Motivated by this property of human perception, in this work, we develop a novel object-centric video prediction model that learns local motion transformation dynamically for key object regions with visual attention. By transforming objects iteratively to the original input frames, next frame can be produced. Specifically, we design an attention module with replaceable strategies to attend to objects in video frames automatically. Our method does not require any annotated data during training procedure. To produce sharp predictions, adversarial training is adopted in our work. We evaluate our model on the Moving MNIST and UCF101 datasets and report competitive results, compared to prior methods. The generated frames demonstrate that our model can characterize motion for different objects and produce plausible future frames.

0 Replies