Transformation-based Models of Video Sequences

Joost van Amersfoort; Anitha Kannan; Marc'Aurelio Ranzato; Arthur Szlam; Du Tran; Soumith Chintala

09 Jul 2025 (modified: 22 Jun 2025), Submitted to ICLR 2017, Readers: Everyone

Abstract: In this work we propose a simple unsupervised approach for next frame prediction in video. Instead of directly predicting the pixels in a frame given past frames, we predict the transformations needed for generating the next frame in a sequence, given the transformations of the past frames. This leads to sharper results, while using a smaller prediction model. In order to enable a fair comparison between different video frame prediction models, we also propose a new evaluation protocol. We use generated frames as input to a classifier trained with ground truth sequences. This criterion guarantees that models scoring high are those producing sequences which preserve discriminative features, as opposed to merely penalizing any deviation, plausible or not, from the ground truth. Our proposed approach compares favourably against more sophisticated ones on the UCF-101 data set, while also being more efficient in terms of the number of parameters and computational cost.

TL;DR: Predict next frames of a video sequence by modelling transformations

Conflicts: uva.nl, facebook.com, fb.com

Keywords: Computer vision, Unsupervised Learning

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/transformation-based-models-of-video/code)
