Learning to represent and predict evolving visual signals via polar straightening

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023.
Keywords: Video prediction, self-supervised representation learning, phase prediction, invariance / equivariance factorization
Abstract: Observer motion and continuous deformations of objects and textures imbue natural videos with distinct temporal structure, enabling the prediction of future frames from past ones. Conventional methods proceed by estimating local motion, or optic flow, and then using it to predict future frames by warping and copying content. Here, we explore a more direct methodology, in which frames are transformed into an alternative representation where temporal structure and evolution are more readily accessible. As a base case, a rigidly translating pattern can be described in the frequency domain as a linear combination of sinusoids, each with constant amplitude and a phase that advances at a rate proportional to its frequency. This fundamental property of the Fourier representation reduces prediction to angular extrapolation. Motivated by the geometry of this well-known case, we formulate a self-supervised learning problem that seeks a transformation of video frames to facilitate next-frame prediction in these natural polar coordinates. We construct a network architecture in which pairs of convolutional channels factorize signals into slowly evolving amplitudes and linearly advancing phases. We train this network to predict future frames and compare its performance with that of conventional optic-flow methods and other learned predictive neural networks, evaluated on natural videos from the DAVIS dataset. We find that the polar predictor achieves high prediction performance while remaining interpretable and fast, demonstrating the potential of a flow-free video processing methodology that is trained end-to-end to predict natural video content.
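To make the "base case" described in the abstract concrete, the sketch below illustrates how prediction of a rigidly (circularly) translating pattern reduces to angular extrapolation in the Fourier domain: amplitudes are held fixed and phases are advanced linearly. This is a minimal illustration of the classical property the abstract invokes, not the paper's learned polar predictor; the function name and test setup are illustrative assumptions.

```python
import numpy as np

def fourier_phase_predict(frame_prev, frame_curr):
    """Predict the next frame of a (nearly) rigidly translating pattern by
    holding Fourier amplitudes fixed and linearly extrapolating phases.

    Illustrative sketch of the Fourier base case, not the learned method.
    """
    F_prev = np.fft.fft2(frame_prev)
    F_curr = np.fft.fft2(frame_curr)

    amplitude = np.abs(F_curr)                        # assumed (approximately) constant
    phase_step = np.angle(F_curr) - np.angle(F_prev)  # per-frequency phase advance
    phase_next = np.angle(F_curr) + phase_step        # linear (angular) extrapolation

    F_next = amplitude * np.exp(1j * phase_next)
    return np.real(np.fft.ifft2(F_next))

# Usage: a pattern translating (circularly) by one pixel per frame is predicted exactly.
rng = np.random.default_rng(0)
pattern = rng.standard_normal((32, 32))
f0 = pattern
f1 = np.roll(pattern, 1, axis=1)
f2 = np.roll(pattern, 2, axis=1)
print(np.allclose(fourier_phase_predict(f0, f1), f2, atol=1e-8))  # True
```

The learned model described in the abstract replaces the fixed Fourier transform with a trained convolutional mapping whose channel pairs play the role of the real and imaginary parts above, so that the same amplitude/phase extrapolation can be applied to natural videos that are not globally rigid.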
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Unsupervised and Self-supervised learning