Unsupervised Hierarchical Video Prediction
Nov 03, 2017 (modified: Nov 03, 2017) · ICLR 2018 Conference Blind Submission
Abstract: Long-term video prediction is a challenging machine learning problem, whose solution would enable intelligent agents to plan sophisticated interactions with their environment and assess the effects of their actions. Much recent research has been devoted to video prediction and generation, but mostly for short time horizons. The hierarchical video prediction method by Villegas et al. (2017) is an example of a state-of-the-art method for long-term video prediction. However, their method has limited applicability in practical settings, as it requires annotation of structures (e.g., pose) at every time step. This paper presents a long-term hierarchical video prediction model that does not have such a restriction. We show that the network learns its own higher-level structure (e.g., pose-equivalent hidden variables), which works better in cases where annotated higher-level structure does not capture all of the information needed to predict the next frames. This method gives sharper results than other video prediction methods that do not require a ground-truth pose, and its effectiveness is shown on the Humans 3.6M and Robot Pushing datasets.
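The pipeline the abstract describes — encode a frame into a learned high-level latent, roll that latent forward in time, then decode predicted latents back into frames — can be sketched as follows. This is a minimal illustration, not the paper's trained model: the dimensions are arbitrary, and random linear maps stand in for the learned encoder, latent predictor (an LSTM in the real architecture), and decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions only (not from the paper).
FRAME_DIM = 64 * 64   # flattened grayscale frame
LATENT_DIM = 32       # learned pose-equivalent latent
CONTEXT = 2           # number of observed context frames
HORIZON = 3           # number of future frames to predict

# Random linear maps as placeholders for the trained networks.
W_enc = rng.normal(scale=0.01, size=(LATENT_DIM, FRAME_DIM))
W_pred = rng.normal(scale=0.01, size=(LATENT_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.01, size=(FRAME_DIM, LATENT_DIM))

def encode(frame):
    """Map a frame to a low-dimensional 'pose-like' latent."""
    return np.tanh(W_enc @ frame)

def predict_latent(z):
    """Advance the latent one time step (an LSTM in the real model)."""
    return np.tanh(W_pred @ z)

def decode(z):
    """Render a frame from a predicted latent."""
    return W_dec @ z

# Observe the context frames, then roll the latent forward HORIZON steps,
# decoding a frame at each step -- no pose annotations are involved.
frames = [rng.normal(size=FRAME_DIM) for _ in range(CONTEXT)]
z = encode(frames[-1])
predicted = []
for _ in range(HORIZON):
    z = predict_latent(z)
    predicted.append(decode(z))

print(len(predicted), predicted[0].shape)
```

The key point of the unsupervised setting is that the latent `z` is never compared against a ground-truth pose during training; it is whatever bottleneck representation the encoder and predictor find useful for reconstructing future frames.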
TL;DR: We show ways to train a hierarchical video prediction model without needing pose labels.