- Keywords: Gaussian Processes, Amortised Inference
- TL;DR: We learn sohpisticated trajectories of an object purely from pixels with a toy video dataset by using a VAE structure with a Gaussian process prior.
- Abstract: We consider the problem of unsupervised learning of a low dimensional, interpretable, latent state of a video containing a moving object. The problem of distilling dynamics from pixels has been extensively considered through the lens of graphical/state space models that exploit Markov structure for cheap computation and structured graphical model priors for enforcing interpretability on latent representations. We take a step towards extending these approaches by discarding the Markov structure; instead, repurposing the recently proposed Gaussian Process Prior Variational Autoencoder for learning sophisticated latent trajectories. We describe the model and perform experiments on a synthetic dataset and see that the model reliably reconstructs smooth dynamics exhibiting U-turns and loops. We also observe that this model may be trained without any beta-annealing or freeze-thaw of training parameters. Training is performed purely end-to-end on the unmodified evidence lower bound objective. This is in contrast to previous works, albeit for slightly different use cases, where application specific training tricks are often required.