- Keywords: Tensor decomposition, Video prediction
- TL;DR: we propose convolutional tensor-train LSTM, which learns higher-order Convolutional LSTM efficiently using convolutional tensor-train decomposition.
- Abstract: Long-term video prediction is highly challenging since it entails simultaneously capturing spatial and temporal information across a long range of image frames.Standard recurrent models are ineffective since they are prone to error propagation and cannot effectively capture higher-order correlations. A potential solution is to extend to higher-order spatio-temporal recurrent models. However, such a model requires a large number of parameters and operations, making it intractable to learn in practice and is prone to overfitting. In this work, we propose convolutional tensor-train LSTM (Conv-TT-LSTM), which learns higher-orderConvolutional LSTM (ConvLSTM) efficiently using convolutional tensor-train decomposition (CTTD). Our proposed model naturally incorporates higher-order spatio-temporal information at a small cost of memory and computation by using efficient low-rank tensor representations. We evaluate our model on Moving-MNIST and KTH datasets and show improvements over standard ConvLSTM and better/comparable results to other ConvLSTM-based approaches, but with much fewer parameters.