- Keywords: video compression, compression, autoregressive model, latent variable model, generative model, deep learning, computer vision
- Abstract: There has been a recent surge of interest in neural video compression models that combine data-driven dimensionality reduction with learned entropy coding. Scale-Space Flow (SSF) is among the most popular variants due to its favorable rate-distortion performance. Recent work showed that this approach could be further improved by structured priors and stochastic temporal autoregressive transforms on the frame level. However, as of early 2021, most state-of-the-art compression approaches still use time-independent priors. Assuming that frame latents remain temporally correlated, further compression gains should be expected from conditioning the priors on temporal information. We show that naively conditioning the priors on previous stochastic latent states degrades performance, whereas conditioning on a deterministic temporal quantity yields a consistent improvement over all baselines. Whether these gains justify the additional challenges that a temporal prior introduces in training and deployment remains an open question.
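The expected gain from a temporal prior can be illustrated with a toy rate calculation. The sketch below is not the model described in the abstract: it assumes hypothetical scalar latents that follow an AR(1) process and are quantized by rounding, and it compares the average code length (in bits) under a time-independent discretized Gaussian prior with that under a prior whose mean is predicted from the previously decoded latent. All names (`toy_latents`, `average_rates`, `rho`, `noise_std`) are illustrative assumptions.

```python
import math
import random


def gaussian_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bits(y, mu, sigma):
    """Code length in bits of integer y under a Gaussian discretized to unit bins."""
    p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-12))  # floor avoids log(0) for extreme outliers


def toy_latents(num_frames, rho=0.9, noise_std=1.0, seed=0):
    """Hypothetical temporally correlated latents: AR(1) process, rounded to integers."""
    rng = random.Random(seed)
    z, ys = 0.0, []
    for _ in range(num_frames):
        z = rho * z + rng.gauss(0.0, noise_std)
        ys.append(round(z))
    return ys


def average_rates(ys, rho=0.9, noise_std=1.0):
    """Return (time-independent rate, temporally conditioned rate) in bits per latent."""
    # Time-independent prior: the stationary marginal of the AR(1) process.
    marginal_std = noise_std / math.sqrt(1.0 - rho ** 2)
    static = sum(bits(y, 0.0, marginal_std) for y in ys) / len(ys)
    # Temporal prior: mean predicted from the previously decoded latent.
    # The first frame has no predecessor, so it falls back to the static prior.
    cond = bits(ys[0], 0.0, marginal_std)
    for prev, y in zip(ys, ys[1:]):
        cond += bits(y, rho * prev, noise_std)
    return static, cond / len(ys)


ys = toy_latents(5000)
static_rate, temporal_rate = average_rates(ys)
print(f"time-independent prior: {static_rate:.3f} bits/latent")
print(f"temporal prior:         {temporal_rate:.3f} bits/latent")
```

Because the conditional distribution of each latent given its predecessor is much narrower than the marginal, the temporally conditioned prior assigns fewer bits per latent in this toy setting. The sketch says nothing about which conditioning quantity works best in practice, which is the question the abstract addresses.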