Keywords: Information Theory, Video Compression, Rate-Distortion-Perception
Abstract: We study causal, low-latency, sequential video compression when the output is subject to both a mean squared-error (MSE) distortion loss and a perception loss that targets realism. Motivated by prior approaches, we consider two different perception loss functions (PLFs). The first, PLF-JD, considers the joint distribution (JD) of all video frames up to the current one, while the second, PLF-FMD, considers the framewise marginal distributions (FMD) of the source and the reconstruction. Using deep-learning-based experiments, we demonstrate that the choice of PLF can have a significant effect on the reconstruction, especially at low bit rates. In particular, while the reconstruction based on PLF-JD better preserves the temporal correlation across frames, it also incurs a significant distortion penalty relative to PLF-FMD and makes it more difficult to recover from errors made in earlier output frames. We also demonstrate that encoded representations produced by training a system to minimize the MSE (without either PLF) can be transformed into a reconstruction that achieves perfect perceptual quality under PLF-FMD while at most doubling the distortion. A similar result holds for PLF-JD for a class of encoders operating in the low-rate regime. We validate our results using information-theoretic analysis and deep-learning-based experiments on the Moving MNIST and KTH datasets.
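One standard way to see why an MSE-trained representation can reach perfect framewise realism at no more than twice the distortion is posterior sampling. If the decoder draws X̂ from p(X | Y) instead of outputting E[X | Y], then X and X̂ are conditionally i.i.d. given Y, so E(X − X̂)² = 2·E(X − E[X | Y])², and X̂ has the same marginal distribution as X. The sketch below checks this numerically on a single frame under a hypothetical scalar Gaussian model (Y = X + independent Gaussian noise). It is an assumption-laden illustration, not the authors' construction or architecture, and the names `posterior_sampling_demo`, `sigma_x`, and `sigma_n` are illustrative only.

```python
import numpy as np


def posterior_sampling_demo(sigma_x=1.0, sigma_n=0.8, n=400_000, seed=0):
    """Toy check of the 'at most 2x distortion' posterior-sampling argument.

    Hypothetical model (not the paper's): one frame X ~ N(0, sigma_x^2) is
    represented by Y = X + N, with N ~ N(0, sigma_n^2) independent of X.
    Returns (MSE of the MMSE decoder, MSE of the posterior-sampling decoder,
    empirical variance of the posterior-sampling reconstruction).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, n)
    y = x + rng.normal(0.0, sigma_n, n)

    # MMSE decoder: E[X | Y] for jointly Gaussian (X, Y).
    gain = sigma_x**2 / (sigma_x**2 + sigma_n**2)
    x_mmse = gain * y
    post_var = sigma_x**2 * sigma_n**2 / (sigma_x**2 + sigma_n**2)

    # Posterior-sampling decoder: draw X_hat ~ p(X | Y) = N(x_mmse, post_var).
    # Its marginal equals p(X), so a framewise (FMD-style) perception loss is zero.
    x_hat = x_mmse + rng.normal(0.0, np.sqrt(post_var), n)

    d_mmse = np.mean((x - x_mmse) ** 2)
    d_hat = np.mean((x - x_hat) ** 2)
    return d_mmse, d_hat, np.var(x_hat)


if __name__ == "__main__":
    d_mmse, d_hat, var_hat = posterior_sampling_demo()
    # The ratio should be close to 2 and var_hat close to sigma_x**2 = 1.
    print(f"MMSE distortion:              {d_mmse:.4f}")
    print(f"Posterior-sampling distortion: {d_hat:.4f}")
    print(f"Ratio: {d_hat / d_mmse:.3f}, Var(X_hat): {var_hat:.3f}")
```

In this Gaussian toy case the factor of two is exact in expectation. The abstract's "at most" wording reflects that the guarantee is stated relative to the MSE-optimal reconstruction; how the paper realizes such a transformation for learned video codecs, and the PLF-JD analogue in the low-rate regime, is not captured by this sketch.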
Submission Number: 23