NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-Wise Modeling
Abstract: Implicit Neural Representations (INR) have recently been shown to be a powerful tool for high-quality video compression. However, existing works are limited because they do not exploit the temporal redundancy in videos, leading to long encoding times. Additionally, these methods have fixed architectures which do not scale to longer videos or higher resolutions. To address these issues, we propose NIRVANA, which treats videos as groups of frames and fits a separate network to each group, performing patch-wise prediction. The video representation is modeled autoregressively: the network fit on the current group is initialized with the weights of the previous group's model. To enhance efficiency, we quantize the parameters during training, requiring no post-hoc pruning or quantization. When compared with previous works on the benchmark UVG dataset, NIRVANA improves encoding quality from 37.36 to 37.70 (in terms of PSNR) and the encoding speed by 12x, while maintaining the same compression rate. In contrast to prior video INR works, which struggle with higher resolutions and longer videos, we show that our algorithm scales naturally due to its patch-wise and autoregressive design. Moreover, our method achieves variable bitrate compression by adapting to videos with varying inter-frame motion. NIRVANA also achieves 6x faster decoding, which scales well with more GPUs, making it practical for various deployment scenarios.
The project site can be found here.
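The group-wise, patch-wise, autoregressive scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is a toy under stated assumptions, not the NIRVANA implementation: the "network" is a single linear layer, the orthonormalized Fourier encoding of patch coordinates is a hypothetical choice made so that plain gradient descent converges predictably, quantization during training is not modeled, and all function names and hyperparameters (`make_video`, `to_patch_volumes`, `coordinate_features`, `fit`, `encode`, the group size of 4, the 8x8 patches) are invented for illustration.

```python
import numpy as np

def make_video(num_frames=12, size=32, sigma=4.0, speed=0.25):
    """Synthetic grayscale video: a Gaussian blob drifting slowly to the right."""
    ys, xs = np.mgrid[0:size, 0:size]
    frames = [
        np.exp(-((xs - (10 + speed * t)) ** 2 + (ys - size / 2) ** 2) / (2 * sigma ** 2))
        for t in range(num_frames)
    ]
    return np.stack(frames)  # shape (T, H, W)

def to_patch_volumes(group, patch):
    """Split a (G, H, W) frame group into one flattened volume per patch location."""
    g, h, w = group.shape
    vols = group.reshape(g, h // patch, patch, w // patch, patch)
    vols = vols.transpose(1, 3, 0, 2, 4)  # (rows, cols, G, patch, patch)
    return vols.reshape((h // patch) * (w // patch), g * patch * patch)

def coordinate_features(rows, cols, seed=0):
    """Hypothetical input encoding: random Fourier features of patch coordinates,
    orthonormalized with QR so that gradient descent is well conditioned."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:rows, 0:cols]
    coords = np.stack([ys.ravel() / rows, xs.ravel() / cols], axis=1)  # (N, 2)
    freqs = rng.normal(scale=4.0, size=(2, rows * cols))
    q, _ = np.linalg.qr(np.cos(coords @ freqs))
    return q  # (N, N) orthogonal matrix

def fit(features, targets, weights, lr=0.5, tol=1e-6, max_steps=100):
    """Gradient descent on 0.5 * ||features @ weights - targets||^2.
    Returns the fitted weights and the number of update steps taken."""
    weights = weights.copy()
    for step in range(max_steps):
        residual = features @ weights - targets
        if np.mean(residual ** 2) < tol:
            return weights, step
        weights -= lr * features.T @ residual
    return weights, max_steps

def encode(video, group_size=4, patch=8, warm_start=True):
    """Fit one model per frame group. With warm_start=True, each group's model
    is initialized from the previous group's weights (the autoregressive part).
    Assumes the frame count is divisible by group_size."""
    rows, cols = video.shape[1] // patch, video.shape[2] // patch
    features = coordinate_features(rows, cols)
    weights = np.zeros((features.shape[1], group_size * patch * patch))
    models, steps = [], []
    for start in range(0, len(video), group_size):
        targets = to_patch_volumes(video[start:start + group_size], patch)
        init = weights if warm_start else np.zeros_like(weights)
        weights, used = fit(features, targets, init)
        models.append(weights)
        steps.append(used)
    return models, steps

video = make_video()
_, warm_steps = encode(video, warm_start=True)
_, cold_steps = encode(video, warm_start=False)
print("steps per group, warm start:", warm_steps)
print("steps per group, cold start:", cold_steps)
```

In this toy setting, consecutive groups differ only by a small shift of the blob, so a model initialized from the previous group starts close to its target and needs fewer update steps than one trained from zeros. That is the temporal-redundancy argument of the abstract in miniature; the speedups reported there come from the authors' actual networks and training procedure, which this sketch does not reproduce.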