SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information

Published: 01 Jan 2024, Last Modified: 13 Nov 2024 · DCC 2024 · CC BY-SA 4.0
Abstract: Neural Video Compression (NVC) techniques have achieved remarkable performance, even surpassing the best traditional lossy video codecs. However, most existing NVC methods [1] rely heavily on transmitting Motion Vectors (MVs) to generate accurate contextual features, which has the following drawbacks. (1) Compressing and transmitting MVs requires a specialized MV encoder and decoder, which introduces redundant modules. (2) Because of the MV encoder-decoder, the training strategy is complex. In this paper, we propose Single Stream Neural Video Compression (SSNVC), which implicitly utilizes temporal information to eliminate temporal redundancy in a video sequence. Without an MV encoder-decoder [2], it only needs to transmit a single bit-stream over the channel and uses a single-stage training strategy, which greatly simplifies the training and compression processes of NVC. Besides, we reimplement window-based attention intra-frame image compression with a channel-wise and checkerboard auto-regressive entropy model, enhance the contextual encoder with a module that mixes global and local context, and redesign the Dense-UNet frame generator with stronger generation capability to improve SSNVC's compression performance. Experimental results show that SSNVC achieves competitive performance on multiple benchmarks.
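The checkerboard auto-regressive entropy model mentioned in the abstract decodes latents in two passes: "anchor" positions are decoded first from hyperprior context alone, and the remaining "non-anchor" positions are then decoded conditioned on their already-decoded neighbors. A minimal sketch of the spatial split (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def checkerboard_masks(h, w):
    """Partition an h x w latent grid into anchor / non-anchor positions.

    Anchors sit where (row + col) is even; every non-anchor position is
    therefore surrounded (4-neighborhood) by anchors, which supplies the
    spatial context for the second decoding pass.
    """
    idx = np.add.outer(np.arange(h), np.arange(w))  # idx[i, j] = i + j
    anchor = (idx % 2 == 0)
    return anchor, ~anchor

# Example: a 4x4 grid splits evenly into 8 anchors and 8 non-anchors.
anchor, non_anchor = checkerboard_masks(4, 4)
```

In a full model, pass one entropy-codes the latents at `anchor` positions; pass two masks those values into a context network to predict the distribution parameters of the `non_anchor` positions, so only two (parallelizable) passes are needed instead of a fully serial raster-scan autoregression.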