Keywords: neural video codecs, quantization, temporal buffer
Abstract: Full-precision floating-point neural image and video codecs pose significant challenges in power consumption, storage requirements, and cross-platform interoperability, particularly when deployed on resource-constrained devices. To address these issues, network quantization techniques have been extensively studied for neural image codecs. However, the quantization of neural video codecs remains largely unexplored. Unlike quantizing neural image codecs, quantizing neural video codecs requires significantly more effort. Many coding components operate on temporally correlated data and often rely on features propagated from previous frames, introducing additional sensitivity to both cross-platform round-off errors and quantization noise. This work presents the first systematic study of quantization effects across multiple neural video coding frameworks and temporal buffering strategies. Extensive analyses are conducted to evaluate how various combinations of coding frameworks and temporal buffering strategies respond to different quantization schemes in terms of coding performance and computational complexity. Experimental results confirm the superiority of our mixed-precision quantization over fixed-precision quantization when they are incorporated into state-of-the-art neural video codecs. At a time when the development of neural video codecs is transitioning from maximizing rate-distortion performance to addressing practicality issues, this work offers holistic insights into key design considerations.
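The fixed- versus mixed-precision distinction in the abstract can be illustrated with a minimal sketch. This is not the paper's method; it is a generic uniform fake-quantization example, and the layer names (`motion_est`, `residual_codec`, `entropy_model`) and per-layer bit widths are hypothetical, chosen only to show how mixed precision can spend more bits on quantization-sensitive components (e.g. those feeding a temporal buffer).

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric fake-quantization of a float tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    amax = np.max(np.abs(x))
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values, carrying the round-off error

rng = np.random.default_rng(0)
# Hypothetical codec components with stand-in weight tensors.
layers = {"motion_est": rng.normal(size=1000),
          "residual_codec": rng.normal(size=1000),
          "entropy_model": rng.normal(size=1000)}

# Fixed precision: the same bit width for every component.
fixed_err = {n: np.mean((w - quantize(w, 8)) ** 2) for n, w in layers.items()}

# Mixed precision: hypothetical per-layer bit widths, e.g. more bits for
# components whose outputs propagate through the temporal buffer.
bits = {"motion_est": 10, "residual_codec": 8, "entropy_model": 6}
mixed_err = {n: np.mean((w - quantize(w, bits[n])) ** 2)
             for n, w in layers.items()}
```

In an actual sensitivity study the bit-width assignment would be driven by measuring each component's impact on rate-distortion performance rather than fixed by hand as above.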
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16245