Keywords: Video Compression, Model Integerization
Abstract: Cross-platform coding consistency is a fundamental prerequisite for neural video codecs (NVCs). Previous works address this by adopting a floating-point-centric perspective, quantizing a pretrained floating-point NVC into an integer one. However, this often leads to suboptimal performance with a significant bitrate increase. In this paper, we propose a high-performance cross-platform NVC by designing a comprehensive integer-centric training pipeline for model integerization, which enables the training of an integer NVC from scratch. This approach avoids initialization from a floating-point model and allows for more flexible learning across the entire integer space. Observing that the division operations in previous integerization methods destabilize from-scratch training, we propose a multiply-twice integerization strategy that circumvents this instability. Furthermore, we introduce a memorized temporal modeling mechanism, leveraging a memory module to capture long-term dependencies and enhance model capacity. With these innovations, we implement the in-loop decoding modules in integer arithmetic to ensure cross-platform coding consistency, which we further validate across multiple platforms. As a result, our cross-platform NVC achieves an average 20% bitrate reduction compared to H.266/VTM while maintaining an encoding/decoding speed of 153.0/137.3 fps for 1080p video. The code will be released.
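To make the motivation behind avoiding division concrete, the sketch below shows a standard fixed-point trick for replacing integer division with a precomputed multiply and bit shift, which is deterministic across platforms. This is a generic illustration, not the paper's actual multiply-twice strategy; the function name, the shift width, and the ceiling-multiplier choice are all assumptions for the example.

```python
def make_div_by_const(divisor: int, shift: int = 16):
    """Hypothetical helper: approximate x // divisor using only a
    multiplication and a right shift, avoiding the division operator.
    A ceiling multiplier is used so small inputs divide exactly."""
    # m ~= 2^shift / divisor, rounded up (an assumed rounding choice)
    m = ((1 << shift) + divisor - 1) // divisor

    def div(x: int) -> int:
        # Only multiply and shift at inference time: no division,
        # so the result is bit-exact on any integer hardware.
        return (x * m) >> shift

    return div

div3 = make_div_by_const(3)
print(div3(300))  # → 100
```

Because every step is exact integer arithmetic, the decoder reproduces the same values on any platform, which is the consistency property the abstract targets; the gradient-stability aspect discussed in the paper is not modeled here.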
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1445