Neural Video Compression with Dynamic Temporal Context Mining

Published: 01 Jan 2024, Last Modified: 06 Mar 2025, WCSP 2024, CC BY-SA 4.0
Abstract: Neural Video Compression (NVC) has advanced significantly in recent years, with improvements in inter-prediction techniques enabling neural video codecs to outperform traditional methods. Most NVCs rely on pixel information from neighboring frames or a single temporal feature for motion compensation, which does not fully utilize the information available on the decoder side. In this paper, we introduce an innovative and efficient motion compensation method that leverages long-term spatio-temporal dependencies in video coding and dynamically mines temporal features. Specifically, the proposed Dynamic Temporal Context Mining (DTCM) module utilizes spatial information from reconstructed images to compensate the temporal features, thereby correcting errors accumulated during temporal propagation. The DTCM module extends the model's observation window, allowing longer-term temporal information to be integrated into the current reference features. Additionally, the masks generated by DTCM during prediction act as an attention mechanism, further filtering the predicted features and enhancing the quality of the reference features. Extensive experiments demonstrate the effectiveness of the proposed method, achieving a 45.38% bitrate saving compared to the baseline approach.
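To make the described mechanism concrete, the following is a minimal sketch of a DTCM-style module reconstructed only from the abstract: a propagated temporal context feature is corrected with spatial cues from the previously reconstructed frame, and a learned mask gates the result as a soft attention filter. All layer sizes, names, and the exact fusion rule are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a DTCM-style refinement module (assumed architecture).
import torch
import torch.nn as nn


class DTCM(nn.Module):
    """Refines a propagated temporal context feature using spatial cues
    from the previously reconstructed frame, gated by a learned mask."""

    def __init__(self, ctx_channels: int = 64):
        super().__init__()
        # Spatial feature extractor applied to the reconstructed frame (3 channels).
        self.spatial_enc = nn.Sequential(
            nn.Conv2d(3, ctx_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ctx_channels, ctx_channels, 3, padding=1),
        )
        # Predicts a residual correction for the propagated temporal context.
        self.correct = nn.Sequential(
            nn.Conv2d(2 * ctx_channels, ctx_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ctx_channels, ctx_channels, 3, padding=1),
        )
        # Predicts a per-pixel mask in [0, 1] acting as an attention-like gate.
        self.mask_head = nn.Sequential(
            nn.Conv2d(2 * ctx_channels, ctx_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ctx_channels, ctx_channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, temporal_ctx: torch.Tensor, recon_frame: torch.Tensor) -> torch.Tensor:
        spatial = self.spatial_enc(recon_frame)           # spatial cues from the decoded frame
        fused = torch.cat([temporal_ctx, spatial], dim=1)
        mask = self.mask_head(fused)                      # attention-style filtering mask
        corrected = temporal_ctx + self.correct(fused)    # correct errors from temporal propagation
        return mask * corrected + (1.0 - mask) * temporal_ctx


if __name__ == "__main__":
    dtcm = DTCM(ctx_channels=64)
    ctx = torch.randn(1, 64, 128, 128)    # propagated temporal context feature
    recon = torch.randn(1, 3, 128, 128)   # previously reconstructed frame
    refined = dtcm(ctx, recon)
    print(refined.shape)                  # torch.Size([1, 64, 128, 128])
```

In this sketch the mask blends the corrected context with the original one, so regions where the reconstruction provides reliable spatial evidence are updated while the rest of the propagated feature is left largely intact; the paper's actual fusion strategy may differ.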