Abstract: This article proposes a novel multi-view (MV) video coding technique that leverages a four-dimensional (4D) voxel-grid representation to enhance coding efficiency, particularly in novel view synthesis. Although the voxel grid approximation provides a continuous representation for dynamic scenes, its volumetric nature requires substantial storage. The compression of MV videos can be interpreted as the compression of dense features. However, the substantial size of these features poses a significant problem relative to the generation of dynamic scenes at arbitrary viewpoints. To address this challenge, this study introduces a hierarchical coded representation of dynamic volumes based on low-rank tensor decomposition of volumetric features and develops effective coding techniques based on this representation. The proposed method employs a two-level coding strategy to capture the temporal characteristics of the decomposed features. At a higher level, spatial features are encoded, representing 3D structural information, with time-invariant components over short intervals of an MV video sequence. At a lower level, temporal features are encoded to capture the dynamics of current scenes. The spatial features are shared in a group, and temporal features are encoded at each time step. The experimental results demonstrate that the proposed technique outperforms existing MV video coding standards and current state-of-the-art methods, providing superior rate-distortion performance in the novel view synthesis of MV video compression.
External IDs:dblp:journals/tmm/ShinLBKK25
Loading