RGB-D Data Compression via Bi-Directional Cross-Modal Prior Transfer and Enhanced Entropy Modeling

Published: 01 Jan 2025, Last Modified: 09 Nov 2025ACM Trans. Multim. Comput. Commun. Appl. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: RGB-D data, being homogeneous cross-modal data, demonstrates significant correlations among data elements. However, current research focuses only on a uni-directional pattern of cross-modal contextual information, neglecting the exploration of bi-directional relationships in the compression field. Thus, we propose a joint RGB-D compression scheme, which is combined with Bi-Directional Cross-Modal Prior Transfer (Bi-CPT) modules and a Bi-Directional Cross-Modal Enhanced Entropy (Bi-CEE) model. The Bi-CPT module is designed for compact representations of cross-modal features, effectively eliminating spatial and modality redundancies at different granularity levels. In contrast to the traditional entropy models, our proposed Bi-CEE model not only achieves spatial-channel contextual adaptation through partitioning RGB and depth features but also incorporates information from other modalities as prior to enhance the accuracy of probability estimation for latent variables. Furthermore, this model enables parallel multi-stage processing to accelerate coding. Experimental results demonstrate the superiority of our proposed framework over the current compression scheme, outperforming both rate-distortion performance and downstream tasks, including surface reconstruction and semantic segmentation. The source code will be available at https://github.com/xyy7/Learning-based-RGB-D-Image-Compression.
Loading