Abstract: Point cloud data is pivotal in applications such as autonomous driving, virtual reality, and robotics. However, its substantial volume poses significant challenges for storage and transmission. At high compression ratios, crucial semantic details are usually severely degraded, which makes it difficult to guarantee the accuracy of downstream tasks. To tackle this problem, we introduce the first Region of Interest (ROI)-guided Point Cloud Geometry Compression (RPCGC) method for human and machine vision. Our framework employs a dual-branch parallel structure, in which the base layer encodes and decodes a simplified version of the point cloud and the enhancement layer refines it by focusing on geometry details. The residual information of the enhancement layer is further refined by an ROI prediction network. This network generates mask information, which is incorporated into the residuals and serves as a strong supervision signal. We also apply the mask in the Rate-Distortion (RD) optimization process, where each point is weighted in the distortion calculation. Our loss function combines an RD loss and a detection loss to better guide point cloud encoding for machine vision. Experimental results demonstrate that RPCGC achieves exceptional compression performance and better detection accuracy (10% gain) than some learning-based compression methods at high bitrates on the ScanNet and SUN RGB-D datasets.
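The mask-weighted distortion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `roi_weight` and `lam` parameters, the normalisation by the sum of weights, and the use of per-point squared errors are all assumptions made for illustration, and the detection loss term is omitted.

```python
def roi_weighted_distortion(point_errors, roi_mask, roi_weight=2.0):
    """Weighted mean of per-point errors (hypothetical weighting scheme).

    point_errors: per-point reconstruction errors (e.g. squared distances).
    roi_mask:     per-point mask values, 1 inside the ROI and 0 outside.
    roi_weight:   assumed weight for ROI points; non-ROI points get weight 1.
    """
    weights = [1.0 + (roi_weight - 1.0) * m for m in roi_mask]
    weighted_sum = sum(w * e for w, e in zip(weights, point_errors))
    return weighted_sum / sum(weights)


def rd_loss(point_errors, roi_mask, total_bits, lam=1.0, roi_weight=2.0):
    """Rate (bits per point) plus lam times the ROI-weighted distortion."""
    rate = total_bits / len(point_errors)
    distortion = roi_weighted_distortion(point_errors, roi_mask, roi_weight)
    return rate + lam * distortion


if __name__ == "__main__":
    errors = [1.0, 1.0, 3.0, 3.0]  # toy per-point errors
    mask = [0, 0, 1, 1]            # last two points lie inside the ROI
    print(rd_loss(errors, mask, total_bits=8, roi_weight=3.0))
    print(rd_loss(errors, mask, total_bits=8, roi_weight=1.0))
```

With `roi_weight` set to 1 the loss reduces to an unweighted RD loss; larger values penalise errors inside the predicted ROI more heavily, which is the effect the per-point weighting is meant to achieve.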
Primary Subject Area: [Systems] Transport and Delivery
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This work contributes to multimedia/multi-modal processing. Point cloud data plays a critical role in applications such as autonomous driving, virtual reality, and 3D reconstruction, and the method tackles the associated storage and transmission challenges, particularly the preservation of semantic details during compression. We introduce a novel Region of Interest (ROI)-Guided Point Cloud Geometry Compression method designed for both human and machine vision. Its dual-layer structure captures both the overall structure and the intricate details of point clouds, maintaining reconstruction quality and downstream task accuracy during compression. Notably, the method refines the residual information of the enhancement layer through an ROI prediction network, which generates mask information that is incorporated into the residuals as a strong supervisory signal. Applying the mask in the Rate-Distortion optimization process further enhances point cloud encoding quality. The method thus offers an innovative point cloud compression approach for multimedia/multi-modal processing and supports technological advances in the relevant application domains.
Supplementary Material: zip
Submission Number: 3319