Abstract: Existing multi-view image compression methods often rely on 2D projection-based similarities between views to estimate disparities. While effective for small disparities, such as those in stereo images, these methods struggle with the more complex disparities encountered in wide-baseline multi-camera systems, which are common in virtual reality and autonomous driving applications. To address this limitation, we propose 3D-LMVIC, a novel learning-based multi-view image compression framework that leverages 3D Gaussian Splatting to derive geometric priors for accurate disparity estimation. Furthermore, we introduce a depth map compression model to minimize geometric redundancy across views, along with a multi-view sequence ordering strategy, based on a distance measure defined between views, to enhance correlations between adjacent views. Experimental results demonstrate that 3D-LMVIC achieves superior performance compared to both traditional and learning-based methods. Additionally, it significantly improves disparity estimation accuracy over existing two-view approaches.
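The core idea of using a geometric prior for disparity estimation can be illustrated with a small sketch: given a depth map rendered from the reconstructed 3D Gaussians and the camera parameters of two views, each reference pixel can be back-projected to 3D and reprojected into the target view, yielding a per-pixel displacement. This is a minimal illustration under assumed conventions, not the paper's implementation; the function name and parameters (`warp_disparity`, `K_ref`, `T_ref_to_tgt`, etc.) are hypothetical.

```python
import numpy as np

def warp_disparity(depth_ref, K_ref, K_tgt, T_ref_to_tgt):
    """Sketch: derive per-pixel displacement between two views from a depth prior.

    depth_ref:     (H, W) depth of the reference view (e.g. rendered from 3D Gaussians)
    K_ref, K_tgt:  (3, 3) camera intrinsics of reference and target views
    T_ref_to_tgt:  (4, 4) rigid transform from reference to target camera frame
    Returns an (H, W, 2) array giving each reference pixel's displacement in the target view.
    """
    H, W = depth_ref.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))            # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Back-project reference pixels to 3D camera coordinates using the depth prior.
    cam_ref = (pix @ np.linalg.inv(K_ref).T) * depth_ref[..., None]

    # Move the 3D points into the target camera frame.
    cam_ref_h = np.concatenate([cam_ref, np.ones((H, W, 1))], axis=-1)
    cam_tgt = cam_ref_h @ T_ref_to_tgt.T

    # Project into the target image plane.
    proj = cam_tgt[..., :3] @ K_tgt.T
    uv_tgt = proj[..., :2] / np.clip(proj[..., 2:3], 1e-6, None)

    # Displacement between corresponding pixel locations across the two views.
    return uv_tgt - np.stack([u, v], axis=-1).astype(np.float64)
```

Such a displacement field can then serve as a geometry-guided alignment signal when predicting one view from another during compression.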
Lay Summary: Modern applications like virtual reality use many cameras to capture scenes from different angles, producing large amounts of image data. Compressing this data is challenging, especially when camera views are far apart. Our method, 3D-LMVIC, uses 3D scene understanding to better align and compress images from different views. It achieves higher compression efficiency and quality by estimating 3D geometry and intelligently reordering the views. This helps reduce storage and transmission costs in real-world 3D applications.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/YujunHuang063/3D-GP-LMVIC
Primary Area: Applications->Computer Vision
Keywords: Multi-View Image Compression; 3D Gaussian Splatting; Deep Learning
Submission Number: 6156