Spatial Visibility and Temporal Dynamics: Rethinking Field of View Prediction in Adaptive Point Cloud Video Streaming

Published: 31 Mar 2025, Last Modified: 17 Mar 2025
16th ACM Multimedia Systems Conference (MMSys'25)
CC BY-NC-ND 4.0
Abstract: Field-of-View (FoV) adaptive streaming significantly reduces the bandwidth requirement of immersive point cloud video (PCV) by transmitting only the visible points inside a viewer's FoV. Traditional approaches often focus on trajectory-based six-degree-of-freedom (6DoF) FoV prediction; the predicted FoV is then used to calculate point visibility. Such approaches do not explicitly consider the video content's impact on viewer attention, and the conversion from FoV to point visibility is often error-prone and time-consuming. We reformulate the PCV FoV prediction problem from the cell visibility perspective, enabling precise, cell-level decisions about which 3D data to transmit based on the predicted visibility distribution. We develop a novel spatial-visibility and object-aware graph model (CellSight) that leverages historical 3D visibility data and incorporates spatial perception, occlusion between points, and correlations among neighboring cells to predict future cell visibility. We focus on multi-second-ahead prediction to enable long pre-fetching buffers in on-demand streaming, which is critical for robustness to network bandwidth fluctuations. CellSight significantly improves long-term cell visibility prediction, reducing the Mean Squared Error (MSE) by up to 50% compared to state-of-the-art models when predicting 2 to 5 seconds ahead, while maintaining real-time performance (more than 30 fps) on point cloud videos with over 1 million points.
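To make the cell-visibility reformulation concrete, the sketch below shows one way per-cell visibility could be predicted several frames ahead from historical visibility, mixing information across neighboring cells and trained with an MSE objective as described in the abstract. The module names, tensor shapes, adjacency construction, and the GRU-plus-graph-mixing architecture are illustrative assumptions, not the paper's CellSight implementation.

```python
# Minimal illustrative sketch (NOT the authors' CellSight model): predict
# per-cell visibility for a multi-second horizon from historical per-cell
# visibility, propagating information across neighboring cells via a fixed,
# row-normalized adjacency matrix. All names and shapes are assumptions.
import torch
import torch.nn as nn


class CellVisibilityPredictor(nn.Module):
    def __init__(self, hist_len: int, horizon: int, hidden: int = 64):
        super().__init__()
        self.temporal = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.spatial = nn.Linear(hidden, hidden)   # applied after neighbor averaging
        self.head = nn.Linear(hidden, horizon)     # one visibility value per future step

    def forward(self, hist_vis: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # hist_vis: (batch, num_cells, hist_len), visibility in [0, 1]
        # adj:      (num_cells, num_cells), row-normalized cell adjacency
        b, n, t = hist_vis.shape
        x = hist_vis.reshape(b * n, t, 1)
        _, h = self.temporal(x)                    # per-cell temporal embedding
        h = h.squeeze(0).reshape(b, n, -1)
        h = torch.relu(self.spatial(adj @ h))      # mix embeddings of neighboring cells
        return torch.sigmoid(self.head(h))         # (batch, num_cells, horizon)


if __name__ == "__main__":
    num_cells, hist_len, horizon = 512, 30, 60     # e.g. 1 s of history, 2 s ahead at 30 fps
    model = CellVisibilityPredictor(hist_len, horizon)
    hist = torch.rand(1, num_cells, hist_len)
    adj = torch.eye(num_cells)                     # placeholder; a real adjacency comes from the 3D cell grid
    pred = model(hist, adj)
    target = torch.rand_like(pred)                 # placeholder ground-truth visibility
    loss = nn.functional.mse_loss(pred, target)    # MSE objective, as referenced in the abstract
    print(pred.shape, loss.item())
```

In this sketch the predicted per-cell visibility distribution could then drive cell-level transmission decisions, e.g. prioritizing cells whose predicted visibility over the pre-fetching horizon exceeds a rate-dependent threshold.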