Spatial Visibility and Temporal Dynamics: Rethinking Field of View Prediction in Adaptive Point Cloud Video Streaming
Abstract: Field-of-View (FoV) adaptive streaming significantly reduces the bandwidth
requirements of immersive point cloud video (PCV) by transmitting
only the points visible within a viewer's FoV. Traditional
approaches focus on trajectory-based six-degree-of-freedom
(6DoF) FoV prediction, then use the predicted FoV to compute
point visibility. Such approaches do not explicitly consider the video
content's impact on viewer attention, and the conversion from FoV
to point visibility is often error-prone and time-consuming. We
reformulate the PCV FoV prediction problem from the cell visibility
perspective, allowing for precise decision-making regarding the
transmission of 3D data at the cell level based on the predicted
visibility distribution. We develop a novel spatial visibility and
object-aware graph model (CellSight) that leverages historical
3D visibility data and incorporates spatial perception, inter-point
occlusion, and neighboring-cell correlation to predict future
cell visibility. We focus on multi-second-ahead prediction
to enable long pre-fetching buffers in on-demand streaming,
which is critical for robustness to network bandwidth
fluctuations. CellSight significantly improves long-term cell
visibility prediction, reducing the Mean Squared Error
(MSE) of prediction by up to 50% compared to state-of-the-art models
when predicting 2 to 5 seconds ahead, while maintaining real-time
performance (more than 30 fps) on point cloud videos with over 1
million points.