Abstract: The surging popularity of 360° videos poses new challenges for video processing and streaming. In this paper, we introduce a novel 360° video adaptive streaming method based on multi-scale spatial tiling. The approach adapts to complex network conditions while maintaining a high quality of user experience. First, a transformer-based saliency detection model identifies the salient regions of a 360° video that are likely to attract the user’s attention. Next, we apply Region of Interest (RoI) encoding and multi-scale spatial tiling to represent and deliver 360° videos efficiently. Finally, we use reinforcement learning for adaptive bitrate selection. The results demonstrate that our approach improves the overall Quality of Experience (QoE) under complex network conditions.