ImmerScope: Multi-view Video Aggregation at Edge towards Immersive Content Services

Published: 2024, Last Modified: 15 Oct 2025, Venue: SenSys 2024, License: CC BY-SA 4.0
Abstract: Multi-camera capture systems are an emerging visual sensing modality. They facilitate the production of a variety of immersive content, ranging from regular to neural videos. Although the delivery of immersive content is popular and promising, it suffers from a bandwidth bottleneck when streaming multi-view videos to the cloud (i.e., multi-view video aggregation). Existing works fail to provide a solution that is both bandwidth-efficient and content-generic. Even the closest effort to ours, which is based on state-of-the-art (SOTA) multi-view video codecs, suffers from underutilized dependency and content distortion. In this paper, we present ImmerScope, a multi-view video aggregation framework at the edge built on a neural multi-view video codec. It outperforms existing solutions by making fuller use of dependency via neuron connections and by achieving distortion awareness via end-to-end training. Evaluations on diverse multi-camera setups show that ImmerScope saves at least 64% bandwidth relative to single-view codecs, measured in terms of peak signal-to-noise ratio (PSNR), at a frame rate of 50 fps.