Low-Complexity 3D-Vision Conferencing System based on Accelerated RIFE Model

Hongyue Huang, Xilong Zhou, Hongbo Ning, Haopeng Lu, Qi Zhang, Yanpeng Liang, Wanjun Lyu, Chuanmin Jia, Xinfeng Zhang, Liuxin Zhang, Siwei Ma

Published: 2024, Last Modified: 17 Apr 2025, PCS 2024, CC BY-SA 4.0

Abstract: Recent advances in telecommunication technologies have surpassed the requirements of many video-based Real-Time Communication (RTC) applications. Meanwhile, driven by the growing demand for immersive 3D visual experiences, researchers are focusing on next-generation telepresence systems. This paper presents a novel immersive conferencing system that provides a smooth, high-fidelity, life-size autostereoscopic display of remote user portraits. To generate binocular stereo vision in real time, the system employs an adaptive, low-complexity view synthesis method based on an accelerated Real-time Intermediate Flow Estimation (RIFE) model, which generates cross-view images directly from the decoded multi-view videos and the tracked eye positions of the viewing user. By eliminating the complex depth-image-based 3D modeling procedure, the proposed system requires significantly fewer computational resources and less video transmission bandwidth than existing immersive conferencing systems. As a result, it is low-cost and flexible enough to accommodate diverse conferencing environments.
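The abstract does not specify how tracked eye positions are turned into view-synthesis requests. The Python sketch below shows one plausible mapping. It assumes a single horizontal row of cameras sorted by x, with eye positions expressed in the same coordinate frame. Under that assumption, each eye selects its two bracketing cameras and an interpolation timestep `t`, and an intermediate-frame model queried at `t` produces that eye's view. `ViewQuery`, `locate_view`, `interpolate`, and `synthesize_stereo` are hypothetical names. `interpolate` is a plain linear blend that stands in for the accelerated RIFE network so the sketch runs; it is not the authors' model.

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np


@dataclass
class ViewQuery:
    left: int   # index of the left camera of the bracketing pair
    right: int  # index of the right camera of the bracketing pair
    t: float    # interpolation timestep in [0, 1] between the two cameras


def locate_view(eye_x: float, camera_xs: Sequence[float]) -> ViewQuery:
    """Map a horizontal eye position to its two nearest cameras and a blend ratio.

    Hypothetical: assumes cameras lie on one horizontal line, sorted by x.
    """
    xs = np.asarray(camera_xs, dtype=float)
    # Eyes outside the camera baseline are clamped to the outermost camera.
    x = float(np.clip(eye_x, xs[0], xs[-1]))
    # First camera strictly to the right of x, kept inside [1, n-1].
    right = int(np.searchsorted(xs, x, side="right"))
    right = min(max(right, 1), len(xs) - 1)
    left = right - 1
    t = (x - xs[left]) / (xs[right] - xs[left])
    return ViewQuery(left, right, t)


def interpolate(frame0: np.ndarray, frame1: np.ndarray, t: float) -> np.ndarray:
    # Placeholder for an accelerated RIFE forward pass at timestep t.
    # A linear blend is used only so that the sketch is runnable.
    return (1.0 - t) * frame0 + t * frame1


def synthesize_stereo(frames, camera_xs, left_eye_x, right_eye_x):
    """Return (left_view, right_view) synthesized for the two tracked eyes."""
    views = []
    for eye_x in (left_eye_x, right_eye_x):
        q = locate_view(eye_x, camera_xs)
        views.append(interpolate(frames[q.left], frames[q.right], q.t))
    return views[0], views[1]


if __name__ == "__main__":
    # Three constant "frames" from cameras spaced 10 units apart.
    frames = [np.full((2, 2), v) for v in (0.0, 1.0, 2.0)]
    left_view, right_view = synthesize_stereo(frames, [0.0, 10.0, 20.0], 3.2, 9.7)
    print(left_view[0, 0], right_view[0, 0])
```

In this sketch, each eye needs only two decoded camera views and a scalar `t`, and no depth map or 3D model is built. This illustrates why direct cross-view generation can be cheaper than depth-based reconstruction.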