MeshReduce: Scalable and Bandwidth Efficient 3D Scene Capture

Published: 07 Mar 2025 · Last Modified: 07 Mar 2025 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: 3D video enables a remote viewer to observe a 3D scene from any angle or location. However, current 3D capture solutions incur high latency, consume significant bandwidth, and scale poorly with the number of depth sensors and the size of the scene. These problems stem largely from the current monolithic approach to 3D capture and from the use of inefficient data representations for streaming. This paper introduces MeshReduce, a distributed scene capture, streaming, and rendering system that advocates adopting a textured mesh data representation early in the 3D video capture and transmission process. Textured meshes are compact and achieve lower bitrates than other 3D data representations at the same quality. However, streaming textured meshes poses compute and memory challenges when bandwidth efficiency is the goal. MeshReduce addresses these issues with a pipeline that creates independent per-sensor mesh reconstructions and incrementally merges them, rather than constructing a single mesh directly from all sensor streams. While this enables a more efficient implementation, it requires the exchange of textured meshes across the network to be carefully optimized. MeshReduce also incorporates a novel network rate-control approach that divides the available bandwidth between texture and mesh for efficient, adaptive 3D video streaming. We demonstrate a real-time, integrated embedded-compute implementation of MeshReduce that operates with commercial Azure Kinect depth cameras as well as with a custom sensor front end that uses LiDAR and 360° camera inputs to dramatically increase coverage.
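The abstract describes reconstructing each sensor's mesh independently and merging the results incrementally, rather than fusing all streams in one monolithic step. The sketch below illustrates that idea as a pairwise tree reduction; it is a minimal illustration, not the paper's implementation, and the `TexturedMesh` type and `merge_pair` stitching are hypothetical simplifications (a real merge would also fuse overlapping surfaces and repack texture atlases).

```python
from dataclasses import dataclass

@dataclass
class TexturedMesh:
    vertices: list       # (x, y, z) tuples in a shared world frame
    faces: list          # (i, j, k) vertex-index triples
    texture_bytes: int   # size of the packed texture atlas, for budgeting

def merge_pair(a: TexturedMesh, b: TexturedMesh) -> TexturedMesh:
    """Merge two independently reconstructed meshes by concatenating
    geometry and re-indexing b's faces (a deliberate simplification)."""
    offset = len(a.vertices)
    return TexturedMesh(
        vertices=a.vertices + b.vertices,
        faces=a.faces + [(i + offset, j + offset, k + offset)
                         for i, j, k in b.faces],
        texture_bytes=a.texture_bytes + b.texture_bytes,
    )

def incremental_merge(meshes: list) -> TexturedMesh:
    """Tree-reduce per-sensor meshes pairwise, so merge stages can run
    on different nodes instead of one node fusing every sensor stream."""
    level = list(meshes)
    while len(level) > 1:
        nxt = [merge_pair(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd mesh carries over to next level
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

Merging as a balanced tree keeps each node's working set to a pair of partial meshes and lets merge stages overlap with capture, which is the scaling property the abstract attributes to the distributed design.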
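The abstract also mentions a rate-control scheme that divides the available bandwidth between texture and mesh. As a hedged illustration of that idea only, the sketch below grid-searches the split that maximizes a toy diminishing-returns quality model; the `quality` function and its constants are invented for illustration and stand in for the per-scene rate-quality behavior an actual system would measure.

```python
import math

def quality(mesh_bits: float, tex_bits: float) -> float:
    """Toy model: diminishing returns for both geometry and texture bits.
    The 50k / 200k scale constants are placeholders, not measured values."""
    return math.log1p(mesh_bits / 50_000) + math.log1p(tex_bits / 200_000)

def split_budget(total_bits: float, steps: int = 100) -> tuple:
    """Grid-search the mesh/texture split that maximizes modeled quality
    under a fixed total per-frame bitrate budget."""
    best_mesh, best_tex, best_q = 0.0, 0.0, float("-inf")
    for s in range(1, steps):
        mesh_bits = total_bits * s / steps
        tex_bits = total_bits - mesh_bits
        q = quality(mesh_bits, tex_bits)
        if q > best_q:
            best_mesh, best_tex, best_q = mesh_bits, tex_bits, q
    return best_mesh, best_tex

# Example: split a hypothetical 5 Mb frame budget between mesh and texture.
mesh_bits, tex_bits = split_budget(5_000_000)
print(f"mesh: {mesh_bits:.0f} bits, texture: {tex_bits:.0f} bits")
```

Recomputing the split per frame is what would make the stream adaptive: as the total budget reported by the network changes, the allocation between geometry and texture shifts accordingly.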