Incremental3D: Incremental 3D Scene Generation with Scene Graph for Immersive Teleoperation

TMLR Paper6689 Authors

27 Nov 2025 (modified: 01 Dec 2025) | Under review for TMLR | CC BY 4.0
Abstract: Graph-based 3D scene generation aims to synthesize 3D environments conditioned on scene graphs and has been widely explored in applications such as 3D gaming and interior design. However, its potential for immersive robotic teleoperation has been largely overlooked. In this setting, transmitting lightweight incremental 3D scene graphs from the robot side to the operator side is far more bandwidth-efficient, and incurs lower latency, than streaming raw RGB or point-cloud data. At the same time, recent advances in robot-side 3D scene-graph learning make such incremental scene graphs readily obtainable from RGB-D inputs. Despite this opportunity, existing scene-graph-based 3D scene generation methods are fundamentally single-shot: inserting even a single new object requires regenerating the entire scene. This global recomputation incurs prohibitive latency and makes existing approaches unsuitable for real-time immersive robotic teleoperation, where the scene graph, and therefore the scene itself, is built incrementally as the robot moves through the environment. To address this limitation, we propose Incremental3D, the first framework capable of incremental graph-to-3D scene generation for teleoperation applications. Incremental3D augments an existing scene graph with a global classification (CLS) node that maintains a holistic representation of the evolving environment. At each update step, the CLS node aggregates global context and conditions the generation of newly added objects, enabling geometry synthesis and spatial prediction without recomputing unchanged regions. Extensive experiments demonstrate that Incremental3D achieves a generation speed of 38 Hz while maintaining high spatial accuracy, indicating its suitability for real-time teleoperation and other latency-sensitive 3D applications.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Matthew_Walter1
Submission Number: 6689
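
The incremental update described in the abstract (a global CLS node that summarizes the scene and conditions each newly added object, leaving existing objects untouched) can be illustrated with a toy sketch. This is not the authors' implementation: the class `IncrementalSceneGraph`, its `add_object` method, and the `cls_state` attribute are hypothetical names, and a running mean of node embeddings stands in for whatever learned aggregation the paper uses. It only shows why a per-object update can cost the same regardless of how large the scene has grown.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalSceneGraph:
    """Toy scene graph with a global summary vector (hypothetical structure)."""
    dim: int
    nodes: dict = field(default_factory=dict)  # object name -> embedding
    edges: list = field(default_factory=list)  # (subject, relation, object)
    cls_state: list = field(init=False)        # stand-in for the CLS node

    def __post_init__(self):
        self.cls_state = [0.0] * self.dim

    def add_object(self, name, embedding, relations=()):
        """Insert one object and return the global context that a
        generator for this object would be conditioned on."""
        if name in self.nodes:
            raise ValueError(f"object {name!r} already in the graph")
        if len(embedding) != self.dim:
            raise ValueError("embedding has the wrong dimension")

        # Context seen by the new object: the summary before insertion.
        context = list(self.cls_state)

        # Assumption: running-mean aggregation. The cost is O(dim) and
        # no existing node is revisited or regenerated.
        n = len(self.nodes)
        self.cls_state = [
            (c * n + e) / (n + 1) for c, e in zip(self.cls_state, embedding)
        ]

        self.nodes[name] = list(embedding)
        self.edges.extend((name, rel, other) for rel, other in relations)
        return context


graph = IncrementalSceneGraph(dim=2)
graph.add_object("table", [1.0, 3.0])
context = graph.add_object("chair", [3.0, 1.0], relations=[("left of", "table")])
print(context)          # [1.0, 3.0]
print(graph.cls_state)  # [2.0, 2.0]
```

In a single-shot pipeline, adding the chair would mean regenerating the table as well; here only the summary vector and the new node change.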