Keywords: Scene graph manipulation, autoregressive modeling, spatial reasoning
TL;DR: SG-Tailor produces diverse scene graphs for scene-related tasks
Abstract: Scene graphs capture complex relationships among objects and serve as powerful priors for 3D scene understanding tasks, yet their manipulation, such as adding nodes or modifying edges, remains underexplored and highly challenging. Even a single edge change can propagate conflicts across the graph due to intricate interdependencies, making the task computationally difficult. We propose $\textbf{SG-Tailor}$, an autoregressive model for structure-aware scene graph editing that generates commonsense edges for newly added nodes and resolves conflicts arising from edge modifications to ensure globally coherent graphs. For node addition, SG-Tailor queries the target node, forms candidate pairs with existing nodes, and predicts the appropriate relationships, while for edge modification it introduces a $\textbf{Cut-and-Stitch}$ strategy that repairs conflicts and adjusts the graph holistically. Extensive experiments demonstrate that SG-Tailor substantially outperforms prior approaches and can be seamlessly integrated as a plug-and-play module for downstream tasks such as scene generation and robotic manipulation.
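The node-addition procedure described in the abstract (query the target node, pair it with each existing node, predict a relationship for each pair) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the triplet representation, the `add_node` function, and especially `toy_predictor`, a rule-based placeholder standing in for the learned autoregressive model. The sketch does not cover the Cut-and-Stitch strategy, whose details the abstract does not give.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: an edge is a (subject, predicate, object) triplet.
Triplet = Tuple[str, str, str]


def add_node(
    nodes: List[str],
    edges: List[Triplet],
    new_node: str,
    predict_relation: Callable[[List[Triplet], str, str], Optional[str]],
) -> Tuple[List[str], List[Triplet]]:
    """Add new_node and one predicted edge per candidate pair, where a relation exists."""
    new_edges = list(edges)  # copy so the input graph is left unmodified
    for existing in nodes:
        # Each prediction sees all edges so far, including those just added,
        # which mimics (in a simplified way) autoregressive conditioning.
        predicate = predict_relation(new_edges, new_node, existing)
        if predicate is not None:
            new_edges.append((new_node, predicate, existing))
    return nodes + [new_node], new_edges


def toy_predictor(context: List[Triplet], subject: str, obj: str) -> Optional[str]:
    """Placeholder for a learned model: a fixed lookup table of plausible relations."""
    rules = {("lamp", "table"): "standing on", ("chair", "table"): "next to"}
    return rules.get((subject, obj))


nodes = ["table", "bed"]
edges = [("table", "left of", "bed")]
nodes2, edges2 = add_node(nodes, edges, "lamp", toy_predictor)
print(nodes2)
print(edges2)
```

In this toy run the new `lamp` node gains an edge to `table` but none to `bed`, because the placeholder has no rule for that pair. A learned model would instead score candidate predicates conditioned on the graph context.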
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 12993