FunFact: Building Probabilistic Functional 3D Scene Graphs via Factor-Graph Reasoning

Published: 27 May 2026, Last Modified: 04 Jun 2026, Venue: FMEA @ CVPR 2026 Poster, License: CC BY 4.0
Keywords: scene graph, functional scene understanding, factor graph, functional 3D scene graph
Abstract: Recent work in 3D scene understanding is moving beyond purely spatial analysis toward functional understanding, such as discovering which knob controls which burner or which switch toggles which light. Existing methods, however, reason over each functional relation independently, missing the scene-wide interdependencies that humans use to resolve ambiguity: if one knob controls a burner, the others become less likely candidates to control the same burner. We introduce FunFact, a framework for constructing probabilistic open-vocabulary functional 3D scene graphs from posed RGB-D images. FunFact first builds an object- and part-centric 3D map and uses foundation models to propose plausible functional-relation templates. Candidate functional edges are then instantiated from these templates and encoded as binary variables in a dual factor graph, where inter-edge dependencies are inferred from visual-geometric evidence and enforced through higher-order factors. To benchmark FunFact, we introduce FunThor, a synthetic dataset with exhaustive functional annotations across 12 AI2-THOR scenes. Experiments on existing real-world datasets and FunThor demonstrate that FunFact significantly improves functional-relation recall and reduces calibration error for ambiguous relations, highlighting the benefits of holistic probabilistic modeling for functional scene understanding.
Submission Number: 15
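
The knob-and-burner intuition in the abstract can be illustrated with a toy model. The following is a minimal sketch, not the FunFact implementation: it assumes each candidate functional edge is a binary variable with a unary evidence probability, and that competing edges share a hand-written soft "at most one" factor. The function name `edge_marginals`, the `penalty` parameter, and the object names are hypothetical, and exact enumeration stands in for whatever inference procedure the paper actually uses.

```python
from itertools import product
import math


def edge_marginals(unary, groups, penalty=2.0):
    """Exact marginals for binary edge variables by brute-force enumeration.

    unary:   dict mapping edge -> probability (strictly between 0 and 1)
             that the edge exists, judged from local evidence alone.
    groups:  list of tuples of edges that compete with each other
             (assumption: a soft "at most one active edge" factor per group).
    penalty: log-space cost for every active edge beyond the first in a group.
    """
    edges = list(unary)
    total = 0.0
    marginals = {e: 0.0 for e in edges}

    for assignment in product([0, 1], repeat=len(edges)):
        state = dict(zip(edges, assignment))

        # Unary factors: independent evidence for each edge.
        log_w = sum(
            math.log(unary[e] if state[e] else 1.0 - unary[e]) for e in edges
        )

        # Higher-order factors: penalize more than one active edge per group.
        for group in groups:
            active = sum(state[e] for e in group)
            log_w -= penalty * max(0, active - 1)

        w = math.exp(log_w)
        total += w
        for e in edges:
            if state[e]:
                marginals[e] += w

    return {e: m / total for e, m in marginals.items()}


# Two knobs compete to control the same burner (hypothetical names).
unary = {("knob_1", "burner_A"): 0.9, ("knob_2", "burner_A"): 0.5}
independent = edge_marginals(unary, groups=[])
joint = edge_marginals(unary, groups=[tuple(unary)])
```

In this toy setting, `independent` reproduces the unary probabilities, while `joint` lowers the marginal of the weaker edge because the stronger edge already explains the burner. Enumeration scales exponentially with the number of edges, so it is only suitable for illustration.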