SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting

Date: 03 Sept 2025 (modified: 12 Nov 2025)
Venue: ICLR 2026 Conference Withdrawn Submission
Readers: Everyone
License: CC BY 4.0
Keywords: 3D Gaussian Splatting, 3D Affordance Reasoning, Sequential Task Planning
Abstract: 3D affordance reasoning, the task of associating language instructions with the functional regions of 3D objects, is a critical capability for embodied agents. With its photorealistic rendering and precise geometric fidelity, the recently emerged 3D Gaussian Splatting (3DGS) has become an ideal representation for such fine-grained localization. Despite its potential, existing 3DGS-based methods are confined to single-object and single-step interactions, failing to address the long-horizon, multi-object tasks common in the real world. To fill this gap, we introduce a novel task of Sequential 3D Gaussian Affordance Reasoning and construct SeqAffordSplat, the first large-scale dataset with over 1,800 complex scenes to support this research. We then propose SeqSplatNet, an innovative end-to-end framework that leverages a Large Language Model (LLM) for autoregressive planning, directly mapping high-level instructions to a sequence of precise 3D affordance masks. To enhance performance, we introduce a Conditional Geometric Reconstruction pre-training strategy to build a robust geometric prior and a Semantic Feature Injection mechanism to fuse multi-scale semantic knowledge from 2D Vision Foundation Models. Extensive experiments demonstrate that our model achieves state-of-the-art performance on our new benchmark, successfully advancing affordance reasoning for long-horizon and scene-level sequential tasks.
Primary Area: applications to robotics, autonomy, planning
Submission Number: 1435
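
The task described in the abstract, a single instruction mapped to an ordered sequence of affordance masks over a scene's Gaussians, can be pictured with a small data-structure sketch. Everything below is an illustrative assumption rather than the authors' code: the names `AffordanceStep`, `mask_iou`, and `sequence_iou` are hypothetical, masks are modeled as one boolean per Gaussian, and the score is a generic mean per-step IoU that may differ from the metrics used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AffordanceStep:
    """One step of a hypothetical sequential prediction."""
    description: str   # illustrative step text, e.g. "grasp the handle"
    mask: List[bool]   # one flag per Gaussian in the scene


def mask_iou(pred: List[bool], target: List[bool]) -> float:
    """Intersection-over-union of two per-Gaussian boolean masks."""
    if len(pred) != len(target):
        raise ValueError("masks must cover the same set of Gaussians")
    inter = sum(p and t for p, t in zip(pred, target))
    union = sum(p or t for p, t in zip(pred, target))
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def sequence_iou(pred: List[AffordanceStep], gt: List[AffordanceStep]) -> float:
    """Mean per-step IoU, compared in order; missing or extra steps score 0."""
    n = max(len(pred), len(gt))
    if n == 0:
        return 1.0
    total = sum(mask_iou(p.mask, g.mask) for p, g in zip(pred, gt))
    return total / n


# Toy scene with 6 Gaussians and a two-step ground-truth sequence.
gt_steps = [
    AffordanceStep("step 1", [True, True, False, False, False, False]),
    AffordanceStep("step 2", [False, False, False, True, True, False]),
]
pred_steps = [
    AffordanceStep("step 1", [True, True, False, False, False, False]),
    AffordanceStep("step 2", [False, False, True, True, False, False]),
]
score = sequence_iou(pred_steps, gt_steps)
```

Comparing steps in order reflects the sequential nature of the task: a correct mask predicted at the wrong step would not be credited under this assumed scoring.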