SCOUT: Spatial-Aware Continual Scene Understanding and Switch Policy for Embodied Mobile Manipulation

ICLR 2026 Conference Submission 2238 Authors

05 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | Everyone | CC BY 4.0
Keywords: Embodied AI, Scene Understanding, Neural Policy
Abstract: Robustly coordinating navigation and manipulation is essential for embodied AI in complex indoor environments. To address this, SCOUT (Spatial-Aware Continual Scene Understanding and Switch Policy for Embodied Mobile Manipulation) is proposed. It consists of two components: 1) Spatial-Aware Continual Scene Understanding, which combines a Scene Modeling Module for effective scene modeling with a Mask Query Module for precise interaction mask generation; and 2) a Switch Policy that dynamically transitions between long-term navigation and short-term reactive planning when viable manipulation opportunities are detected. SCOUT achieves state-of-the-art performance on the ALFRED benchmark, reaching success rates of 65.09% and 60.79% in test seen and unseen environments, respectively, with step-by-step instructions, and it remains robust on long-horizon tasks without detailed guidance (61.24% / 56.04%).
Primary Area: applications to robotics, autonomy, planning
Submission Number: 2238
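
Illustrative sketch (not from the submission): the abstract says the Switch Policy moves from long-term navigation to short-term reactive planning when a viable manipulation opportunity is detected, but it does not say how viability is decided. The Python below shows one possible reading, assuming viability means a sufficiently confident interaction mask on a target within reach. The names `Observation`, `select_mode`, `min_score`, and `max_reach_m`, and both threshold values, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    NAVIGATE = "long_term_navigation"
    REACT = "short_term_reactive_planning"


@dataclass
class Observation:
    # Hypothetical fields; the abstract does not specify the policy's inputs.
    target_mask_score: Optional[float]  # mask confidence, None if no mask was found
    target_distance_m: Optional[float]  # distance to the target, None if unknown


def select_mode(obs: Observation,
                min_score: float = 0.5,
                max_reach_m: float = 1.5) -> Mode:
    """Pick the planning mode for the current step.

    Assumption: a manipulation opportunity is "viable" when the target has a
    confident interaction mask and lies within reach. Both thresholds are
    placeholder values chosen for illustration.
    """
    if obs.target_mask_score is None or obs.target_distance_m is None:
        return Mode.NAVIGATE
    viable = (obs.target_mask_score >= min_score
              and obs.target_distance_m <= max_reach_m)
    return Mode.REACT if viable else Mode.NAVIGATE
```

Under these assumptions, `select_mode(Observation(0.9, 0.8))` selects reactive planning, while a missing mask or a distant target keeps the agent in navigation mode.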