Keywords: Embodied AI, Vision-Language, Lifelong Learning
Abstract: Constructing a compact and informative 3D scene representation is essential for effective embodied reasoning and exploration, especially in complex environments over long periods. Existing approaches rely on object-centric graph representations, which oversimplify 3D scenes by modeling them as collections of individual objects and describing inter-object relationships through rigid textual descriptions. This rigidity discards the rich spatial relationships between objects that embodied scene reasoning tasks depend on. Furthermore, these representations lack natural mechanisms for active exploration and memory management, which hampers their application to lifelong autonomy. In this work, we propose SnapMem, a novel 3D scene representation that uses a compact set of informative snapshot images, selected by object co-visibility, to cover the scene. Each snapshot image captures rich spatial and semantic information about the objects that share a view and about their surroundings. We then show how this representation can be integrated directly with frontier-based exploration algorithms, so that active exploration draws on both unexplored regions and scene memory. To support lifelong memory in active exploration settings, we further present an efficient memory aggregation pipeline that constructs SnapMem incrementally, as well as an effective memory retrieval technique for memory management. Experimental results on three benchmarks demonstrate that SnapMem significantly enhances agents' reasoning and exploration capabilities in 3D environments over extended periods, highlighting its potential for advancing applications in embodied AI.
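The abstract does not say how the compact snapshot set is chosen. One plausible reading of "cover the scene based on object co-visibility" is a set-cover problem over the objects visible in each candidate frame. The sketch below illustrates that reading with a standard greedy set-cover heuristic; it is not the authors' method. The function name `select_snapshots`, the frame ids, and the object labels are all hypothetical.

```python
from typing import Dict, List, Set


def select_snapshots(visible: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover (an assumed stand-in, not the paper's algorithm).

    `visible` maps a candidate frame id to the set of object ids seen in it.
    Returns frame ids whose combined views contain every observed object.
    """
    uncovered: Set[str] = set().union(*visible.values())
    chosen: List[str] = []
    while uncovered:
        # Pick the frame that shows the most still-uncovered objects.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(visible), key=lambda f: len(visible[f] & uncovered))
        chosen.append(best)
        uncovered -= visible[best]
    return chosen


# Hypothetical toy scene: four candidate frames and the objects each one sees.
frames = {
    "frame_a": {"sofa", "lamp", "table"},
    "frame_b": {"lamp", "tv"},
    "frame_c": {"table"},
    "frame_d": {"tv", "plant"},
}
snapshots = select_snapshots(frames)
print(snapshots)  # ['frame_a', 'frame_d']
```

In this toy scene two frames suffice to cover all five objects, and objects that are co-visible (for example the sofa, lamp, and table) stay together in a single image, which is the property the abstract emphasizes.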
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 783