Keywords: Large Language Models; Scene Graphs
TL;DR: An LLM-based multi-agent framework for reasoning on scene graphs
Abstract: Large Language Models (LLMs) have demonstrated impressive reasoning and planning capabilities, yet grounding these abilities in a specific environment remains challenging. Recently, there has been growing interest in representing environments as scene graphs for LLMs, due to their serializable format, scalability to large environments, and flexibility in incorporating diverse semantic and spatial information for various downstream tasks.
Despite the success of prompting with graphs as text, existing methods suffer from hallucination on large graph inputs and are limited in solving complex spatial problems, restricting their application beyond simple object search tasks.
In this work, we explore grounding LLM reasoning in the environment through the $\textit{scene graph schema}$.
We propose $\textit{SG-RwR}$, an iterative reason-while-retrieve scene graph reasoning framework involving two cooperative schema-guided code-writing LLMs: (1) a $\textit{Reasoner}$ for task planning and information querying, and (2) a $\textit{Retriever}$ for extracting graph information based on these queries.
This cooperation focuses attention on task-relevant graph information and enables the sequential reasoning over the graph that complex tasks require.
Additionally, the code-writing design allows the agents to invoke tools for problems beyond the capacity of LLMs alone, further enhancing reasoning on scene graphs.
We also demonstrate that our framework can benefit from task-level few-shot examples, even in the absence of agent-level demonstrations,
thereby enabling in-context learning without data collection overhead.
Through experiments in multiple simulation environments, we show that $\textit{SG-RwR}$ surpasses existing LLM-based approaches in numerical Q\&A and planning tasks.
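The reason-while-retrieve loop described in the abstract can be made concrete with a short sketch. The Python below is purely illustrative and not the authors' released code: `call_llm`, `schema_of`, and `run_sg_rwr` are hypothetical names, the scene graph is assumed to be a plain node/edge dictionary, and the Retriever's code execution is reduced to a text hand-off for brevity.

```python
# Illustrative sketch of a reason-while-retrieve loop with two cooperating
# LLM agents. All function and prompt names are assumptions, not the paper's API.
import json

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for any chat-completion backend; plug in your own client."""
    raise NotImplementedError("connect an LLM client here")

def schema_of(graph: dict) -> str:
    """Summarize attribute names only; the Reasoner never sees the full graph."""
    node_attrs = sorted(next(iter(graph["nodes"].values())))
    edge_attrs = sorted(next(iter(graph["edges"].values()))) if graph["edges"] else []
    return json.dumps({"node_attributes": node_attrs, "edge_attributes": edge_attrs})

def run_sg_rwr(graph: dict, task: str, max_steps: int = 5) -> str:
    history = []
    for _ in range(max_steps):
        # Reasoner: plans from the schema and prior observations; emits either
        # a query for the Retriever or a final answer.
        step = call_llm(
            "You are the Reasoner. Given a scene graph schema and prior "
            "observations, write a query for the Retriever, or answer "
            "prefixed with 'ANSWER:'.",
            f"Schema: {schema_of(graph)}\nTask: {task}\nHistory: {history}",
        )
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        # Retriever: extracts the requested information from the raw graph;
        # in the paper this agent writes code, which is elided here.
        observation = call_llm(
            "You are the Retriever. Extract the requested graph information.",
            f"Query: {step}\nGraph: {json.dumps(graph)}",
        )
        history.append({"query": step, "observation": observation})
    return "no answer within step budget"
```

One point the sketch makes explicit: only the Retriever ever receives the raw graph, while the Reasoner works from the compact schema plus accumulated observations, which is what keeps the Reasoner's context small on large environments.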
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11222