R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner

Ziyi Bai; Hanxuan Li; Bin Fu; Chuyan Xiong; Ruiping Wang; Xilin Chen

25 Sept 2024 (modified: 15 Nov 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Embodied AI, Large Language Model, Embodied Instruction Following, Robotic Planning

Abstract: This paper explores the potential of leveraging large language models (LLMs) as low-level action planners capable of executing long-horizon tasks based on natural language instructions. Although LLMs can act as the "brain" of robots by excelling in high-level task planning, they are not yet capable of directly guiding the "body" to execute low-level motion plans. This limitation stems from a communication gap between the "brain" and the "body". Specifically, LLMs lack access to rich spatial semantic information from the robot's real-time observations, hindering their ability to generate precise and actionable low-level plans. To address this, we propose a unified framework that bridges high-level and low-level planning by establishing an efficient communication interface between LLMs and robots. Our insight is to formulate the task as playing chess with LLMs. We map the room into a semantic chessboard, which we call Room to Chessboard (R2C). Each grid represents the position and size of objects inside the room. We find that the chessboard is succinct enough for LLMs to conduct semantic searches with a global view of the room. The chessboard is also informative enough to convey the detailed environmental state that LLMs need to predict executable low-level actions. Additionally, we enhance decision-making through a Chain-of-Thought (CoT) paradigm, improving LLMs' interpretability and action reasoning. We implement R2C using both fine-tuned open-source LLMs and closed-source models like GPT-4, and demonstrate its efficacy on the challenging ALFRED benchmark. Our results show that with chessboard-based communication, LLMs can serve as effective low-level action planners and can generalize well to open-vocabulary robotic planning tasks. View the demos on our project page: https://anonymous4cv.github.io/Room2Chessboard.
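
To make the chessboard idea concrete, the following is a minimal sketch of how object positions and sizes might be rasterized onto a text grid that an LLM can read. It is not the authors' implementation: the `RoomObject` type, the `room_to_chessboard` and `render` functions, the cell size, and the letter-per-object encoding are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RoomObject:
    """Hypothetical axis-aligned object footprint, in metres."""
    name: str
    x: float      # minimum x of the footprint
    y: float      # minimum y of the footprint
    width: float  # extent along x
    depth: float  # extent along y


def room_to_chessboard(objects, room_width, room_depth, cell_size=0.5):
    """Rasterize object footprints onto a grid of single-character cells.

    Assumption: each object gets one letter (A, B, ...), so this sketch
    handles at most 26 objects; later objects overwrite earlier ones.
    """
    cols = math.ceil(room_width / cell_size)
    rows = math.ceil(room_depth / cell_size)
    board = [["." for _ in range(cols)] for _ in range(rows)]
    legend = {}
    for idx, obj in enumerate(objects):
        symbol = chr(ord("A") + idx)
        legend[symbol] = obj.name
        # Cells covered by the footprint, clipped to the room bounds.
        c0 = max(int(obj.x // cell_size), 0)
        r0 = max(int(obj.y // cell_size), 0)
        c1 = min(math.ceil((obj.x + obj.width) / cell_size), cols)
        r1 = min(math.ceil((obj.y + obj.depth) / cell_size), rows)
        for r in range(r0, r1):
            for c in range(c0, c1):
                board[r][c] = symbol
    return board, legend


def render(board, legend):
    """Serialize the board and its legend as plain text for a prompt."""
    lines = ["".join(row) for row in board]
    lines += [f"{symbol} = {name}" for symbol, name in legend.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    objects = [
        RoomObject("table", 0.0, 0.0, 1.0, 0.5),
        RoomObject("mug", 1.5, 1.0, 0.5, 0.5),
    ]
    board, legend = room_to_chessboard(objects, room_width=2.0, room_depth=1.5)
    print(render(board, legend))
```

In this sketch, the number of cells an object covers conveys its size and the cell coordinates convey its position, which is one simple way to realize the "succinct yet informative" trade-off the abstract describes; a finer `cell_size` gives more detail at the cost of a longer prompt.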

Primary Area: applications to robotics, autonomy, planning

Submission Number: 4358
