Sketch-Plan-Generalize: Learning Inductive Representations for Grounded Spatial Concepts

Namasivayam Kalithasan; Sachit Sachdeva; Himanshu Gaurav Singh; Vishal Bindal; Arnav Tuli; Gurarmaan Singh Panjeta; Divyanshu Agarwal; Rohan Paul; Parag Singla

27 Sept 2024 (modified: 12 Dec 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Neuro-symbolic AI, Concept learning, Robotics

TL;DR: Learning inductive spatial concepts such as staircases, towers, and rows as grounded executable programs for an embodied agent is aided by factoring the problem into sketch generation, physical-reward guided search, and programmatic abstraction.

Abstract: Our goal is to enable embodied agents to learn inductive representations for grounded spatial concepts, e.g., learning a staircase as an inductive composition of towers of increasing height. Given a few human demonstrations, we seek a learning architecture that infers a succinct inductive *program* representation that *explains* the observed instances. The approach should generalize to learning novel structures of different sizes or complexity expressed as a hierarchical composition of previously learned concepts. Existing approaches that use the code generation capabilities of pre-trained large (visual) language models, as well as purely neural models, show poor generalization to *a-priori* unseen complex concepts. Our key insight is to factor inductive concept learning as: (i) *Sketch:* detecting and inferring a coarse signature of a new concept; (ii) *Plan:* performing Monte Carlo tree search (MCTS) over grounded action sequences; and (iii) *Generalize:* abstracting out grounded plans as inductive programs. Our pipeline facilitates generalization and modular re-use, enabling continual concept learning. Our approach combines the code generation ability of large language models (LLMs) with grounded neural representations, resulting in neuro-symbolic programs that show stronger inductive generalization on the task of constructing complex structures vis-à-vis LLM-only and purely neural approaches. Further, we demonstrate reasoning and planning capabilities with learned concepts for embodied instruction following.
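To make the idea of an inductive program concrete, the staircase example from the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' representation or DSL: the function names `tower` and `staircase`, the 2D `(x, level)` block coordinates, and the left-to-right build order are all assumptions introduced here.

```python
from typing import List, Tuple

# Hypothetical abstraction: a block placement is an (x, level) grid cell.
Block = Tuple[int, int]


def tower(x: int, height: int) -> List[Block]:
    """Inductive definition (assumed): a tower of height h is a tower
    of height h-1 with one more block placed on top."""
    if height == 0:
        return []
    return tower(x, height - 1) + [(x, height - 1)]


def staircase(x: int, steps: int) -> List[Block]:
    """Inductive definition (assumed): a staircase of n steps is a
    staircase of n-1 steps followed by a tower of height n beside it."""
    if steps == 0:
        return []
    return staircase(x, steps - 1) + tower(x + steps - 1, steps)


if __name__ == "__main__":
    # Placement order for a 3-step staircase starting at x = 0.
    print(staircase(0, 3))
```

Because `staircase` is defined in terms of `tower` and of a smaller `staircase`, the same two definitions cover any number of steps, which is the kind of size generalization and modular re-use the abstract refers to.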

Supplementary Material: zip

Primary Area: applications to robotics, autonomy, planning

Submission Number: 10039
