CodeIt: Abstract Reasoning with Iterative Policy-Guided Program Synthesis

Natasha Butt; Blazej Manczak; Auke Wiggers; Corrado Rainone; David W. Zhang; Michaël Defferrard; Taco Cohen

22 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Program synthesis, abstract reasoning, reinforcement learning

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We propose an iterative program synthesis procedure for the Abstraction and Reasoning Corpus (ARC) benchmark.

Abstract: Artificial intelligence systems are increasingly solving tasks that are commonly believed to require human-like reasoning ability. However, learned approaches still fare poorly on the Abstraction and Reasoning Corpus (ARC), a benchmark that measures skill-acquisition efficiency as a proxy for intelligence. Each ARC task requires an agent to reason about a transformation between input and output pairs. In this work, we solve these tasks by identifying the program that applies this transformation. We propose CodeIt, a program synthesis approach that leverages a higher level of abstraction through a domain-specific language. CodeIt iterates between sampling from the current large language model policy and learning that policy using supervised learning. The sampling stage augments newfound programs using hindsight relabeling and program mutation, requiring no expert search procedure. We demonstrate CodeIt’s effectiveness on the ARC benchmark, where we show that learning to write code in iterations leads to intertask generalization, which results in state-of-the-art performance.
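The alternation described in the abstract, sampling programs from the current policy, relabeling them in hindsight, and then learning from the collected examples, can be illustrated with a minimal toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the four-primitive list DSL, the function names, and the count-based "policy" are hypothetical stand-ins for the ARC domain-specific language and the large language model policy. Unlike the policy in the abstract, this toy policy is not conditioned on the task, and program mutation is omitted.

```python
import random

# Hypothetical toy DSL: each primitive maps a list of ints to a list of ints.
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "increment": lambda xs: [x + 1 for x in xs],
}


def run_program(program, xs):
    """Apply a program (a tuple of primitive names) to one input, left to right."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def sample_program(weights, rng, max_len=2):
    """Stand-in for sampling from the policy: draw primitives by weight."""
    names = list(weights)
    length = rng.randint(1, max_len)
    return tuple(rng.choices(names, weights=[weights[n] for n in names], k=length))


def hindsight_relabel(program, inputs):
    """Pair a sampled program with the outputs it actually produces.

    The result is a correct (inputs, outputs, program) example even when the
    program does not solve the task it was sampled for.
    """
    outputs = [run_program(program, xs) for xs in inputs]
    return inputs, outputs, program


def fit_policy(buffer, smoothing=1.0):
    """Stand-in for supervised learning: re-estimate primitive weights by counting."""
    weights = {name: smoothing for name in PRIMITIVES}
    for _, _, program in buffer:
        for name in program:
            weights[name] += 1.0
    return weights


def iterate(tasks, n_iterations=3, samples_per_task=4, seed=0):
    """Alternate between a sampling stage and a learning stage."""
    rng = random.Random(seed)
    weights = {name: 1.0 for name in PRIMITIVES}
    buffer, solved = [], set()
    for _ in range(n_iterations):
        # Sampling stage: sample programs and keep every one as a relabeled example.
        for task_id, (inputs, targets) in tasks.items():
            for _ in range(samples_per_task):
                program = sample_program(weights, rng)
                example = hindsight_relabel(program, inputs)
                buffer.append(example)
                if example[1] == targets:
                    solved.add(task_id)
        # Learning stage: refit the policy on everything collected so far.
        weights = fit_policy(buffer)
    return buffer, solved, weights
```

The point of the sketch is that relabeling makes every sampled program usable as training data: a program that misses its target outputs is still a correct solution for the outputs it actually produced, so the buffer grows on every iteration without any expert search.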

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 5606
