From Reasoning to Generalization: Knowledge-Augmented LLMs for the ARC Benchmark

ACL ARR 2026 January Submission 185 Authors

22 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Abstraction and Reasoning Corpus (ARC), LLM reasoning and generalization, LLM-based program synthesis, Knowledge augmentation in LLMs
Abstract: Despite extensive research on reasoning-oriented LLMs, core cognitive faculties of human intelligence, such as abstract reasoning and generalization, remain underexplored. To address this, we evaluate recent reasoning-oriented LLMs on the Abstraction and Reasoning Corpus (ARC) benchmark, which explicitly demands both faculties. We formulate ARC as a program synthesis task and propose nine candidate solvers. Experimental results show that repeated-sampling planning-aided code generation (RSPC) achieves the highest test accuracy and demonstrates consistent generalization across most LLMs. To further improve performance, we introduce Knowledge Augmentation for Abstract Reasoning (KAAR), which encodes core knowledge priors within an ontology that classifies priors into three hierarchical levels based on their dependencies. KAAR progressively expands LLM reasoning capacity by gradually augmenting priors at each level, and invokes RSPC to generate candidate solutions after each augmentation stage. Empirical results show that KAAR maintains strong generalization and consistently outperforms non-augmented RSPC across all evaluated LLMs, achieving around 5\% absolute gains and up to 64.52\% relative improvement.
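The abstract describes KAAR as a loop that augments the prompt with one level of core-knowledge priors at a time and invokes RSPC (repeated-sampling planning-aided code generation) after each augmentation stage. A minimal sketch of that control flow is given below; all function names, prior labels, and the toy verifier are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch of KAAR's progressive augmentation loop wrapping RSPC.
# The prior levels, prompts, and verifier below are illustrative placeholders,
# not the paper's actual ontology or code.
from typing import Callable, Optional

# Three dependency-ordered prior levels (labels are assumptions).
PRIOR_LEVELS = [
    ["objectness"],                      # level 1: basic priors
    ["geometry", "topology"],            # level 2: builds on level 1
    ["numbers", "counting", "sorting"],  # level 3: builds on levels 1-2
]

def rspc(prompt: str, generate: Callable[[str], str], n_samples: int = 3) -> list:
    """Repeated sampling: ask for a plan, then code implementing it, n times."""
    candidates = []
    for _ in range(n_samples):
        plan = generate(f"Plan the grid transformation.\n{prompt}")
        code = generate(f"Implement the plan as Python.\nPlan: {plan}\n{prompt}")
        candidates.append(code)
    return candidates

def solves_training_pairs(code: str, task: dict) -> bool:
    """Stub verifier: in practice, execute the candidate on the ARC
    training input/output pairs; here a toy string check for the demo."""
    return "identity" in code

def kaar(task: dict, generate: Callable[[str], str]) -> Optional[str]:
    """Augment priors level by level; return the first verified candidate."""
    prompt = f"Task: {task['description']}"
    for level in PRIOR_LEVELS:
        prompt += "\nPriors: " + ", ".join(level)  # expand reasoning context
        for code in rspc(prompt, generate):
            if solves_training_pairs(code, task):
                return code
    return None  # no candidate passed verification at any level
```

For example, with a stubbed model `lambda p: "def solve(g): return g  # identity"`, `kaar` returns a candidate at the first prior level, since the toy verifier accepts it; a model whose samples never pass verification yields `None` after all three levels.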
Paper Type: Long
Research Area: Language Models
Research Area Keywords: Language Modeling, Resources and Evaluation
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English, Python
Submission Number: 185