Keywords: language models, reasoning models, neurosymbolic models, few-shot learning
TL;DR: Code-enabled language models can outperform reasoning models
Abstract: Reasoning models (RMs), language models (LMs) trained with reinforcement learning to produce long-form natural language reasoning, have been remarkably successful, but they still cost large amounts of compute and data to train and can be slow and expensive to run.
In this paper, we show that ordinary LMs can already be elicited to reason at a level comparable to, or even surpassing, their corresponding RMs (e.g., DeepSeek V3 vs. R1) without finetuning, across diverse domains from instruction following and creative generation to mathematical reasoning. This is achieved by combining the CodeAct approach, where LMs interleave natural language reasoning with code execution in a multi-step fashion, with few-shot bootstrap in-context learning---from as few as five training problems.
Analyzing four matched pairs of LMs and RMs, we find that our framework, coined *CodeAdapt*, enables three of the four LMs to outperform their corresponding RMs when averaged over eight tasks (by up to 22.9\%) while being 10-81\% more token efficient, and delivers superior performance on six of the eight tasks when averaged over models (by up to 35.7\%). The code-augmented reasoning traces further display rich and varied problem-solving strategies. Our findings suggest that (1) CodeAdapt-style learning and reasoning may be domain general and robust and (2) code-enabled LMs are cognitively relevant and powerful systems, potentially providing a strong foundation for in-weight reinforcement learning.
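For concreteness, the sketch below illustrates the kind of multi-step reason-and-execute loop the abstract describes: an ordinary LM alternates between natural language reasoning and code whose execution results are fed back into the context. All names here (`llm_generate`, the `<execute>` tag convention, `run_python`) are hypothetical illustrations under assumed conventions, not the CodeAdapt interface; the paper's actual prompting format, few-shot bootstrap procedure, and sandboxed execution environment are not reproduced.

```python
# Minimal illustrative sketch of a CodeAct-style loop (not the authors' implementation).
# Assumptions: `llm_generate` wraps an ordinary LM with few-shot examples already in the
# prompt; generated code is delimited by <execute>...</execute>; execution uses a bare
# exec() here rather than a proper sandbox.
import io
import re
import contextlib

EXECUTE_RE = re.compile(r"<execute>(.*?)</execute>", re.DOTALL)

def run_python(code: str) -> str:
    """Execute a code snippet and capture its stdout (no sandboxing; illustration only)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:  # feed errors back so the LM can revise its code
        return f"Error: {exc!r}"
    return buffer.getvalue()

def codeact_solve(problem: str, llm_generate, few_shot_examples: str, max_steps: int = 8) -> str:
    """Interleave natural-language reasoning with code execution in a multi-step loop."""
    transcript = few_shot_examples + f"\nProblem: {problem}\n"
    for _ in range(max_steps):
        step = llm_generate(transcript)   # NL reasoning, possibly containing <execute> code
        transcript += step
        match = EXECUTE_RE.search(step)
        if match is None:                 # no more code: treat this step as the final answer
            return step
        observation = run_python(match.group(1))
        transcript += f"\nObservation: {observation}\n"  # execution result fed back to the LM
    return transcript
```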
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 20494