Keywords: Reasoning abstractions; LLM; RL; Structured exploration; Reasoning
TL;DR: A two-agent training framework for generating and applying reasoning abstractions to solve complex problems.
Abstract: Effective reasoning often requires going beyond pattern matching or memorization of solutions to identify and implement "algorithmic procedures" that can be used to deduce answers to hard problems. These procedures consist of reusable primitives, intermediate results, or subroutines that can themselves be applied across many problems. While current methods of RL post-training on long chains of thought ultimately aim to uncover this kind of algorithmic behavior, the sensitivity of these systems to benchmark composition and the brittle, locally optimal strategies they learn suggest that this promise remains largely unfulfilled. To make progress toward it, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward successful reasoning strategies. We train models to propose several useful abstractions given a problem, followed by RL training that incentivizes building a solution while using the information provided by these abstractions. This results in a two-agent cooperative RL training paradigm, RL through Abstraction Discovery (RLAD), which jointly trains an abstraction generator and an abstraction-conditioned solution generator. This bi-level setup enables structured exploration, decouples the learning signals for abstraction proposal and solution generation, and improves generalization to harder problems, analogous to what we would expect from hierarchical RL. Empirically, RLAD improves performance on challenging math benchmarks.
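To make the bi-level structure concrete, here is a minimal, hypothetical sketch of one RLAD-style update in Python. The interface names (`Generator`, `sample`, `reinforce`, `verify`) and the lift-based reward for the abstraction generator are illustrative assumptions chosen to match the abstract's description, not the paper's actual implementation.

```python
# Hypothetical sketch of one cooperative RLAD update: the solver is rewarded
# per-solution for correctness; the abstraction generator is rewarded by how
# much its abstraction lifts the solver's success rate over a no-abstraction
# baseline, decoupling the two learning signals.

from typing import Callable, List, Protocol


class Generator(Protocol):
    def sample(self, prompt: str, k: int) -> List[str]: ...
    def reinforce(self, prompt: str, outputs: List[str], rewards: List[float]) -> None: ...


def rlad_step(
    abs_gen: Generator,              # proposes natural-language abstractions
    sol_gen: Generator,              # generates solutions conditioned on a prompt
    problem: str,
    verify: Callable[[str], bool],   # checks a candidate solution (e.g., answer match)
    n_abs: int = 4,
    n_sol: int = 8,
) -> None:
    # Baseline: solver success rate with no abstraction in the prompt.
    base = sol_gen.sample(problem, k=n_sol)
    base_rewards = [float(verify(s)) for s in base]
    base_rate = sum(base_rewards) / n_sol
    sol_gen.reinforce(problem, base, base_rewards)

    for abstraction in abs_gen.sample(problem, k=n_abs):
        prompt = f"{problem}\n\nAbstraction: {abstraction}"
        sols = sol_gen.sample(prompt, k=n_sol)
        rewards = [float(verify(s)) for s in sols]

        # Solution generator learns from per-solution correctness.
        sol_gen.reinforce(prompt, sols, rewards)

        # Abstraction generator learns from the *lift* over the baseline,
        # so its signal reflects the usefulness of the abstraction itself
        # rather than the solver's per-sample noise.
        lift = sum(rewards) / n_sol - base_rate
        abs_gen.reinforce(problem, [abstraction], [lift])
```

Rewarding the abstraction generator on lift rather than raw accuracy is one plausible way to realize the decoupled learning signals the abstract describes; the actual reward shaping in the paper may differ.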
Submission Number: 74