LILO: Learning to Reason at the Frontier of Learnability

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: llm reasoning, reasoning, reinforcement learning, curriculum learning, unsupervised environment design
TL;DR: We prove theoretically and show empirically that prioritising training on questions of "medium" difficulty is beneficial when training reasoning models with RL
Abstract: Reinforcement learning is widely adopted for post-training large language models, especially on reasoning-style tasks such as maths questions. However, as we show, most existing methods will provably fail to learn from questions that are too hard, where the model always fails, or too easy, where the model always succeeds. Much human effort is therefore spent continually producing datasets of questions of a suitable difficulty for state-of-the-art models. Given this, we consider how to algorithmically identify questions that allow for maximally efficient training. We introduce a method, LILO (Learnability Improves LLMs Optimally), that prioritises training on questions with high variance of success, known as learnability, and we provide theory proving that LILO maximises the expected improvement of the model. We run a wide range of experiments over multiple base models, algorithms and reasoning datasets to demonstrate that LILO consistently improves final test accuracy and can yield a 3x reduction in the number of training steps required to reach it. We explore how questions with high learnability can be efficiently identified, and discuss how learnability can be scaled to produce LLM agents that autonomously and open-endedly expand the frontier of human knowledge.
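The abstract's core idea is to score each question by the variance of its success probability, p(1 - p), which peaks at medium difficulty (p = 0.5) and vanishes for questions the model always solves or always fails. Below is a minimal sketch, not the authors' code, of how such learnability-based question selection could look: estimate p from a few rollouts per question, score each question by p(1 - p), and sample the next training batch in proportion to that score. All function names (`estimate_success_rates`, `learnability_scores`, `sample_batch`, `rollout_fn`) are illustrative assumptions, not part of the paper.

```python
import numpy as np


def estimate_success_rates(questions, rollout_fn, num_rollouts=8):
    """Estimate the success probability p for each question from a few rollouts.

    rollout_fn(question) -> bool indicating whether the model solved the question.
    """
    return np.array([
        np.mean([rollout_fn(q) for _ in range(num_rollouts)])
        for q in questions
    ])


def learnability_scores(success_rates):
    """Bernoulli variance p * (1 - p): maximal at p = 0.5, zero at p in {0, 1}."""
    p = np.asarray(success_rates)
    return p * (1.0 - p)


def sample_batch(questions, scores, batch_size, rng=None, eps=1e-8):
    """Sample a training batch with probability proportional to learnability."""
    rng = rng or np.random.default_rng()
    probs = (scores + eps) / (scores + eps).sum()
    idx = rng.choice(len(questions), size=batch_size, replace=False, p=probs)
    return [questions[i] for i in idx]


if __name__ == "__main__":
    # Toy usage with a dummy rollout function that succeeds at a fixed rate per question.
    rng = np.random.default_rng(0)
    questions = [f"q{i}" for i in range(100)]
    true_p = rng.uniform(0.0, 1.0, size=len(questions))
    rollout_fn = lambda q: rng.random() < true_p[int(q[1:])]

    p_hat = estimate_success_rates(questions, rollout_fn)
    batch = sample_batch(questions, learnability_scores(p_hat), batch_size=16, rng=rng)
    print(batch)
```

In practice the rollout function would be replaced by sampling completions from the current policy and checking answers, and the scores would be refreshed as the policy improves, so the selected frontier of medium-difficulty questions moves with the model.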
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 20979