Frontier Learning: Training LLM Reasoners at the Edge of Capability

Published: 26 May 2026, Last Modified: 26 May 2026
Venue: ICML 2026 FoGen Workshop Poster
License: CC BY 4.0
Keywords: Generative Models, Reasoning, post-training
Abstract: Reinforcement learning-based post-training of Large Language Models (LLMs) has been successfully applied to improve their reasoning capabilities. Existing pipelines primarily use the GRPO loss to fine-tune LLMs on a fixed pool of problems specified before training. This is fundamentally limiting: a learning signal arises only when the policy's rollouts on a problem mix successes and failures, so the useful portion of any fixed pool quickly becomes stale as the model improves. To address this, we propose frontier learning, an open-ended post-training approach in which procedural generators are used online to continually produce informative training problems. Frontier learning treats the generator's task-specific parameters as a search space and uses a novel regret signal to prioritize and explore frontier difficulty levels, focusing training at the edge of the model's evolving reasoning capabilities. Across several reasoning tasks, our approach consistently achieves higher relative gains than fixed-pool baselines, demonstrating that effective post-training requires not only selecting useful problems, but continually generating them at the edge of capability.
Submission Number: 189
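Illustrative note on the abstract's claim that a learning signal arises only when rollouts mix successes and failures. The sketch below is not the authors' implementation, and it does not attempt the paper's regret signal, which the abstract does not specify. It assumes the commonly described GRPO-style advantage, in which each rollout's reward is standardized within its group (reward minus group mean, divided by group standard deviation), and binary success/failure rewards. The names `group_relative_advantages` and `has_learning_signal` are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts for a single problem.

    Assumption: GRPO-style advantage, (r - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_learning_signal(rewards):
    """A group is informative only if its rollouts disagree on the outcome."""
    return len(set(rewards)) > 1


# Hypothetical groups of four rollouts each, with reward 1 = solved, 0 = failed.
too_easy = [1, 1, 1, 1]   # the model always succeeds
too_hard = [0, 0, 0, 0]   # the model always fails
frontier = [1, 0, 0, 1]   # mixed outcomes at the edge of capability

for name, rewards in [("too_easy", too_easy), ("too_hard", too_hard), ("frontier", frontier)]:
    advantages = group_relative_advantages(rewards)
    print(name, has_learning_signal(rewards), advantages)
```

Under these assumptions, the all-success and all-failure groups yield advantages of exactly zero and so contribute no gradient, while the mixed group yields positive advantages for successes and negative ones for failures. This is why a fixed pool loses useful problems as the model improves: problems it has mastered drift into the all-success case.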