Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning

ICLR 2026 Conference Submission 25331 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: large language models, test-time training, reinforcement learning, curriculum learning
TL;DR: We propose a test-time curriculum agent that self-curates a sequence of training tasks and specializes toward a specific target task via reinforcement learning.
Abstract: Humans are good at learning on the job: we learn how to solve the tasks we face as we go along. Can a model do the same? We propose an agent that assembles a task-specific curriculum, called a *test-time curriculum* (TTC-RL), and applies reinforcement learning to continue training the model for its target task. The test-time curriculum avoids time-consuming human curation of datasets by automatically selecting the most task-relevant data from a large pool of available training data. Our experiments demonstrate that reinforcement learning on a test-time curriculum consistently improves the model on its target tasks, across a variety of evaluations and models. Notably, on challenging math and coding benchmarks, TTC-RL improves the pass@1 of `Qwen3-8B` by approximately 80% on AIME25 and 135% on Codeforces. Moreover, we find that TTC-RL significantly raises the performance ceiling compared to the initial model, increasing pass@64 on AIME25 from 57% to 79% and on Codeforces from 45% to 72%. Our findings show the potential of test-time curricula for extending the test-time scaling paradigm to continual *training* on thousands of task-relevant experiences at test time.
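The curriculum-selection step described in the abstract, picking the most task-relevant training data from a large pool, could be sketched as a similarity-based retrieval. This is a minimal illustration only: the bag-of-words cosine similarity, the `select_curriculum` function, and the example task pool are hypothetical stand-ins, not the paper's actual method.

```python
from collections import Counter
import math


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_curriculum(target_task: str, pool: list[str], k: int) -> list[str]:
    """Return the k pool tasks most similar to the target task.

    Hypothetical sketch: a real system might embed tasks with a neural
    encoder instead of word counts.
    """
    target_vec = Counter(target_task.lower().split())
    return sorted(
        pool,
        key=lambda t: cosine_similarity(target_vec, Counter(t.lower().split())),
        reverse=True,
    )[:k]


# Toy pool of training tasks (illustrative only).
pool = [
    "Prove the triangle inequality for real numbers",
    "Implement binary search over a sorted array",
    "Find all integer solutions to a quadratic equation",
    "Write a regex to validate email addresses",
]
target = "Find the integer solutions of the quadratic equation x^2 - 5x + 6 = 0"
curriculum = select_curriculum(target, pool, k=2)
print(curriculum[0])  # most task-relevant pool item comes first
```

The selected tasks would then form the training set for the reinforcement-learning phase that continues training the model on its target task.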
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 25331