Learning What to Learn: Curriculum Curation for Test-Time Agent Learning
Keywords: Test-Time Learning, Self-Evolving Agents, Curriculum Learning
Abstract: Test-time learning enables large language model (LLM) agents to adapt during inference without costly retraining, yet prior work largely treats all test-time experience as equally useful. We ask a simple question: *what data should agents learn from at test time?* Focusing on task selection and ordering for context-based adaptation, we hypothesize that redundant or overly simple examples offer diminishing returns, while curated curricula improve sample efficiency. Using the Agentic Context Engineering (ACE) framework, we evaluate on the AppWorld benchmark, which features tool-use and coding agents. We show that careful data selection can match full-dataset performance using only $\sim$30\% of training tasks, and that task ordering measurably affects learning outcomes. Our results position curriculum curation as a first-class design dimension for efficient test-time agent learning and practical deployment.
Submission Number: 55