Keywords: Agent, Large Language Model, Reinforcement Learning
Abstract: Agents powered by large language models (LLMs) have demonstrated remarkable progress in solving complex reasoning tasks. However, LLM agents often falter on long-horizon tasks due to cognitive overload: their working memory becomes cluttered with ever-expanding and largely irrelevant information, which dilutes their attention and hinders effective planning and reasoning. To mitigate this challenge, we introduce **CO**gnitive **R**esource Self-**AL**location (**CORAL**), a novel reasoning paradigm that empowers agents to proactively optimize their context. Implemented as an agent-callable working-memory management toolset, CORAL allows an agent to maintain crucial checkpoints of its progress in working memory and to adaptively initiate a new problem-solving episode by purging the cluttered working memory and resuming its reasoning from the most recent checkpoint. This effectively reallocates the agent's cognitive resources by implicitly sharpening its attention on the checkpoints. We further enhance the agent's checkpointing capability with a Multi-episode Agentic Reinforced Policy Optimization algorithm. On several long-horizon task benchmarks, CORAL significantly outperforms standard LLM agent methods. Notably, analysis of the LLMs' attention distributions reveals that CORAL substantially improves agentic RL dynamics, which in turn keeps the agent's cognitive resource allocation focused and continuously amplifies performance gains.
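To make the checkpoint-and-purge mechanism described above concrete, here is a minimal, hypothetical sketch of what an agent-callable working-memory toolset could look like. The class and method names (`WorkingMemory`, `save_checkpoint`, `new_episode`) are illustrative assumptions, not the paper's actual implementation or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkingMemory:
    """Hypothetical working memory: a message list plus saved progress checkpoints."""
    messages: List[Dict[str, str]] = field(default_factory=list)
    checkpoints: List[str] = field(default_factory=list)

    def append(self, role: str, content: str) -> None:
        # Ordinary context growth during an episode.
        self.messages.append({"role": role, "content": content})

    def save_checkpoint(self, summary: str) -> None:
        # Agent-callable tool: persist a concise summary of progress so far.
        self.checkpoints.append(summary)

    def new_episode(self, system_prompt: str) -> None:
        # Agent-callable tool: purge the cluttered context and seed a fresh
        # episode with only the most recent checkpoint.
        latest = self.checkpoints[-1] if self.checkpoints else ""
        self.messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Resume from checkpoint:\n{latest}"},
        ]


# Illustrative usage of the two hypothetical tools.
memory = WorkingMemory()
memory.append("system", "You are a long-horizon task agent.")
memory.append("assistant", "Explored files A, B, C; the bug appears to be in module B.")
memory.save_checkpoint("Bug localized to module B; next step: write a failing test.")
memory.new_episode("You are a long-horizon task agent.")
print(memory.messages)
```

In this sketch, attention is "reallocated" simply because the purged context leaves the checkpoint as the dominant content of the new episode; how the paper's RL training shapes when the agent invokes these tools is not reflected here.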
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16755