Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Haozhen Zhang; Haodong Yue; Tao Feng; Quanyu Long; Jianzhu Bao; Bowen Jin; Weizhi Zhang; Xiao Li; Jiaxuan You; Chengwei Qin; Wenya Wang

Published: 30 Apr 2026, Last Modified: 24 Jun 2026 | ICML 2026 regular | CC BY 4.0

TL;DR: BudgetMem enables query-aware, budget-tiered memory processing via an RL-trained router, improving accuracy–cost trade-offs across benchmarks.

Abstract: Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Although runtime memory utilization is a natural alternative, prior work often incurs substantial overhead and offers limited explicit control over the performance–cost trade-off. In this work, we present **BudgetMem**, a runtime agent memory framework for explicit, query-aware performance–cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (i.e., Low/Mid/High). A lightweight router, implemented as a compact neural policy trained with reinforcement learning, performs budget-tier routing across modules to balance task performance against memory construction cost. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., high-budget setting), and delivers better accuracy–cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of different tiering strategies, clarifying when each axis delivers the most favorable trade-offs under varying budget regimes. Code is available at https://github.com/ViktorAxelsen/BudgetMem
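
Illustrative sketch (not the authors' implementation): the routing idea in the abstract can be pictured as a small policy that picks one of three tiers for each memory module, given features of the query, and is updated with a reward that trades task score against cost. Everything specific below is an assumption made for illustration: the module names, the per-tier costs, the linear softmax policy, the plain REINFORCE update, and the reward form `task_score - cost_weight * cost`. The abstract only states that the router is a compact neural policy trained with reinforcement learning.

```python
import math
import random

TIERS = ("low", "mid", "high")
# Hypothetical per-tier costs in arbitrary units; not taken from the paper.
TIER_COST = {"low": 1.0, "mid": 3.0, "high": 9.0}
# Hypothetical module names, chosen only for this example.
MODULES = ("filter", "extract", "summarize")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TierRouter:
    """Linear softmax policy with one weight vector per (module, tier)."""

    def __init__(self, n_features, seed=0):
        self.rng = random.Random(seed)
        self.w = {m: [[0.0] * n_features for _ in TIERS] for m in MODULES}

    def probs(self, module, features):
        """Tier probabilities for one module given query features."""
        logits = [
            sum(wi * xi for wi, xi in zip(row, features))
            for row in self.w[module]
        ]
        return softmax(logits)

    def route(self, features):
        """Sample one tier index per module for this query."""
        plan = {}
        for m in MODULES:
            p = self.probs(m, features)
            plan[m] = self.rng.choices(range(len(TIERS)), weights=p)[0]
        return plan

    def reinforce(self, features, plan, reward, lr=0.1):
        """One REINFORCE step; grad of log pi is (onehot - p) * x."""
        for m, chosen in plan.items():
            p = self.probs(m, features)  # probabilities before the update
            for t in range(len(TIERS)):
                g = (1.0 if t == chosen else 0.0) - p[t]
                for i, x in enumerate(features):
                    self.w[m][t][i] += lr * reward * g * x


def plan_cost(plan):
    """Total cost of a routing plan under the assumed tier costs."""
    return sum(TIER_COST[TIERS[t]] for t in plan.values())


def reward(task_score, plan, cost_weight):
    """Assumed scalar reward: task score minus weighted memory cost."""
    return task_score - cost_weight * plan_cost(plan)
```

In this sketch, raising `cost_weight` pushes the policy toward cheaper tiers, while setting it near zero lets the router spend freely, which loosely mirrors the tight-budget and high-budget settings the abstract describes.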

Lay Summary: AI assistants often need to remember information from long conversations, documents, or past interactions in order to answer future questions. However, many current memory systems process and store information in a fixed way before knowing what the user will ask. This can waste resources, and it can also miss details that later become important for a specific question. This paper introduces BudgetMem, a method that lets an AI assistant decide how much effort to spend on memory processing when a new question arrives. Instead of using the same costly memory procedure for every question, BudgetMem breaks memory use into several steps and allows each step to run at a low, medium, or high effort level. For simple questions, the system can choose cheaper steps; for harder questions, it can spend more effort to find and summarize useful information. We test BudgetMem on tasks involving long conversations and long documents. The results show that it can improve answer quality when more resources are available, while also reducing cost when tighter budgets are needed. Our analysis further shows how different ways of controlling effort affect the trade-off between answer quality, cost, and latency. Overall, BudgetMem makes memory-augmented AI assistants more controllable and practical for real-world settings where both accuracy and resource use matter.

Originally Submitted Supplementary Material: zip

Link To Code: https://github.com/ViktorAxelsen/BudgetMem

Primary Area: Deep Learning->Large Language Models

Keywords: Runtime agent memory, Performance–cost trade-off, Reinforcement Learning

Originally Submitted PDF: pdf

Submission Number: 34001
