Keywords: large language model based agent, agent memory, memory selection, reinforcement learning
Abstract: Reference-augmented inference has emerged as an effective form of test-time scaling for large language models (LLMs), where selected demonstrations or reference cases help adapt reasoning to a given query. However, existing reference selection methods mainly rely on heuristic retrieval, independent scoring, or expensive LLM-based reranking, and therefore do not explicitly model how multiple references should be jointly composed and ordered under a limited context budget. We propose \textbf{RefGen}, a lightweight and modular framework that formulates reference selection as an autoregressive index generation problem over a retrieved candidate pool. Instead of generating new textual demonstrations, RefGen uses a compact Transformer encoder--decoder to produce an ordered sequence of candidate indices, enabling query-aware composition of reference demonstrations for downstream reasoning. To optimize this discrete selection policy, we combine supervised fine-tuning with reinforcement learning under verifiable rewards. Experiments on mathematical reasoning, scientific question answering, and visual question answering benchmarks show that RefGen consistently outperforms retrieval-based and learning-based baselines across both LLM and VLM backbones. RefGen is plug-and-play for frozen foundation models and introduces only minimal inference overhead, making it practical for real-world deployment.
Submission Type: Discovery
Copyright Form: pdf
Submission Number: 436