Beyond Experience Retrieval: Learning to Generate Utility-Optimized Structured Experience for Frozen LLMs
Keywords: Learning from Experience; Reinforcement Learning; Group Relative Policy Optimization
Abstract: Large language models (LLMs) are largely static and often redo reasoning or repeat mistakes. Prior experience reuse typically relies on external retrieval, which is similarity-based, can introduce noise, and adds latency. We introduce \textbf{SEAM} (\textbf{S}tructured \textbf{E}xperience \textbf{A}dapter \textbf{M}odule), a lightweight, executor-specific plug-in that stores experience in its parameters and generates a structured, instance-tailored experience entry in a single forward pass to guide a frozen LLM executor. SEAM is trained for utility via executor rollouts and Group Relative Policy Optimization (GRPO) while keeping the executor frozen, and can be further improved with logged-success SFT after deployment. Experiments on mathematical reasoning benchmarks show consistent accuracy gains across executors with low overhead. Extensive ablations and analyses further elucidate the mechanisms underlying SEAM’s effectiveness and robustness.\footnote{We release our code at \url{https://anonymous.4open.science/r/SEAM}.}
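The abstract's training recipe centers on GRPO-style group-relative credit assignment: several candidate experience entries are sampled per task, scored by frozen-executor rollouts, and each sample's advantage is its reward normalized within the group. The sketch below illustrates only that normalization step under stated assumptions (binary rollout rewards, names illustrative); it is not the paper's implementation.

```python
# Illustrative sketch of GRPO-style group-relative advantages.
# Assumption: each sampled experience entry receives reward 1.0 if the
# frozen executor solves the task when guided by it, else 0.0.
# Function and variable names are hypothetical, not from the SEAM code.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of rollout rewards to zero mean, unit std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 sampled experience entries for one task; two succeed.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are computed relative to the group rather than a learned value baseline, no critic is needed, which keeps the adapter's training loop lightweight.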
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM agents; multi-agent systems; agent memory; reinforcement learning in agents
Contribution Types: Model analysis & interpretability, Reproduction study, Approaches to low-resource settings
Languages Studied: English
Submission Number: 7456