Beyond Experience Retrieval: Learning to Generate Utility-Optimized Structured Experience for Frozen LLMs
Keywords: Learning from Experience; Reinforcement Learning; Group Relative Policy Optimization
Abstract: Large language models (LLMs) are largely static and often redo reasoning or repeat mistakes. Prior experience reuse typically relies on external retrieval, which is similarity-based, can introduce noise, and adds latency. We introduce \textbf{SEAM} (\textbf{S}tructured \textbf{E}xperience \textbf{A}dapter \textbf{M}odule), a lightweight, executor-specific plug-in that stores experience in its parameters and generates a structured, instance-tailored experience entry in a single forward pass to guide a frozen LLM executor. SEAM is trained for utility via executor rollouts and Group Relative Policy Optimization (GRPO) while keeping the executor frozen, and can be further improved with logged-success SFT after deployment. Experiments on mathematical reasoning benchmarks show consistent accuracy gains across executors with low overhead. Extensive ablations and analyses further elucidate the mechanisms underlying SEAM’s effectiveness and robustness.\footnote{We release our code at \url{https://anonymous.4open.science/r/SEAM}.}
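The abstract's training recipe centers on GRPO-style group-relative credit assignment: several candidate experience entries are sampled per task, scored by frozen-executor rollouts, and each sample's advantage is its reward normalized within the group. The sketch below illustrates only that normalization step under stated assumptions (binary rollout rewards, names illustrative); it is not the paper's implementation.

```python
# Illustrative sketch of GRPO-style group-relative advantages.
# Assumption: each sampled experience entry receives reward 1.0 if the
# frozen executor solves the task when guided by it, else 0.0.
# Function and variable names are hypothetical, not from the SEAM code.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of rollout rewards to zero mean, unit std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 sampled experience entries for one task; two succeed.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are computed relative to the group rather than a learned value baseline, no critic is needed, which keeps the adapter's training loop lightweight.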
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM agents; multi-agent systems; agent memory; reinforcement learning in agents
Contribution Types: Model analysis & interpretability, Reproduction study, Approaches to low-resource settings
Languages Studied: English
Submission Number: 7456