Keywords: learning from experience; reinforcement learning; Group Relative Preference Optimization
Abstract: LLMs excel across many tasks but typically lack the ability to accumulate and reuse prior experience. As a result, they often reason from scratch, retracing known solution paths and repeating past mistakes. Existing work commonly relies on Retrieval-Augmented Generation (RAG) to retrieve experiential memory summarized by LLMs. However, this paradigm suffers from high latency and computational cost, and it selects memories by relevance rather than utility, resulting in suboptimal outcomes.
To address these issues, we propose \textbf{L-PEM} (A \textbf{L}ightweight model for \textbf{P}arametric \textbf{E}xperiential \textbf{M}emory), a novel approach that embeds experience into the parameters of a compact generative model. This architecture unifies memory generation and application in a single forward pass, effectively replacing the conventional store-and-retrieve paradigm.
We train L-PEM with Group Relative Preference Optimization (GRPO), using rollouts from a frozen executor as feedback, and evaluate it on multiple mathematical reasoning benchmarks. L-PEM delivers significant performance gains while maintaining low latency and computational cost. Extensive ablations and analyses further elucidate the mechanisms underlying L-PEM's effectiveness.\footnote{We release our code at https://anonymous.4open.science/r/L-PEM}
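For readers unfamiliar with the GRPO-style objective mentioned above, the sketch below illustrates the group-relative advantage it relies on: each candidate in a sampled group is scored (here, assumed to be by whether the frozen executor solves the task when conditioned on that candidate's memory) and normalized against the group's mean and standard deviation. This is a minimal illustration under our own naming and reward assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the paper's code): group-relative advantages as used in
# GRPO-style training. The reward signal (success of a frozen executor's rollout
# conditioned on a generated memory) is an assumption based on the abstract.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its rollout group's mean and std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: one query, a group of 4 generated memory candidates, each rewarded 1.0
# if the executor reached the correct answer with that memory, else 0.0.
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # positive for helpful memories, negative otherwise
```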
Primary Area: reinforcement learning
Submission Number: 24169