Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement

Published: 19 Dec 2025, Last Modified: 05 Jan 2026 · AAMAS 2026 Full Paper · CC BY 4.0
Keywords: Agent Learning, Procedural Memory, Bayesian Decision Making, Hierarchical Planning, LLM Agents
TL;DR: A memory-augmented agent architecture that learns hierarchical procedural knowledge through Bayesian selection and contrastive refinement.
Abstract: We present \textbf{MACLA}, a framework that decouples reasoning from learning by maintaining a frozen large language model (LLM) while performing all adaptation in an external hierarchical procedural memory. MACLA extracts reusable procedures from trajectories, tracks their reliability via Bayesian posteriors, selects actions through expected-utility scoring, and refines procedures by contrasting successes against failures. Across four benchmarks (ALFWorld, WebShop, TravelPlanner, InterCodeSQL), MACLA achieves \textbf{78.1\% average performance}, outperforming all baselines. On ALFWorld unseen tasks, MACLA reaches \textbf{90.3\%} with \textbf{+3.1\% positive generalization}. The system constructs memory in \textbf{56 seconds} (2,800× faster than the state-of-the-art LLM parameter-training baseline) and compresses \textbf{2,851 trajectories into 187 procedures} (a 15:1 ratio). Experimental results demonstrate that structured external memory with Bayesian selection and contrastive refinement enables sample-efficient, interpretable, and continually improving agents without LLM parameter updates.
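The abstract's selection mechanism (per-procedure reliability tracked via Bayesian posteriors, actions chosen by expected-utility scoring) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Beta(1,1) prior, the fixed execution cost, and all names are assumptions introduced here.

```python
class ProcedureMemory:
    """Illustrative sketch: track each procedure's reliability with a
    Beta posterior over success probability, and select the candidate
    procedure with the highest expected utility."""

    def __init__(self):
        # procedure name -> [alpha, beta]; Beta(1, 1) is a uniform prior
        # (an assumption, not specified by the paper)
        self.posteriors = {}

    def update(self, proc, success):
        # Standard Beta-Bernoulli conjugate update:
        # a success increments alpha, a failure increments beta
        params = self.posteriors.setdefault(proc, [1, 1])
        params[0 if success else 1] += 1

    def expected_utility(self, proc, reward=1.0, cost=0.1):
        # Posterior mean success probability times reward, minus a
        # fixed execution cost (both hypothetical parameters)
        a, b = self.posteriors.get(proc, (1, 1))
        return reward * a / (a + b) - cost

    def select(self, candidates):
        # Expected-utility scoring: pick the best-scoring procedure
        return max(candidates, key=self.expected_utility)
```

Under this sketch, a procedure that has succeeded 8 of 9 times scores roughly 0.72 (9/11 posterior mean minus the 0.1 cost), so it is preferred over an untried or failing alternative; the contrastive-refinement step of MACLA would operate on the trajectories behind these counts, which this sketch does not model.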
Area: Innovative Applications (IA)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 936