HELM: Steering Long-Horizon Agents with Learned Hierarchical Memory and Epistemic Governance

ACL ARR 2026 January Submission9591 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM, Agent, Memory
Abstract: Long-horizon LLM agents must carry state across many tool interactions, yet naïve context extension via periodic summarization or top-$k$ retrieval can discard decisive evidence and makes failures hard to audit. We introduce \textbf{HELM} (\textbf{H}ierarchical \textbf{E}pistemic \textbf{L}earned \textbf{M}emory), a framework that exposes memory as an explicit, event-driven interface and couples memory access with \emph{epistemic governance}. HELM instantiates a three-tier nested store, \textbf{SHNM}, that links episodic traces to consolidated recalls and thematic indices via provenance edges and epistemic metadata (timestamps, source types, tool status). Governance makes memory operations reproducible: retrieval is re-ranked with recency/status-aware scoring and conflict resolution prefers verified, newer evidence, while provenance expansion can trace any recall back to concrete tool spans. On top of SHNM, a learned controller decides when to read, write, consolidate, and prune under task and efficiency budgets, and a tool-aware embedding model indexes tool-augmented trajectories to improve retrieval of procedural and trace-based memories. We evaluate on five long-horizon benchmarks and report diagnostics that jointly measure end-task performance, memory efficiency, and epistemic reliability, including auditable recall metrics that quantify provenance faithfulness.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Agent, LLM, Memory
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings / efficiency
Languages Studied: English
Submission Number: 9591