Dynamic Causal-Graph Memory: Structured Retrieval for Million-Token Reasoning

Published: 10 Jun 2025, Last Modified: 21 Jun 2025 · LCFM 2025 · CC BY 4.0
Keywords: long-context language models, retrieval-augmented generation, graph neural networks, memory efficiency, multi-hop reasoning
TL;DR: DCGM turns an LLM’s retrieval buffer into a sparse causal graph maintained in a single $O(N\log N)$ pass, yielding +8.3 F1 over the best KV-cache compressor with no extra peak memory on a 1M-token multi-hop QA benchmark.
Abstract: Existing long-context LLMs still treat retrieved chunks as an unstructured bag, leaving multi-hop reasoning both memory-hungry and error-prone. We present \textit{Dynamic Causal-Graph Memory} (DCGM), a drop-in module that converts the retrieval buffer into a streaming graph whose edges are the decoder’s own attention-derived causal scores. A single-pass $O(N\log N)$ algorithm maintains a $(B + BM)$-sized subgraph, and a lightweight message-passing layer feeds a pooled “causal memory” back into the LLM. We derive tight FLOP/memory bounds and show that sparsification introduces at most $\mathcal{O}(\delta L)$ error. On \emph{LongHopQA}, a new million-token multi-hop benchmark, DCGM lifts F1 by +8.3 over the best KV-cache compressor while matching its 38 GB peak memory. These results demonstrate that explicit causal structure, rather than larger windows alone, is key to efficient long-context reasoning.
Submission Number: 47
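
The abstract above describes the module only at a high level. The following is a minimal, self-contained sketch of how such a streaming causal-graph memory might look. The class name `CausalGraphMemory`, the eviction rule (dropping the node with the weakest incoming-edge mass), and the score-weighted mean-pooling step are illustrative assumptions and not the paper's implementation; only the budget of $B$ nodes with at most $M$ edges each and the per-chunk top-$M$ edge selection follow the abstract's description.

```python
# Minimal sketch (not the authors' code) of a streaming causal-graph memory.
# Assumption: each retrieved chunk arrives with attention-derived scores
# against the chunks already in memory. We keep at most B nodes and the
# top-M incoming edges per node; the eviction rule and pooling are
# illustrative choices, not taken from the paper.
import heapq
import numpy as np


class CausalGraphMemory:
    def __init__(self, max_nodes: int = 64, max_edges_per_node: int = 4):
        self.B = max_nodes
        self.M = max_edges_per_node
        self.embeddings = []   # node_id -> embedding vector
        self.edges = []        # node_id -> list of (score, src_id)
        self.active = []       # min-heap of (incoming-score mass, node_id)

    def add_chunk(self, embedding: np.ndarray, causal_scores: dict) -> int:
        """Insert one retrieved chunk; causal_scores maps existing node ids
        to the decoder's attention-derived score toward this chunk."""
        node_id = len(self.embeddings)
        self.embeddings.append(embedding)
        # Keep only the M strongest incoming edges (O(k log M) per chunk).
        top_edges = heapq.nlargest(
            self.M, ((s, src) for src, s in causal_scores.items())
        )
        self.edges.append(top_edges)
        heapq.heappush(self.active, (sum(s for s, _ in top_edges), node_id))
        if len(self.active) > self.B:
            # Illustrative eviction: drop the node with the weakest incoming
            # edges (a full implementation would also free its embedding).
            _, evicted = heapq.heappop(self.active)
            self.edges[evicted] = []
        return node_id

    def pooled_memory(self) -> np.ndarray:
        """One round of score-weighted message passing, then mean-pool."""
        messages = []
        for node_id, nbrs in enumerate(self.edges):
            if not nbrs:
                continue
            h = self.embeddings[node_id].copy()
            for score, src in nbrs:
                h += score * self.embeddings[src]
            messages.append(h / (1 + len(nbrs)))
        if not messages:
            return np.zeros_like(self.embeddings[0])
        return np.mean(messages, axis=0)


# Usage: stream chunks, then read back one pooled "causal memory" vector
# that could be fed to the LLM as extra context.
mem = CausalGraphMemory(max_nodes=8, max_edges_per_node=2)
rng = np.random.default_rng(0)
for t in range(20):
    scores = {i: float(rng.random()) for i in range(len(mem.embeddings))}
    mem.add_chunk(rng.standard_normal(16), scores)
print(mem.pooled_memory().shape)  # (16,)
```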