MINDGRAPH: Faithful Concept-Graph Memory for Long-Context Reasoning

Published: 26 May 2026, Last Modified: 26 May 2026, ICML 2026 FoGen Workshop Poster, CC BY 4.0
Keywords: Large language model, Memorization, Reasoning
Abstract: Large language models (LLMs) struggle with long-context reasoning not because of limited context windows alone, but because they lack a working memory of accumulated knowledge that is at once persistent, structured, and reasoning-friendly. Prior approaches (long-context prompting, retrieval-augmented generation, summary-based generation, multi-agent memory, and recursive REPL agents) each lack at least one of these properties: they rebuild state from raw tokens on every query, compress it into unstructured summaries that drop contradictions and relationships, or impose structures that do not directly support multi-step inference. We isolate the regime where these failures compound, namely long inputs that demand both broad input understanding and deep multi-step inference, through LongReason-200, a $200$-item benchmark filtered from LongBench-v2, NarrativeQA, and LooGLE. We then propose MindGraph, a concept-graph memory framework: an incremental, question-aware build stage maintains a compact graph with character-level provenance to the original input, and an answer stage reasons over it via two tools that combine fine-grained verification and hierarchical navigation, choosing its access pattern from the graph's size. On LongReason-200, MindGraph reaches $77\%$ accuracy, above all five baselines, while passing only ${\sim}7\%$ of the original input to the answering LLM per query. The result suggests that effective long-context reasoning requires architectures that treat the working memory itself as a first-class design dimension.
Submission Number: 91
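
To make the idea of a concept graph with character-level provenance concrete, the following is a minimal illustrative sketch and not the authors' implementation. Every name in it (`Span`, `Concept`, `ConceptGraph`, `add_concept`, `relate`, `verify`, `neighbors`) is hypothetical. It assumes that provenance is stored as character offsets into the original input, that verification means reading back the exact source text behind a node, and it uses a flat neighbor lookup as a simplified stand-in for the hierarchical navigation mentioned in the abstract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """Half-open character range [start, end) into the original input."""
    start: int
    end: int


@dataclass
class Concept:
    """Hypothetical graph node: a short summary plus its source spans."""
    name: str
    summary: str
    spans: list[Span] = field(default_factory=list)


class ConceptGraph:
    """Assumed structure: nodes keyed by name, edges as (head, relation, tail)."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.nodes: dict[str, Concept] = {}
        self.edges: set[tuple[str, str, str]] = set()

    def add_concept(self, name: str, summary: str, start: int, end: int) -> Concept:
        # Reject spans that do not point inside the original input.
        if not (0 <= start < end <= len(self.source)):
            raise ValueError("span lies outside the source text")
        # Incremental build: reuse an existing node and append new provenance.
        node = self.nodes.setdefault(name, Concept(name, summary))
        node.spans.append(Span(start, end))
        return node

    def relate(self, head: str, relation: str, tail: str) -> None:
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("both endpoints must already be concepts")
        self.edges.add((head, relation, tail))

    def verify(self, name: str) -> list[str]:
        """Return the exact source passages that back a concept."""
        return [self.source[s.start:s.end] for s in self.nodes[name].spans]

    def neighbors(self, name: str) -> list[tuple[str, str]]:
        """Return (relation, tail) pairs for outgoing edges, sorted."""
        return sorted((r, t) for h, r, t in self.edges if h == name)


# Usage on a toy input; offsets are computed with str.find rather than typed by hand.
text = "Alice hired Bob in 2019. Bob later founded Acme."
g = ConceptGraph(text)

first_bob = text.find("Bob")
second_bob = text.find("Bob", first_bob + 1)
acme = text.find("Acme")

g.add_concept("Bob", "employee and founder", first_bob, first_bob + 3)
g.add_concept("Bob", "employee and founder", second_bob, second_bob + 3)
g.add_concept("Acme", "company", acme, acme + 4)
g.relate("Bob", "founded", "Acme")

print(g.verify("Bob"))
print(g.neighbors("Bob"))
```

In this sketch the answering model would only ever see node summaries and the passages returned by `verify`, rather than the full input, which is one simple way to read the abstract's point about passing a small fraction of the original text per query.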