Keywords: compute- and memory-efficient Transformer, long-text Transformer, structured data
TL;DR: H2MT is a hierarchy-aware memory layer that routes retrieval along document trees, improving answer quality and reducing time to first token.
Abstract: Transformer-based large language models (LLMs) excel at language processing, yet most restrict the context window when handling long inputs. Moreover, many existing long-context solutions are computationally inefficient and overlook the structure inherent to documents: they treat text as a flat token stream, which obscures hierarchy and wastes computation on irrelevant context. We present the **Hierarchical Semantic Memory Transformer (H2MT)**, a semantic hierarchy-aware module that attaches to a backbone model. H2MT represents a document as a tree and performs level-conditioned routing and aggregation. It first propagates memory embeddings (summary vectors produced by the backbone) upward, injecting child-node memories into their ancestors to preserve relative context, and then applies cross-level attention to retrieve related information. We evaluate on Qasper (document QA) and BookSum (hierarchical summarization), and illustrate applicability to technical manuals. H2MT improves quality at similar backbone size while reducing long-range attention compute and memory and using fewer parameters. The approach is most helpful for data whose semantic hierarchy can be modeled as a tree.
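The abstract describes two mechanisms: bottom-up propagation of node memory embeddings through the document tree, and cross-level attention over the resulting memories. The sketch below illustrates that flow under stated assumptions; the `Node` structure, the mean-pooling aggregation, and the function names are hypothetical choices for illustration, not the authors' implementation.

```python
# Minimal sketch of bottom-up memory propagation and cross-level retrieval,
# as described in the abstract. Aggregation and blending choices are assumptions.
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Node:
    memory: np.ndarray                         # summary vector from the backbone
    children: list = field(default_factory=list)

def propagate_up(node: Node) -> np.ndarray:
    """Recursively inject child-node memories into their ancestor."""
    if node.children:
        child_mems = np.stack([propagate_up(c) for c in node.children])
        # Assumed aggregation: blend the parent's memory with the child mean.
        node.memory = 0.5 * node.memory + 0.5 * child_mems.mean(axis=0)
    return node.memory

def cross_level_attend(query: np.ndarray, nodes: list) -> np.ndarray:
    """Scaled dot-product attention over memories gathered across tree levels."""
    keys = np.stack([n.memory for n in nodes])         # (N, d)
    scores = keys @ query / np.sqrt(query.shape[-1])   # (N,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys                              # retrieved context vector
```

In this toy version, a query attends over all node memories after propagation; a level-conditioned router, as the abstract suggests, would instead restrict or weight the candidate nodes by their depth in the tree.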
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2026/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 23759