Keywords: Large Language Models, Tensor Networks, Low-rank Representation
Abstract: Modern language models represent text using discrete token-level embeddings, which forces recurring multi-token
patterns to be learned implicitly across Transformer layers. Both Over-tokenized Transformers and Engram attempt
to address this limitation by explicitly incorporating multi-token ($n$-gram) memories. However, they rely on separate hash tables for each $n$-gram order, which introduces hash collisions and prevents nested $n$-grams from sharing underlying latent structure. To address these issues, we propose Tensorized Engram (TN-gram), a compact memory module that represents tensorized $n$-gram embeddings through shared factors in Canonical Polyadic (CP) form. TN-gram learns shared token-position factors together with order-absorption vectors to encode the embeddings of different $n$-gram orders. Comprehensive experiments demonstrate that TN-gram matches or outperforms Engram-style $n$-gram modules while requiring far fewer parameters.
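To make the CP-form construction concrete, the following is a minimal sketch (not the authors' released code) of how shared token-position factors and order-absorption vectors could parameterize $n$-gram embeddings. The factor names (`position_factors`, `order_vectors`, `out_factor`), the CP rank, and the maximum order are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical CP-form n-gram embedding sketch; all module names and
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class TNGramSketch(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, rank: int = 64, max_order: int = 3):
        super().__init__()
        self.max_order = max_order
        # Shared token-position factors: one rank-R embedding table per position.
        self.position_factors = nn.ModuleList(
            [nn.Embedding(vocab_size, rank) for _ in range(max_order)]
        )
        # One order-absorption vector per n-gram order (1..max_order).
        self.order_vectors = nn.Parameter(torch.randn(max_order, rank) * 0.02)
        # Output factor mapping the rank-R CP core to the model dimension.
        self.out_factor = nn.Linear(rank, d_model, bias=False)

    def forward(self, token_ids: torch.Tensor, order: int) -> torch.Tensor:
        """token_ids: (batch, order) token indices of n-grams of the given order."""
        assert 1 <= order <= self.max_order
        # Elementwise (Hadamard) product of per-position factors gives the CP interaction.
        core = self.position_factors[0](token_ids[:, 0])
        for p in range(1, order):
            core = core * self.position_factors[p](token_ids[:, p])
        # Absorb the n-gram order, then project to the embedding space.
        core = core * self.order_vectors[order - 1]
        return self.out_factor(core)


if __name__ == "__main__":
    module = TNGramSketch(vocab_size=32000, d_model=512)
    bigrams = torch.randint(0, 32000, (4, 2))
    print(module(bigrams, order=2).shape)  # torch.Size([4, 512])
```

Under this reading, all $n$-gram orders reuse the same position factor tables and differ only in the order-absorption vector applied to the CP core, which is why nested $n$-grams can share latent structure and no per-order hash table is needed.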
Submission Number: 70