Keywords: Large Language Models, Tensor Networks, Low-rank Representation
Abstract: Modern language models represent text using discrete token-level embeddings, which forces recurring multi-token
patterns to be learned implicitly across Transformer layers. Both Over-tokenized Transformers and Engram attempt
to address this limitation by explicitly incorporating multi-token ($n$-gram) memories. However, they rely on separate hash tables for each $n$-gram order, which introduces hash collisions and prevents nested $n$-grams from sharing underlying latent structure. To address these issues, we propose Tensorized Engram (TN-gram), a compact memory module that represents tensorized $n$-gram embeddings through shared factors in Canonical Polyadic (CP) form. TN-gram learns shared token-position factors together with order-absorption vectors to encode the embeddings of different $n$-gram orders. Comprehensive experiments demonstrate that TN-gram matches or outperforms Engram-style $n$-gram modules while requiring far fewer parameters.
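To make the CP-form construction concrete, the following is a minimal sketch (not the authors' released code) of how shared token-position factors and order-absorption vectors could parameterize $n$-gram embeddings. The factor names (`position_factors`, `order_vectors`, `out_factor`), the CP rank, and the maximum order are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical CP-form n-gram embedding sketch; all module names and
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class TNGramSketch(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, rank: int = 64, max_order: int = 3):
        super().__init__()
        self.max_order = max_order
        # Shared token-position factors: one rank-R embedding table per position.
        self.position_factors = nn.ModuleList(
            [nn.Embedding(vocab_size, rank) for _ in range(max_order)]
        )
        # One order-absorption vector per n-gram order (1..max_order).
        self.order_vectors = nn.Parameter(torch.randn(max_order, rank) * 0.02)
        # Output factor mapping the rank-R CP core to the model dimension.
        self.out_factor = nn.Linear(rank, d_model, bias=False)

    def forward(self, token_ids: torch.Tensor, order: int) -> torch.Tensor:
        """token_ids: (batch, order) token indices of n-grams of the given order."""
        assert 1 <= order <= self.max_order
        # Elementwise (Hadamard) product of per-position factors gives the CP interaction.
        core = self.position_factors[0](token_ids[:, 0])
        for p in range(1, order):
            core = core * self.position_factors[p](token_ids[:, p])
        # Absorb the n-gram order, then project to the embedding space.
        core = core * self.order_vectors[order - 1]
        return self.out_factor(core)


if __name__ == "__main__":
    module = TNGramSketch(vocab_size=32000, d_model=512)
    bigrams = torch.randint(0, 32000, (4, 2))
    print(module(bigrams, order=2).shape)  # torch.Size([4, 512])
```

Under this reading, all $n$-gram orders reuse the same position factor tables and differ only in the order-absorption vector applied to the CP core, which is why nested $n$-grams can share latent structure and no per-order hash table is needed.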
Submission Number: 70