Reconsidering Degeneration of Token Embeddings with Definitions

TMLR Paper7980 Authors

18 Mar 2026 (modified: 10 May 2026) | Under review for TMLR | CC BY 4.0

Abstract: Learning token embeddings via language modeling with weight tying remains the dominant paradigm, yet the resulting embeddings often degenerate into anisotropic (i.e., non-uniform) distributions in the embedding space, which limits their expressiveness. This study first analyzes the fine-tuning dynamics of encoder-based pretrained language models (PLMs) and shows that their embeddings largely preserve their geometric structure during fine-tuning and thereby resist degeneration. However, the pretrained embeddings themselves remain anisotropically distributed, and low-frequency tokens tend to lose their semantics. To address this issue, we propose DefinitionEMB, a method that uses lexical definitions to inject explicit semantics into embeddings while anchoring them to the pretrained geometric manifold, preserving the geometric knowledge the PLMs have already acquired. Extensive experiments on natural language understanding and abstractive text summarization demonstrate the effectiveness of leveraging Wiktionary definitions for four PLMs: RoBERTa-base, BART-large, T5-large, and T5Gemma-l-ul2.
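
As an illustration of what "anisotropic" means here, the sketch below computes the mean pairwise cosine similarity of a set of vectors, a commonly used proxy for anisotropy. It is an assumption that this proxy is relevant to the paper; the abstract does not state which measure the authors use. The function name `mean_pairwise_cosine` and the synthetic data are hypothetical and do not come from the submission. A score near 0 suggests directions are spread evenly, while a score near 1 suggests the vectors crowd into a narrow cone.

```python
import math
import random


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of rows.

    Assumes every row has a non-zero norm.
    """
    # Normalize each vector to unit length so dot products are cosines.
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])

    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
            pairs += 1
    return total / pairs


# Synthetic stand-ins for an embedding matrix (not real PLM embeddings).
rng = random.Random(0)
isotropic = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(100)]
# Adding a shared offset pushes every vector toward one common direction.
shifted = [[x + 3.0 for x in vec] for vec in isotropic]

iso_score = mean_pairwise_cosine(isotropic)
aniso_score = mean_pairwise_cosine(shifted)
print(f"isotropic: {iso_score:.3f}, shifted: {aniso_score:.3f}")
```

The shared offset in `shifted` mimics the common-direction drift often described for degenerated embeddings, so its score should be much higher than that of the zero-mean Gaussian sample.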

Submission Type: Long submission (more than 12 pages of main content)

Assigned Action Editor: ~Zhen_Fang2

Submission Number: 7980
