Abstract: Learning token embeddings based on token co-occurrence statistics has proven effective for both pre-training and fine-tuning in natural language processing.
However, recent studies have pointed out that the distribution of learned embeddings degenerates into anisotropy, and that even pre-trained language models (PLMs) suffer from a loss of semantics-related information in the embeddings of low-frequency tokens.
This study first analyzes the fine-tuning dynamics of a PLM, BART-large, and demonstrates its robustness against such degeneration.
On the basis of this finding, we propose DefinitionEMB, a method that utilizes definitions to construct isotropically distributed and semantics-related token embeddings for PLMs while maintaining their original robustness during fine-tuning.
Our experiments demonstrate the effectiveness of leveraging definitions from Wiktionary to construct such embeddings for RoBERTa-base and BART-large.
Furthermore, the constructed embeddings for low-frequency tokens improve the performance of these models on various GLUE tasks and four text summarization datasets.
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: word embeddings, representation learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 128