Abstract: Homonymy readily gives rise to lexical ambiguity because a single word form carries multiple unrelated senses, and correctly identifying the intended sense of a homonym relies heavily on its surrounding context. This ambiguity makes homonyms an appropriate testbed for examining the contextualization capabilities of pre-trained language models (PLMs) and large language models (LLMs). Considering the impact of part-of-speech (POS) on homonym disambiguation and the dominance of English in word embedding research, this study provides a comprehensive layer-wise analysis of homonym representations in both English and Chinese, spanning same- and different-POS categories, across four families of PLMs/LLMs (BERT, GPT-2, Llama 3, Qwen 2.5). Using a synthetic dataset and a disambiguation score (\textit{D-Score}), we find that: (1) no single layer depth consistently excels at differentiating homonym representations; (2) bidirectional models produce better-contextualized homonym representations than much larger autoregressive models; (3) most importantly, POS affects homonym representations in these models in ways that differ slightly from findings in human research. The individual differences among LLMs uncovered in our study challenge simplistic accounts of their inner workings. This points to a compelling research frontier: controlled experiments with purposefully manipulated inputs to improve the interpretability of LLMs. Our dataset and code are publicly available at https://anonymous.4open.science/r/ehril/.
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: word embeddings, lexical resources
Contribution Types: Model analysis & interpretability
Languages Studied: English, Chinese
Submission Number: 7571
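The abstract does not spell out how the \textit{D-Score} is computed, so the snippet below is only a minimal sketch of one plausible layer-wise disambiguation measure: the gap between same-sense and cross-sense cosine similarity of a homonym's token embeddings at each layer. The model name, the example homonym "bank", the toy sentences, and the within-minus-across formulation are illustrative assumptions, not the paper's actual method or data.

```python
# Minimal sketch (NOT the paper's exact D-Score) of a layer-wise
# disambiguation measure for a homonym, using HuggingFace Transformers.
# Assumes the homonym is a single subword token in the tokenizer's vocabulary.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-uncased"  # assumption; any encoder with hidden states works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Toy contexts for the homonym "bank" in two senses (financial vs. riverside).
sense_a = ["She deposited the check at the bank.",
           "The bank approved his mortgage application."]
sense_b = ["They had a picnic on the bank of the river.",
           "Fish gathered near the muddy bank."]
target = "bank"

def layer_vectors(sentence: str) -> torch.Tensor:
    """Return the target token's hidden state at every layer: shape (L+1, H)."""
    enc = tokenizer(sentence, return_tensors="pt")
    target_id = tokenizer.convert_tokens_to_ids(target)
    pos = (enc["input_ids"][0] == target_id).nonzero()[0].item()
    with torch.no_grad():
        hidden = model(**enc).hidden_states          # tuple of (1, T, H) tensors
    return torch.stack([h[0, pos] for h in hidden])  # (L+1, H)

def mean_cos(vecs_x, vecs_y, layer):
    """Mean cosine similarity at one layer over all non-identical pairs."""
    sims = [torch.cosine_similarity(x[layer], y[layer], dim=0)
            for x in vecs_x for y in vecs_y if x is not y]
    return torch.stack(sims).mean().item()

vecs_a = [layer_vectors(s) for s in sense_a]
vecs_b = [layer_vectors(s) for s in sense_b]

num_layers = vecs_a[0].shape[0]
for layer in range(num_layers):
    within = 0.5 * (mean_cos(vecs_a, vecs_a, layer) + mean_cos(vecs_b, vecs_b, layer))
    across = mean_cos(vecs_a, vecs_b, layer)
    # Higher value = same-sense uses sit closer together than cross-sense uses.
    print(f"layer {layer:2d}  disambiguation score = {within - across:+.3f}")
```

Under these assumptions, a layer that scores higher separates the two senses of the homonym more cleanly; the abstract's finding (1) corresponds to no single layer winning consistently across models and languages.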