A Lens into Interpretable Transformer Mistakes via Semantic Dependency

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Semantic dependency refers to the relationship between words in a sentence in which the meaning of one word depends on another; it is essential for natural language understanding. In this paper, we investigate the role of semantic dependencies in how transformer models answer questions, by analyzing how token values shift in response to changes in semantics. Through extensive experiments on models including the BERT series, GPT, and LLaMA, we uncover the following key findings: (1) most tokens primarily retain their original semantic information even as they propagate through multiple layers; (2) models can encode truthful semantic dependencies in final-layer tokens; (3) mistakes in model answers often stem from specific tokens encoded with incorrect semantic dependencies. Furthermore, we find that correcting these errors by directly adjusting parameters is challenging, because the same parameters can encode both correct and incorrect semantic dependencies depending on the context. Our findings provide insight into the causes of incorrect information generation in transformers and can aid the future development of robust and reliable models.
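To make the token-shift analysis more concrete, below is a minimal sketch of one way to probe finding (1): compare each token's initial embedding with its final-layer representation and measure how much of the original information survives the forward pass. This is an illustrative probe, not the paper's actual method; the model choice (bert-base-uncased), the example sentence, and the cosine-similarity measure are all assumptions made for the sketch.

```python
# Illustrative probe (not the authors' exact method): measure how far each
# token's representation drifts from its initial embedding across layers.
import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: BERT is used as a stand-in for the BERT-series models studied.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Hypothetical example sentence, chosen for its semantic dependency ("it").
sentence = "The trophy did not fit in the suitcase because it was too big."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple of (num_layers + 1) tensors, each [1, seq, hidden]:
# index 0 holds the input embeddings, index -1 the final-layer states.
hidden_states = outputs.hidden_states
embeddings = hidden_states[0][0]   # layer-0 token embeddings, [seq, hidden]
final = hidden_states[-1][0]       # final-layer representations, [seq, hidden]

# Cosine similarity between each token's initial embedding and its final-layer
# state: high values would be consistent with finding (1), that tokens largely
# retain their original semantic information through the layers.
sims = torch.nn.functional.cosine_similarity(embeddings, final, dim=-1)
for tok, sim in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), sims):
    print(f"{tok:>12s}  {sim.item():.3f}")
```

The same loop over intermediate entries of hidden_states would show where along the depth of the network the drift, if any, occurs.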
Lay Summary: Semantic dependency is vital for computers to understand language. It refers to the relationship between words in a sentence, where the meaning of one word depends on another. In this study, we explore how popular language models (such as BERT, GPT, and LLaMA) handle these dependencies and why they sometimes make mistakes. We discovered that even after passing through many layers, most final-layer tokens still retain their original semantic meaning. Moreover, at the final layer, the models do a good job of capturing real and truthful word relationships. However, when these models make mistakes, it is often because some words are encoded with incorrect dependencies. Fixing these mistakes is not as simple as pruning a few parameters, because the same parameters can encode both correct and incorrect dependencies depending on the context. Our findings shed light on the causes of incorrect information generation in models and can help future researchers develop more accurate and reliable AI systems.
Primary Area: Social Aspects->Accountability, Transparency, and Interpretability
Keywords: token-level semantic dependency, transformer
Submission Number: 5804