Abstract:
This paper investigates the inner workings of LLMs when they encode inputs containing typos, in order to understand their robustness against typos. We hypothesize that specific neurons in FFN layers and attention heads in multi-head attention layers recognize typos and internally recover them to capture the originally intended meaning. We introduce a method to identify these typo neurons and typo heads, which are active only when inputs contain typos. Through experiments with Gemma 2, we obtain the following findings: 1) Neurons in the early and early-middle layers respond strongly to typos. 2) A small number of heads that capture contextual information also contribute to recovering typos. 3) Model size affects how the typo-related workload is divided between neurons and heads.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: robustness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 876