Abstract:
This paper investigates the inner workings of LLMs when they encode inputs containing typos, in order to understand their robustness against typos. We hypothesize that specific neurons in FFN layers and attention heads in multi-head attention layers recognize typos and internally recover them to capture the originally intended meaning. We introduce a method to identify these typo neurons and typo heads, which are active only when inputs contain typos. Through experiments with Gemma 2, we obtain the following findings: 1) Neurons in the early and early-middle layers respond strongly to typos. 2) A small number of heads that capture contextual information also contribute to recovering typos. 3) Model size affects how the typo-related workload is divided between neurons and heads.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: robustness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 876