Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors

ACL ARR 2024 December Submission 876 Authors

15 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · CC BY 4.0
Abstract:

This paper examines the inner workings of LLMs when they encode inputs containing typos, in order to understand their robustness to typos. We hypothesize that specific neurons in FFN layers and attention heads in multi-head attention layers recognize typos and internally recover them to capture the originally intended meaning. We introduce a method to identify typo neurons and typo heads that activate only when inputs contain typos. Our experiments with Gemma 2 yield the following findings: 1) neurons in the early and early-middle layers respond strongly to typos; 2) a few heads that capture contextual information also contribute to recovering typos; 3) model size changes how the typo-related workload is divided between neurons and heads.
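The abstract does not spell out the identification procedure, so the following is only a minimal sketch of the general idea of comparing per-neuron FFN activations on a clean sentence versus a typo-corrupted one, not the authors' actual method. The checkpoint name, the hand-made typo sentence, and the ranking by mean absolute activation difference are all illustrative assumptions.

```python
# Minimal sketch (not the paper's method): compare intermediate FFN activations
# on a clean sentence vs. a typo-corrupted one, and rank layers by the mean
# absolute per-neuron activation difference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-2b"          # assumed checkpoint; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
model.eval()

clean = "The weather is beautiful today."
typo  = "The wether is beuatiful today."  # hand-made typos, for illustration only

acts = {}  # module name -> intermediate FFN activations for the last forward pass

def make_hook(name):
    # Capture the input to down_proj, i.e. the post-nonlinearity FFN "neuron" activations.
    def hook(module, inputs):
        acts[name] = inputs[0].detach()
    return hook

handles = [
    m.register_forward_pre_hook(make_hook(n))
    for n, m in model.named_modules()
    if n.endswith("mlp.down_proj")
]

def collect(text):
    acts.clear()
    with torch.no_grad():
        model(**tok(text, return_tensors="pt"))
    # Average over the sequence dimension so inputs of different lengths
    # can be compared neuron-by-neuron.
    return {n: a.mean(dim=1).squeeze(0) for n, a in acts.items()}

clean_acts, typo_acts = collect(clean), collect(typo)
diffs = {n: (typo_acts[n] - clean_acts[n]).abs().mean().item() for n in clean_acts}

# Layers whose FFN neurons respond most differently to the typo-corrupted input.
for name, d in sorted(diffs.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name}: mean |activation difference| = {d:.4f}")

for h in handles:
    h.remove()
```

Averaging activations over the sequence dimension is a simplification chosen here so that clean and corrupted inputs of different token lengths remain comparable; a per-token or per-neuron analysis would require aligning token positions.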

Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: robustness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 876