Distributed Specialization: How Transformers Process Rare Tokens Through Parameter Differentiation

Published: 22 Sept 2025, Last Modified: 03 Jan 2026
Venue: WiML @ NeurIPS 2025
License: CC BY 4.0
Keywords: large language models, mechanistic interpretability
Submission Number: 154