Abstract: While language models show an excellent capacity to model coherent text, they are commonly believed to struggle with tasks requiring exact representations, such as numeric values.
This work shows that representations of numbers that encode their nominal numeric values naturally emerge in text-only causal language models.
Contrary to previous work assuming that models' representations are linear, we find that different pre-trained models consistently learn highly precise sinusoidal representations already in the input embeddings, which can be accurately decoded with an appropriate probing method.
These findings undermine existing assumptions about language models' inherent inability to represent numeric information accurately and, consequently, locate the real obstacle to robust arithmetic proficiency in their limited capacity to combine accurate input representations.
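As a rough illustration of what such a sinusoidal probe could look like, the sketch below fits a linear (ridge) map from single-token number embeddings of a pre-trained causal LM to a hand-picked sinusoidal feature space, then decodes values by nearest feature match. The model name ("gpt2"), the value range, the set of periods, and the regressor are all illustrative assumptions, not the paper's actual probing setup.

```python
# Minimal sketch of a sinusoidal probe over number-token input embeddings.
# Assumptions (not from the paper): GPT-2 as the model, values 0-999,
# a fixed set of periods, and ridge regression as the probe.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

MODEL_NAME = "gpt2"                 # assumed model choice
NUMBERS = range(0, 1000)            # assumed value range
PERIODS = [10, 100, 1000, 10000]    # assumed periods for the probe's feature map

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
emb_matrix = model.get_input_embeddings().weight.detach()

# Keep only numbers that map to a single token, so each value has one input embedding.
values, embs = [], []
for n in NUMBERS:
    ids = tokenizer(str(n), add_special_tokens=False)["input_ids"]
    if len(ids) == 1:
        values.append(n)
        embs.append(emb_matrix[ids[0]].numpy())
values = np.array(values)
X = np.stack(embs)

def sinusoidal_features(vals):
    """Map integer values to sin/cos features at the assumed periods."""
    feats = []
    for T in PERIODS:
        feats.append(np.sin(2 * np.pi * vals / T))
        feats.append(np.cos(2 * np.pi * vals / T))
    return np.stack(feats, axis=1)

Y = sinusoidal_features(values)
X_tr, X_te, y_tr, y_te, v_tr, v_te = train_test_split(
    X, Y, values, test_size=0.2, random_state=0
)

# Linear probe from input embeddings into the sinusoidal feature space.
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
pred = probe.predict(X_te)

# Decode each held-out embedding to the candidate value whose sinusoidal
# features lie closest to the probe's prediction.
candidates = sinusoidal_features(values)
decoded = values[np.argmin(
    ((pred[:, None, :] - candidates[None, :, :]) ** 2).sum(-1), axis=1
)]
print("exact-decoding accuracy:", (decoded == v_te).mean())
```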
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: probing, model editing, robustness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 6499