Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens

Anonymous

16 Oct 2021 (modified: 05 May 2023) · ACL ARR 2021 October Blind Submission · Readers: Everyone
Abstract: Standard pretrained language models operate on sequences of subword tokens without direct access to the characters that compose each token's string representation. We probe the embedding layer of pretrained language models and show that models learn the internal character composition of whole word and subword tokens to a surprising extent, without ever seeing the characters coupled with the tokens. Our results show that the embedding layer of RoBERTa holds enough information to accurately spell up to a third of the vocabulary and reach high average character n-gram overlap on all token types. We further test whether enriching subword models with additional character information can improve language modeling, and observe that this method has a learning curve nearly identical to that of training without spelling-based enrichment. Overall, our results suggest that language modeling objectives incentivize the model to implicitly learn some notion of spelling, and that explicitly teaching the model how to spell does not enhance its performance on such tasks.
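
The abstract describes probing RoBERTa's embedding layer for character information; the paper's probe recovers full spellings. As a simplified illustration only (not the authors' method), the sketch below fits a linear probe on RoBERTa's frozen input embeddings to predict just each token's first character. The model name, the first-character task, and training on the full vocabulary without a held-out split are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): probe RoBERTa's static embedding
# layer for character information by predicting each token's first letter.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

# Frozen input embeddings: one vector per vocabulary entry.
embeddings = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden)

# Toy targets: index of each token's first alphabetic character (a-z),
# skipping special tokens and byte-level artifacts.
xs, ys = [], []
for i in range(embeddings.size(0)):
    tok = tokenizer.convert_ids_to_tokens(i)
    stripped = tok.lstrip("Ġ").lower()  # "Ġ" marks a leading space in RoBERTa's BPE
    if stripped and "a" <= stripped[0] <= "z":
        xs.append(embeddings[i])
        ys.append(ord(stripped[0]) - ord("a"))

X = torch.stack(xs)
y = torch.tensor(ys)

# Linear probe: can the embedding alone predict the first character?
probe = nn.Linear(X.size(1), 26)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(probe(X), y)
    loss.backward()
    optimizer.step()

accuracy = (probe(X).argmax(dim=-1) == y).float().mean()
print(f"Train accuracy on first-character prediction: {accuracy:.2%}")
```

A probe in the spirit of the paper would target the token's full spelling (e.g. with a character-level decoder) and report held-out accuracy and character n-gram overlap rather than training-set accuracy on a single character.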