TL;DR: We develop an evaluation technique for quantifying the 'frequency bias' of a language model and propose 'POS Smoothing', a method for removing this bias.
Abstract: Language models rely strongly on frequency information because they maximize the likelihood of tokens during pre-training. As a consequence of this objective, language models tend not to generalize well to tokens rarely seen during training. Our work introduces a method for quantifying the frequency bias of a language model: the degree to which the model is influenced by token frequency when determining the grammatical acceptability of sentences. We then present a pre-training method that removes this frequency bias by adjusting the objective function to distribute the learning signal to syntactically similar tokens, inducing a syntactic prior over the token embeddings. Our method, which we call POS Smoothing, improves performance on infrequent tokens without degrading the model's general ability on downstream language understanding tasks.
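The abstract does not spell out the adjusted objective. As an illustration only, one plausible instantiation of "distributing the learning signal to syntactically similar tokens" is a label-smoothing-style soft target over tokens sharing the gold token's POS tag, sketched below in PyTorch. The names `pos_smoothed_loss`, `pos_of_token`, and `alpha` are assumptions for this sketch, not the paper's code.

```python
# A minimal sketch of a POS-smoothed pre-training objective, assuming a
# fixed POS-tag id per vocabulary token. All names here are illustrative,
# not the authors' actual implementation.
import torch
import torch.nn.functional as F

def pos_smoothed_loss(logits, gold_ids, pos_of_token, alpha=0.1):
    """Cross-entropy against a soft target that keeps (1 - alpha) mass on
    the gold token and spreads alpha uniformly over all vocabulary tokens
    sharing the gold token's POS tag.

    logits:       (batch, vocab) unnormalized next-token scores
    gold_ids:     (batch,)       gold next-token ids
    pos_of_token: (vocab,)       long tensor mapping token id -> POS-tag id
    """
    batch, vocab = logits.shape

    # Mask of vocabulary tokens that share each gold token's POS tag.
    same_pos = pos_of_token.unsqueeze(0) == pos_of_token[gold_ids].unsqueeze(1)

    # Spread the smoothing mass alpha uniformly over the same-POS tokens,
    # so the learning signal also reaches syntactically similar tokens.
    targets = alpha * same_pos.float() / same_pos.sum(dim=1, keepdim=True)

    # Keep the bulk of the probability mass on the gold token itself.
    targets[torch.arange(batch, device=logits.device), gold_ids] += 1.0 - alpha

    # Soft-target cross-entropy; reduces to standard CE when alpha = 0.
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

With alpha = 0 this recovers the standard maximum-likelihood objective; raising alpha shifts more of the gradient onto rare tokens that are syntactically similar to frequent gold tokens, which is one way the described syntactic prior over the embeddings could arise.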
Paper Type: short
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches to low compute settings-efficiency
Languages Studied: English