TL;DR: We develop an evaluation technique for quantifying the 'frequency bias' of a language model and propose 'POS Smoothing', a method for removing this bias.
Abstract: Language models rely strongly on frequency information because they maximize the likelihood of tokens during pre-training. As a consequence of this objective, language models tend not to generalize well to tokens rarely seen during training. Our work introduces a method for quantifying the frequency bias of a language model: the degree to which the model is influenced by token frequency when determining the grammatical acceptability of sentences. We then present a pre-training method that removes this frequency bias by adjusting the objective function to distribute the learning signal to syntactically similar tokens, inducing a syntactic prior over the token embeddings. Our method, which we call POS Smoothing, improves performance on infrequent tokens without degrading the model's general ability on downstream language understanding tasks.
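The abstract does not spell out the adjusted objective. As an illustration only, one plausible instantiation of "distributing the learning signal to syntactically similar tokens" is a label-smoothing-style soft target over tokens sharing the gold token's POS tag, sketched below in PyTorch. The names `pos_smoothed_loss`, `pos_of_token`, and `alpha` are assumptions for this sketch, not the paper's code.

```python
# A minimal sketch of a POS-smoothed pre-training objective, assuming a
# fixed POS-tag id per vocabulary token. All names here are illustrative,
# not the authors' actual implementation.
import torch
import torch.nn.functional as F

def pos_smoothed_loss(logits, gold_ids, pos_of_token, alpha=0.1):
    """Cross-entropy against a soft target that keeps (1 - alpha) mass on
    the gold token and spreads alpha uniformly over all vocabulary tokens
    sharing the gold token's POS tag.

    logits:       (batch, vocab) unnormalized next-token scores
    gold_ids:     (batch,)       gold next-token ids
    pos_of_token: (vocab,)       long tensor mapping token id -> POS-tag id
    """
    batch, vocab = logits.shape

    # Mask of vocabulary tokens that share each gold token's POS tag.
    same_pos = pos_of_token.unsqueeze(0) == pos_of_token[gold_ids].unsqueeze(1)

    # Spread the smoothing mass alpha uniformly over the same-POS tokens,
    # so the learning signal also reaches syntactically similar tokens.
    targets = alpha * same_pos.float() / same_pos.sum(dim=1, keepdim=True)

    # Keep the bulk of the probability mass on the gold token itself.
    targets[torch.arange(batch, device=logits.device), gold_ids] += 1.0 - alpha

    # Soft-target cross-entropy; reduces to standard CE when alpha = 0.
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

With alpha = 0 this recovers the standard maximum-likelihood objective; raising alpha shifts more of the gradient onto rare tokens that are syntactically similar to frequent gold tokens, which is one way the described syntactic prior over the embeddings could arise.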
Paper Type: short
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches to low compute settings-efficiency
Languages Studied: English