Hardness Masking via Auto-Regressive Language Model

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission
Abstract: Pre-trained masked language models have achieved tremendous success in natural language processing. Most of these methods rely on recovering randomly masked tokens, which is generally less effective than masking tokens according to how well the model can predict them. However, it is costly for a large-scale model to identify for itself the tokens it still struggles to predict. On the other hand, we observe that a smaller language model can often effectively find what a large model fails to learn. Inspired by this observation, we propose to leverage a compact bi-directional auto-regressive language model to dynamically discover tokens that a large language model has not learned well and to guide its training via hardness masking. Comprehensive experiments demonstrate that our masking method effectively boosts the performance of pre-trained language models on general language understanding benchmarks.
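
The abstract describes selecting mask positions by how poorly a compact scorer model predicts each token rather than at random. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes a Hugging-Face-style `scorer` model whose output exposes `.logits`, and the function name, mask ratio, and special-token handling are all illustrative assumptions.

```python
import torch


def hardness_mask(input_ids, scorer, mask_token_id, mask_ratio=0.15, special_ids=()):
    """Mask the tokens a small scorer LM predicts worst (hypothetical sketch)."""
    with torch.no_grad():
        # Per-token negative log-likelihood under the small scorer model.
        logits = scorer(input_ids).logits                                  # (batch, seq, vocab)
        log_probs = torch.log_softmax(logits, dim=-1)
        nll = -log_probs.gather(-1, input_ids.unsqueeze(-1)).squeeze(-1)   # (batch, seq)

    # Never select special tokens (e.g. CLS, SEP, PAD) for masking.
    for sid in special_ids:
        nll = nll.masked_fill(input_ids == sid, float("-inf"))

    # Keep the hardest `mask_ratio` fraction of positions in each sequence.
    num_to_mask = max(1, int(mask_ratio * input_ids.size(1)))
    hardest = nll.topk(num_to_mask, dim=-1).indices                        # (batch, num_to_mask)

    # Build MLM inputs/labels: labels are ignored (-100) except at masked positions.
    labels = torch.full_like(input_ids, -100)
    labels.scatter_(1, hardest, input_ids.gather(1, hardest))
    masked = input_ids.clone()
    masked.scatter_(1, hardest, mask_token_id)
    return masked, labels
```

In this sketch the selected positions would then be fed to the large masked language model as its training targets, replacing the usual uniformly random mask selection.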
Paper Type: short