Keywords: in-context learning, masked language modeling, BERT, language modeling, evaluation, inference
TL;DR: This paper explores the in-context learning capabilities of masked language models, challenging the common view that such abilities are only present in causal language models.
Abstract: While in-context learning is commonly associated with causal language models, such as GPT, we demonstrate that this capability also 'emerges' in masked language models. Through an embarrassingly simple inference technique, we enable an existing masked model, DeBERTa, to perform generative tasks without additional training or architectural changes. Our evaluation reveals that masked and causal language models behave very differently: each clearly outperforms the other on different categories of tasks. These complementary strengths suggest that the field's focus on causal models for in-context learning may be limiting. Both architectures can develop in-context learning abilities, but with distinct advantages, pointing toward promising hybrid approaches that combine the strengths of both objectives.
Primary Area: Natural language processing
Submission Number: 20723
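The abstract does not spell out the inference technique itself. As a point of reference, the sketch below shows one generic way a masked language model can be used generatively: repeatedly append a [MASK] token to the prompt and greedily fill it in. The checkpoint name, the `generate` helper, and the greedy decoding are illustrative assumptions, not the paper's actual procedure.

```python
# Minimal sketch (not the authors' exact method): token-by-token generation
# with an off-the-shelf masked language model by appending a single [MASK]
# position and predicting it, then repeating with the extended context.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "bert-base-uncased"  # illustrative checkpoint; the paper uses DeBERTa

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Greedily extend `prompt` by filling one appended [MASK] at a time."""
    # Keep [CLS] ... prompt tokens, drop the trailing [SEP] so we can extend.
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"][0, :-1]
    for _ in range(max_new_tokens):
        # Append one mask position followed by [SEP], then predict the mask.
        ids = torch.cat([
            input_ids,
            torch.tensor([tokenizer.mask_token_id, tokenizer.sep_token_id]),
        ])
        logits = model(input_ids=ids.unsqueeze(0)).logits[0]
        next_id = logits[-2].argmax(-1)  # prediction at the [MASK] position
        if next_id.item() == tokenizer.sep_token_id:
            break  # model signals end of generation
        input_ids = torch.cat([input_ids, next_id.unsqueeze(0)])
    return tokenizer.decode(input_ids, skip_special_tokens=True)


print(generate("Translate English to French: cheese =>"))
```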