Abstract: Many recent language models (LMs) of the Transformer family exhibit in-context learning (ICL), manifested in the LMs' ability to perform a new task solely from its description in a natural-language input. Previous work developing these models assumes that ICL emerges from vast over-parametrization or from the scale of multi-task training. However, a complementary branch of recent theoretical work attributes the emergence of ICL to specific properties of the training data and creates functional in-context learners in small-scale, synthetic settings.
Inspired by these findings, we propose Concept-aware Training (CoAT), a method that constructs training scenarios in which it is beneficial for the LM to learn to utilize analogical reasoning concepts from demonstrations. We find that CoAT's data sampling substantially improves models' ICL on unseen tasks: applying CoAT with only two QA datasets yields performance comparable to previous in-context learners trained on over 1,600 tasks.
Our analyses show that CoAT's improvements can be attributed to the models' reinforced ability to benefit from natural concepts in demonstrations, rather than relying on the pre-trained semantic priors that previous ICL models commonly fall back on.
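To make the idea of concept-aware data sampling concrete, below is a minimal sketch of how few-shot training prompts might be assembled so that demonstrations share a reasoning concept with the predicted example. The function name, the `concept` annotation field, and the prompt template are illustrative assumptions, not the authors' released implementation.

```python
import random
from typing import Dict, List

def sample_concept_aware_demos(
    dataset: List[Dict],  # each item: {"question": str, "answer": str, "concept": str}
    target: Dict,         # the example whose answer the LM must predict
    k: int = 3,           # number of in-context demonstrations per prompt
) -> str:
    """Build a few-shot prompt whose demonstrations share the target's concept.

    Hypothetical sketch: selecting demonstrations annotated with the same
    reasoning concept makes it beneficial for the LM to attend to that
    concept rather than to rely on semantic priors alone.
    """
    shared = [ex for ex in dataset
              if ex["concept"] == target["concept"] and ex is not target]
    demos = random.sample(shared, min(k, len(shared)))
    prompt = "".join(f"Question: {d['question']}\nAnswer: {d['answer']}\n\n"
                     for d in demos)
    return prompt + f"Question: {target['question']}\nAnswer:"
```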
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings-efficiency
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.