Concept-aware Data Construction Improves In-context Learning of Language Models

ICLR 2024 Workshop ME-FoMo Submission 24

Published: 04 Mar 2024 · Last Modified: 06 May 2024 · ME-FoMo 2024 Poster · License: CC BY 4.0
Keywords: in-context learning, few-shot learning, reasoning, concepts
TL;DR: Inspired by recent theories, we propose training on data where it is beneficial for the LM to capture the reasoning concepts, and show that concept-aware data selection can improve the quality of in-context learners.
Abstract: Many recent language models (LMs) of the Transformer family are capable of *in-context learning* (ICL), manifested in the LMs' ability to perform a new task solely from its description in a natural language input. Previous work curating these models assumes that ICL emerges from vast over-parametrization or the scale of multi-task training, but recent theoretical work attributes ICL emergence to properties of the training data, creating in-context learners with small, synthetic datasets. Inspired by these findings, we propose *Concept-aware Training* (CoAT), a framework for constructing training scenarios that make it beneficial for the LM to learn to utilize the *analogical reasoning concepts* from demonstrations. We find that by using CoAT, pre-trained transformers *can* learn to better utilize new latent concepts from demonstrations, and that this ability makes ICL more robust to the functional deficiencies of previous models. Finally, we show that concept-aware in-context learning improves ICL performance on a majority of new tasks compared to traditional instruction tuning, reaching performance comparable to multitask learners trained on orders of magnitude more data.
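To make the idea of concept-aware data construction concrete, below is a minimal, hypothetical sketch of one way training prompts could be assembled so that demonstrations share a latent reasoning concept with the query, making it beneficial for the model to attend to the demonstrations rather than the query alone. The toy data, field names, and the `build_concept_aware_prompt` function are illustrative assumptions, not the paper's actual CoAT pipeline.

```python
import random
from collections import defaultdict

# Toy training pool: each example carries a latent "concept" label,
# i.e. the reasoning pattern needed to map input -> output.
# All examples and labels here are illustrative placeholders.
EXAMPLES = [
    {"concept": "antonym", "input": "hot",  "output": "cold"},
    {"concept": "antonym", "input": "tall", "output": "short"},
    {"concept": "antonym", "input": "fast", "output": "slow"},
    {"concept": "plural",  "input": "cat",  "output": "cats"},
    {"concept": "plural",  "input": "box",  "output": "boxes"},
    {"concept": "plural",  "input": "dog",  "output": "dogs"},
]


def build_concept_aware_prompt(query, pool, k=2, rng=random):
    """Select k demonstrations sharing the query's latent concept, so the
    target is predictable only by picking up the concept from the demos."""
    by_concept = defaultdict(list)
    for ex in pool:
        by_concept[ex["concept"]].append(ex)
    candidates = [ex for ex in by_concept[query["concept"]] if ex is not query]
    demos = rng.sample(candidates, k)
    lines = [f"{d['input']} -> {d['output']}" for d in demos]
    lines.append(f"{query['input']} ->")
    return "\n".join(lines), query["output"]


if __name__ == "__main__":
    query = EXAMPLES[0]
    prompt, target = build_concept_aware_prompt(query, EXAMPLES, k=2)
    print(prompt)   # two antonym demonstrations followed by the query
    print(target)   # expected completion: "cold"
```

In this sketch, the construction choice (concept-sharing demonstrations) is what distinguishes the data from conventional instruction-tuning prompts, where demonstrations need not share any latent concept with the query.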
Submission Number: 24