DReCa: A General Task Augmentation Strategy for Few-Shot Natural Language Inference

24 Oct 2020 (modified: 24 Oct 2020) · OpenReview Anonymous Preprint Blind Submission · Readers: Everyone
Keywords: Meta-learning, NLP, Natural Language Inference, Few-shot learning, Meta-overfitting, Task augmentation
TL;DR: We propose a task augmentation strategy to prevent overfitting in meta-learning applied to NLP problems.
Abstract: Meta-learning promises "few-shot" learners that can adapt to new distributions by repurposing knowledge acquired from previous training. However, meta-learning has thus far failed to achieve this in NLP due to the lack of a well-defined task distribution, leading to alternatives that treat datasets as tasks. Such an ad hoc task distribution has two negative consequences. The first stems from a lack of quantity: since there are only a handful of datasets, meta-learners tend to overfit their adaptation mechanism. The second stems from a lack of quality: since NLP datasets are highly heterogeneous, many learning episodes have poor transfer between their support and query sets, which disincentivizes the meta-learner from adapting. To alleviate these issues, we propose DReCa (Decomposing datasets into Reasoning Categories), a simple method for discovering and using latent reasoning categories in a dataset to form additional high-quality tasks. DReCa works by splitting examples into label groups, embedding them with a fine-tuned BERT model, and then clustering each group into reasoning categories. Across 4 NLI few-shot problems, we demonstrate that using DReCa improves the performance of meta-learners by 1.5 to 4 accuracy points.
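The abstract describes a three-step pipeline (split by label, embed with BERT, cluster into reasoning categories). The sketch below illustrates one plausible reading of that pipeline using Hugging Face transformers and scikit-learn; the model checkpoint, [CLS] pooling, and cluster count are illustrative assumptions, not the authors' reported configuration.

```python
# Illustrative sketch of the DReCa pipeline described in the abstract
# (not the authors' code). Assumptions: a BERT encoder pooled via the
# [CLS] token, and k-means with a hand-picked number of clusters.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

# Stand-in checkpoint; the paper uses a fine-tuned BERT model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(premises, hypotheses, batch_size=32):
    """Embed premise-hypothesis pairs via the [CLS] vector."""
    vecs = []
    for i in range(0, len(premises), batch_size):
        batch = tokenizer(premises[i:i + batch_size],
                          hypotheses[i:i + batch_size],
                          padding=True, truncation=True,
                          return_tensors="pt")
        with torch.no_grad():
            out = model(**batch)
        vecs.append(out.last_hidden_state[:, 0, :])  # [CLS] embedding
    return torch.cat(vecs).numpy()

def reasoning_categories(examples_by_label, n_clusters=8):
    """Split examples by gold label, embed each group, and cluster it.

    `examples_by_label` maps an NLI label (e.g. "entailment") to a list
    of (premise, hypothesis) pairs. Each (label, cluster id) pair then
    defines one augmented task for the meta-learner.
    """
    categories = {}
    for label, examples in examples_by_label.items():
        X = embed([p for p, _ in examples], [h for _, h in examples])
        categories[label] = KMeans(n_clusters=n_clusters,
                                   n_init=10).fit_predict(X)
    return categories
```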