Data-Efficient Finetuning Using Cross-Task Nearest NeighborsDownload PDF

Published: 01 Feb 2023, Last Modified: 17 Sept 2023Submitted to ICLR 2023Readers: Everyone
Keywords: multitasking, retrieval, few-shot, efficiency, nlp
TL;DR: We use unlabelled task-specific data to select subsets of massive multitask datasets and show that language models fine-tuned on these subsets outperform models trained on all available data for unseen tasks in zero and few-shot settings.
Abstract: Language models trained on massive prompted multitask datasets like T0 (Sanh et al., 2021) or FLAN (Wei et al., 2021) can generalize to tasks unseen during training. We show that training on a carefully chosen subset of instances can outperform training on all available data on a variety of datasets. We assume access to a small number (250-1000) of unlabeled target task instances, select their nearest neighbors from a pool of multitask data, and use the retrieved data to train target task specific models. Our method is more data-efficient than training a single multitask model, while still outperforming it by large margins. We evaluate across a diverse set of tasks not in the multitask pool we retrieve from, including those used to evaluate T0 and in addition, more complex tasks including legal and scientific document QA. We retrieve small subsets of P3 (the collection of prompted datasets from which T0’s training data was sampled) and finetune T5 models that outperform the 3-billion parameter variant of T0 (T0-3B) by 8-30% on 11 out of 12 evaluation datasets while using at most 2% of the data used to train T0-3B. These models also provide a better initialization than T0-3B for few-shot finetuning on target-task data, as shown by a 3-23% relative improvement over few-shot finetuned T0-3B models on 8 datasets.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](
19 Replies