Keywords: few-shot learning, in-context learning, fusion-in-decoder
Abstract: Large pre-trained models are capable of few-shot in-context learning (ICL), i.e., performing a new language task by prepending a few demonstrations of the task before the test input. However, ICL inference can be expensive because the concatenated demonstrations lengthen the input and thus increase computation. Inspired by how fusion-in-decoder (FiD) models efficiently aggregate a large number of passages to answer open-domain questions, we hypothesize that fusion methods other than simple concatenation can improve ICL. We conduct extensive experiments in a meta-learning setting with three fusion methods: concatenation-based (early fusion), FiD (intermediate fusion), and ensemble-based (late fusion). Empirical results show that FiD performs favorably while being up to 10x more efficient than the other fusion methods. Notably, when both methods are scaled to 3B parameters, FiD ICL performance is comparable to few-shot fine-tuning.
TL;DR: Fusion-in-decoder ICL outperforms concatenation-based ICL and ensemble-based ICL while being more computationally efficient.
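The efficiency argument can be illustrated with a toy cost model (not from the paper; lengths and constants are hypothetical): encoder self-attention cost grows quadratically with sequence length, so encoding k demonstrations independently, FiD-style, is roughly k times cheaper than encoding one long concatenation of all of them.

```python
# Toy comparison of self-attention cost for two fusion strategies.
# Assumption (illustrative only): attention cost ~ seq_len^2, constants dropped;
# decoder-side and cross-attention costs are ignored.

def attention_cost(seq_len: int) -> int:
    # Pairwise token interactions in self-attention scale as seq_len^2.
    return seq_len * seq_len

def concat_fusion_cost(k: int, d: int) -> int:
    # Early fusion: one pass over a single sequence of k*d tokens.
    return attention_cost(k * d)

def fid_fusion_cost(k: int, d: int) -> int:
    # Intermediate fusion: k independent encoder passes of d tokens each.
    return k * attention_cost(d)

if __name__ == "__main__":
    k, d = 16, 128  # hypothetical: 16 demonstrations of 128 tokens each
    print(concat_fusion_cost(k, d) / fid_fusion_cost(k, d))  # → 16.0, i.e. k
```

Under this simplification the speedup equals the number of demonstrations k, which is consistent in spirit with the "up to 10x" efficiency gain reported in the abstract.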