Bidirectional Language Models Are Also Few-shot Learners

Anonymous

17 Apr 2022 (modified: 05 May 2023) | ACL ARR 2022 April Blind Submission
Abstract: Large language models such as GPT-3 (Brown et al., 2020) can perform certain tasks after seeing only a few labeled examples, without any fine-tuning. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task, in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. Bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations, and prompting them has long been desired, but their pre-training objectives have made them largely incompatible with the prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Using machine translation as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and show that its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models such as GPT-3 and XGLM (Lin et al., 2021), despite mT5 having approximately 50% fewer parameters. We further show that SAP is effective on question answering and summarization. Our results demonstrate, for the first time, that prompt-based learning is an emergent property of a broader class of language models, rather than of unidirectional models alone.
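As a rough illustration of the idea (not the paper's exact procedure), the sketch below shows how a span-infilling bidirectional model such as mT5 might be driven autoregressively: the model fills a single sentinel span, the filled text is appended to the prompt, and the query is repeated. The checkpoint, prompt wording, span length, and stopping rule are all illustrative assumptions.

```python
# Minimal sketch of sequential autoregressive prompting with a span-infilling
# model. Assumptions: any mT5 checkpoint, a toy translation prompt, 5-token
# spans, and stopping on an empty continuation; none of these are taken from
# the paper itself.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

model_name = "google/mt5-small"  # illustrative checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

prompt = "Translate English to French: The cat sat on the mat. Translation:"
generated = ""

for _ in range(20):  # generate up to 20 short spans, left to right
    # Ask the model to fill the sentinel span that follows the text so far.
    text = f"{prompt} {generated} <extra_id_0>".strip()
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=5)
    # Dropping special tokens leaves only the model's filled-in text.
    span = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    if not span:  # crude stopping rule: empty continuation ends generation
        break
    generated = f"{generated} {span}".strip()

print(generated)
```

The key design point is that the bidirectional model is never asked to generate an open-ended continuation in one shot; each query is a short infilling problem, and autoregression emerges from repeatedly feeding the model its own previous output.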
Paper Type: long