ICL-Markup: Structuring In-Context Learning using Soft-Token Tags

Published: 01 Nov 2023, Last Modified: 12 Dec 2023, R0-FoMo Poster
Keywords: in-context learning, parameter-efficient fine-tuning, meta-learning, few-shot learning, out of scope detection
TL;DR: We structure ICL with markup-like tags that can be meta-learned, reducing arbitrary decisions and improving performance.
Abstract: Large pretrained language models (PLMs) can be rapidly adapted to a wide variety of tasks via a text-to-text approach, where the instruction and input are fed to the model in natural language. Combined with in-context learning (ICL), this paradigm is impressively flexible and powerful. However, it also burdens engineers with an overwhelming number of choices, many of them arbitrary. Inspired by markup languages like HTML, we contribute a method of using soft-token (a.k.a. tunable token) tags to compose prompt templates. This approach reduces arbitrary decisions and streamlines the application of ICL. Our method is a form of meta-learning for ICL; it learns these tags in advance during a parameter-efficient fine-tuning "warm-up" process. The tags can subsequently be used in templates for ICL on new, unseen tasks without any additional fine-tuning. Our experiments with this approach yield promising initial results, improving PLM performance in important enterprise applications such as few-shot and open-world intent detection, as well as text classification in news and legal domains.
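To make the idea concrete, here is a minimal sketch of how tunable tag embeddings could be composed into an ICL template around a frozen PLM. The tag names, template layout, and class/method names are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch: markup-like soft-token tags for structuring ICL templates.
# Tag names and the composition scheme are assumptions for illustration only.
TAG_NAMES = ("instruction", "example", "input", "label")

class SoftTokenTags(nn.Module):
    """One tunable embedding per markup-like tag. During the parameter-efficient
    'warm-up', only these embeddings are trained while the PLM stays frozen; at
    ICL time the learned tags are reused in new templates with no further tuning."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.tags = nn.ParameterDict({
            name: nn.Parameter(torch.randn(1, embed_dim) * 0.02)
            for name in TAG_NAMES
        })

    def compose(self, segments):
        """segments: list of (tag_name, token_embeddings) pairs, where each
        token_embeddings tensor has shape (seq_len, embed_dim). Splices the tag
        embedding in front of its segment and concatenates everything into one
        input-embedding sequence for the frozen PLM."""
        parts = []
        for tag_name, token_embeds in segments:
            parts.append(self.tags[tag_name])
            parts.append(token_embeds)
        return torch.cat(parts, dim=0)

# Warm-up (sketch): optimize only the tag embeddings, e.g.
#   optimizer = torch.optim.Adam(soft_tags.parameters(), lr=1e-3)
# ICL on a new task (sketch): build a template such as
#   [<instruction>, task description, <example>, demo 1, ..., <input>, query, <label>]
# reusing the already-learned tag embeddings without any additional fine-tuning.
```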
Submission Number: 47