On the Role of Attention in Prompt-tuning

Published: 04 Mar 2023, Last Modified: 16 May 2023 | ME-FoMo 2023 Poster
Keywords: fundamental limits, self-attention, context-relevant, prompt, fine-tuning
TL;DR: Prompt-Attention: An analysis of the effectiveness of prompt-tuning and of the role of attention in processing context-relevant information
Abstract: Prompt-tuning is an emerging strategy to adapt large language models (LLMs) to downstream tasks by learning a (soft-)prompt parameter from data. Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting. In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mixture models where each input token belongs to a context-relevant or context-irrelevant set. We isolate the role of prompt-tuning through a self-contained prompt-attention model. Our contributions are as follows: (1) We show that softmax-prompt-attention is provably more expressive than softmax-self-attention and linear-prompt-attention under our contextual data model. (2) We analyze the initial trajectory of gradient descent and show that it learns the prompt and prediction head with near-optimal sample complexity, and we demonstrate how the prompt can provably attend to sparse context-relevant tokens. We also provide experiments that verify our theoretical insights on real datasets and demonstrate how prompt-tuning enables the model to attend to context-relevant information.
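As a rough illustration of the self-contained prompt-attention model described in the abstract, here is a minimal sketch in PyTorch: a single learnable prompt vector queries the input tokens through softmax attention, and a linear prediction head reads out the attention-weighted representation. The class name `PromptAttention`, the dimensions, and the zero initialization are illustrative assumptions for this sketch, not the authors' released code or exact parameterization.

```python
import torch
import torch.nn as nn

class PromptAttention(nn.Module):
    """One-layer prompt-attention sketch: a learnable (soft-)prompt attends
    over the input tokens via softmax, and a linear head maps the attended
    representation to a scalar prediction."""
    def __init__(self, d: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(d))  # soft-prompt vector (learned)
        self.head = nn.Parameter(torch.zeros(d))    # prediction head (learned)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (batch, T, d) sequence of input tokens
        scores = X @ self.prompt                     # (batch, T) prompt-token relevance scores
        attn = torch.softmax(scores, dim=-1)         # softmax-prompt-attention weights
        ctx = (attn.unsqueeze(-1) * X).sum(dim=1)    # (batch, d) attention-weighted token average
        return ctx @ self.head                       # (batch,) scalar prediction per example

# Usage on synthetic tokens; in the paper's setting most tokens would be
# context-irrelevant and the prompt should learn to upweight the relevant ones.
model = PromptAttention(d=16)
X = torch.randn(8, 32, 16)   # 8 sequences of 32 tokens, dimension 16
y_hat = model(X)             # shape (8,)
```

The point of the sketch is that the prompt acts only as a query: expressiveness and trainability then hinge on how the softmax over prompt-token scores separates context-relevant from context-irrelevant tokens, which is the object of the paper's analysis.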