Prompting a Pretrained Transformer Can Be a Universal Approximator

ICLR 2024 Workshop ME-FoMo, Submission 25

Published: 04 Mar 2024, Last Modified: 14 Apr 2024 · ME-FoMo 2024 Oral · CC BY 4.0
Keywords: universal approximation, transformer, prompting, prefix-tuning, theory
TL;DR: We show that prefix-tuning a single attention head can be a universal approximator for functions defined on the hypersphere. We then use this result to show that a prefix-tuned transformer with depth linear in the sequence length can approximate any sequence-to-sequence function.
Abstract: Despite the widespread adoption of prompting, prompt tuning, and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it, i.e., whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, prefix-tuning a single attention head is sufficient to approximate any continuous function. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer bounds on the length of the prefix needed to approximate a function to a desired precision.
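The mechanism studied in the abstract is prefix-tuning: trainable prefix tokens are prepended to the input so that a frozen attention head attends to them alongside the original sequence. The sketch below is a minimal illustration of that setup for a single softmax attention head, not the paper's construction; all dimensions, names, and weight initializations are illustrative assumptions.

```python
# Minimal sketch of prefix-tuning one frozen attention head (illustrative only).
import torch

torch.manual_seed(0)

d, seq_len, prefix_len = 16, 8, 4  # hypothetical dimensions

# Frozen "pretrained" head parameters (random stand-ins for a trained model).
W_q = torch.randn(d, d)
W_k = torch.randn(d, d)
W_v = torch.randn(d, d)

# Trainable prefix: extra tokens whose keys/values the head can attend to.
prefix = torch.randn(prefix_len, d, requires_grad=True)

def prefix_tuned_head(x: torch.Tensor) -> torch.Tensor:
    """Single attention head whose keys/values are augmented with the prefix."""
    q = x @ W_q                               # queries come only from the input
    kv_input = torch.cat([prefix, x], dim=0)  # prefix is prepended for keys/values
    k = kv_input @ W_k
    v = kv_input @ W_v
    attn = torch.softmax(q @ k.T / d**0.5, dim=-1)
    return attn @ v                           # shape: (seq_len, d)

x = torch.randn(seq_len, d)
out = prefix_tuned_head(x)
print(out.shape)  # torch.Size([8, 16])
```

Only `prefix` carries gradients here; the head's weights stay fixed, which is the regime whose expressive power the paper analyzes.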
Submission Number: 25