Keywords: universal approximation, transformer, prompting, prefix-tuning, theory
TL;DR: We show that prefix-tuning a single attention head can be a universal approximator for functions defined on the hypersphere. Then we use that to show that a prefix-tuned transformer with linear depth can approximate any sequence-to-sequence function.
Abstract: Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited.
A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it, i.e., whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions.
This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed.
In fact, prefix-tuning a single attention head is sufficient to approximate any continuous function.
Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length.
Beyond these density-type results, we also offer bounds on the length of the prefix needed to approximate a function to a desired precision.
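To make the setting concrete, here is a minimal sketch of what prefix-tuning a single attention head looks like in code. The shapes, variable names, and the specific choice of prepending trainable key/value vectors are illustrative assumptions for this sketch, not the paper's exact construction.

```python
# Minimal sketch (assumption, not the paper's construction): a frozen
# attention head whose keys/values are augmented with trainable prefix
# vectors P_k, P_v; only the prefix would be optimized during prefix-tuning.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prefix_tuned_head(X, W_q, W_k, W_v, P_k, P_v):
    """Single attention head with a prepended prefix.

    X   : (T, d)   input sequence
    W_* : (d, d_h) frozen projection matrices of the pretrained head
    P_k : (L, d_h) trainable prefix keys   (L = prefix length)
    P_v : (L, d_h) trainable prefix values
    """
    Q = X @ W_q                          # queries come from the input only
    K = np.vstack([P_k, X @ W_k])        # prefix keys prepended
    V = np.vstack([P_v, X @ W_v])        # prefix values prepended
    scores = Q @ K.T / np.sqrt(K.shape[1])
    return softmax(scores) @ V           # (T, d_h) head output

# Toy usage with hypothetical dimensions.
rng = np.random.default_rng(0)
d, d_h, T, L = 8, 4, 5, 3
X = rng.standard_normal((T, d))
W_q, W_k, W_v = (rng.standard_normal((d, d_h)) for _ in range(3))
P_k, P_v = rng.standard_normal((L, d_h)), rng.standard_normal((L, d_h))
print(prefix_tuned_head(X, W_q, W_k, W_v, P_k, P_v).shape)  # (5, 4)
```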
Submission Number: 25