Prompts Generalize with Low Data: Non-vacuous Generalization Bounds for Optimizing Prompts with More Informative Priors
Track: Theory
Keywords: Prompt regularization, Data-dependent generalization
TL;DR: Prompt optimization generalizes even in the low-data regime, as evidenced by non-vacuous, data-dependent generalization bounds.
Abstract: Many practical prompt optimization techniques have been successful, even when exploring a large prompt space with only a small amount of task-specific data. Recent work has partially explained this success by deriving generalization bounds from PAC-Bayes theory applied to the discrete prompt space, but these bounds are non-vacuous only in data-rich scenarios. We argue that such widespread success is achieved via implicit use of data-dependent perplexity, which acts as an effective prior and steers the optimization towards prompts that are more ``natural'' and generalize better. To justify this with theory, we derive novel generalization bounds that are non-vacuous for data-scarce prompt optimization via more informative priors, formally analyzing how perplexity regularization tightens these bounds by limiting exploration. Empirically, we validate both the effectiveness of the bounds and the practical benefits of perplexity regularization in improving prompt generalization.
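For intuition, a minimal sketch of the kind of bound the abstract alludes to, assuming a standard McAllester-style PAC-Bayes inequality and a hypothetical perplexity-weighted prior $P(p) \propto \exp(-\lambda \, \mathrm{PPL}(p))$; the paper's actual bound and prior construction may differ. With probability at least $1-\delta$ over a sample $S$ of size $n$, for every posterior $Q$ over prompts,
\[
\mathbb{E}_{p \sim Q}\big[R(p)\big] \;\le\; \mathbb{E}_{p \sim Q}\big[\hat{R}_S(p)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\tfrac{2\sqrt{n}}{\delta}}{2n}},
\qquad P(p) \propto \exp\!\big(-\lambda \, \mathrm{PPL}(p)\big),
\]
where $R$ and $\hat{R}_S$ are the population and empirical task risks. Under this reading, perplexity regularization keeps the posterior $Q$ concentrated on low-perplexity (``natural'') prompts, so $\mathrm{KL}(Q \,\|\, P)$ stays small and the bound remains non-vacuous even when $n$ is small.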
Serve As Reviewer: ~Qiuyi_Zhang1, ~David_Madras1
Submission Number: 45