Keywords: prompt learning; black-box optimization; imbalanced data
Abstract: Black-box prompt learning has proven to be an effective approach for customizing large language models (LLMs) offered as services to address various downstream tasks.
Within this domain, policy gradient-based methods have garnered substantial attention as a prominent approach for learning discrete prompts.
However, the highly imbalanced data distributions encountered in the real world limit the applicability of such approaches, as they lead LLMs to favor certain categories.
To tackle the challenge posed by imbalanced data, this paper pioneers the integration of a pairwise AUC loss into the policy gradient optimization of discrete text prompts and proposes learning discrete prompts with a doubly policy gradient.
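As a rough illustration (a sketch with notation introduced here, not taken from the submission itself), let $p_\theta$ be the prompt policy over discrete prompts $z$, $s(x; z)$ the LLM score for the positive class under prompt $z$, $\mathcal{P}$/$\mathcal{N}$ the positive/negative example sets, and $\ell$ a pairwise surrogate such as the squared hinge; a pairwise AUC objective over prompts could then take the form
$$\min_\theta \; \mathbb{E}_{z \sim p_\theta}\Big[\frac{1}{|\mathcal{P}||\mathcal{N}|}\sum_{x^+\in\mathcal{P}}\sum_{x^-\in\mathcal{N}} \ell\big(s(x^+;z)-s(x^-;z)\big)\Big],$$
with the gradient with respect to $\theta$ estimated via the log-derivative (REINFORCE) identity $\nabla_\theta \mathbb{E}_{z\sim p_\theta}[R(z)] = \mathbb{E}_{z\sim p_\theta}\big[R(z)\,\nabla_\theta \log p_\theta(z)\big]$.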
Unfortunately, the doubly policy gradient estimator suffers from two sources of variance, resulting in unstable optimization.
As a further improvement, we (1) propose a novel unbiased variance-reduced doubly policy gradient estimator and (2) incorporate the STORM variance-reduction technique.
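For context, the generic STORM recursion (stated here in its standard reference form, not as the paper's exact estimator) maintains a momentum-corrected gradient estimate
$$d_t = \nabla f(\theta_t;\xi_t) + (1-a_t)\big(d_{t-1} - \nabla f(\theta_{t-1};\xi_t)\big), \qquad \theta_{t+1} = \theta_t - \eta_t d_t,$$
where $\xi_t$ is the fresh stochastic sample shared between the two gradient evaluations, $a_t \in (0,1]$ is the momentum parameter, and $\eta_t$ is the step size.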
Ultimately, we introduce a novel momentum-based discrete prompt learning method with doubly policy gradient (mDP-DPG).
Crucially, we provide theoretical convergence guarantees for mDP-DPG under standard assumptions.
The experimental results show that mDP-DPG surpasses baseline approaches across diverse imbalanced text classification datasets, emphasizing the advantages of our proposed approach for tackling data imbalance.
Our code is available at the following URL: https://anonymous.4open.science/r/DPDPG-1ECB.
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3010