Hyperband-based Bayesian Optimization for Black-box Prompt Selection

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-SA 4.0
TL;DR: A novel Hyperband-based Bayesian Optimization method that performs sample- and query-efficient structural-aware prompt selection for large language models, outperforming state-of-the-art methods across multiple benchmarks.
Abstract: Optimal prompt selection is crucial for maximizing large language model (LLM) performance on downstream tasks, especially in black-box settings where models are only accessible via APIs. Black-box prompt selection is challenging due to potentially large, combinatorial search spaces, absence of gradient information, and high evaluation cost of prompts on a validation set. We propose HbBoPs, a novel method that combines a structural-aware deep kernel Gaussian Process with Hyperband as a multi-fidelity scheduler to efficiently select prompts. HbBoPs uses embeddings of instructions and few-shot exemplars, treating them as modular components within prompts. This enhances the surrogate model's ability to predict which prompt to evaluate next in a sample-efficient manner. Hyperband improves query-efficiency by adaptively allocating resources across different fidelity levels, reducing the number of validation instances required for evaluating prompts. Extensive experiments across ten diverse benchmarks and three LLMs demonstrate that HbBoPs outperforms state-of-the-art methods in both performance and efficiency.
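The multi-fidelity scheduling idea from the abstract can be illustrated with a minimal successive-halving sketch (one Hyperband bracket): all candidate prompts are scored on a small validation subset, the weaker half is discarded, and the survivors are re-evaluated on progressively more validation instances. This is an illustrative simplification, not the paper's implementation; in particular, the structural-aware deep kernel Gaussian Process surrogate is replaced here by direct evaluation, and the names `successive_halving` and `evaluate` are hypothetical.

```python
def successive_halving(prompts, evaluate, n_min=4, eta=2, rounds=3):
    """One Hyperband-style successive-halving bracket over candidate prompts.

    prompts:  list of candidate prompts (e.g. instruction/exemplar combinations)
    evaluate: callable (prompt, n_instances) -> validation score on that many
              validation instances (higher is better)
    n_min:    number of validation instances used at the lowest fidelity
    eta:      halving rate; keep the top 1/eta candidates each round and
              multiply the validation budget by eta
    """
    candidates = list(prompts)
    n_instances = n_min
    for _ in range(rounds):
        # Score every surviving candidate at the current fidelity level.
        scores = {p: evaluate(p, n_instances) for p in candidates}
        # Keep only the top 1/eta fraction (at least one candidate).
        candidates.sort(key=lambda p: scores[p], reverse=True)
        candidates = candidates[:max(1, len(candidates) // eta)]
        if len(candidates) == 1:
            break
        # Promote survivors to a higher fidelity (more validation instances).
        n_instances *= eta
    return candidates[0]
```

In the full method, a surrogate model would propose which prompts enter the bracket; here the bracket simply filters a fixed candidate set, which is enough to show how low-fidelity evaluations keep query costs down.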
Lay Summary: Large language models, like ChatGPT, can answer questions, solve problems, or write text. But how well they do often depends on how we ask them. Finding the best way to ask (called a "prompt") can be tricky, especially when using commercial models where we do not have access to their inner workings. Trying out lots of different prompts can be time-consuming and expensive. We created a new method, called HbBoPs, to help find better prompts more efficiently. It breaks each prompt into two parts, the instructions and the examples, and learns which combinations are most likely to work well. It also uses a clever way of testing prompts quickly and cheaply before spending more time and resources on the most promising ones. We tested HbBoPs across a wide range of tasks and language models. Compared to existing methods, it generally found better prompts while using fewer model calls. This means it can help people get more out of powerful language tools while saving time and cost, making these tools easier to use in everyday applications.
Primary Area: Deep Learning->Large Language Models
Keywords: Black-box prompt selection, large language models, Bayesian optimization, Hyperband, query efficiency, sample efficiency, structural-aware modeling, multi-fidelity, surrogate modeling, deep kernel, Gaussian Process, in-context learning
Submission Number: 7088