An Online Learning Approach to Prompt-based Selection of Generative Models and LLMs

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose an online learning method to adaptively select the best generative model or LLM for each input prompt.
Abstract: Selecting a sample generation scheme from multiple prompt-based generative models, including large language models (LLMs) and prompt-guided image and video generation models, is typically addressed by choosing the model that maximizes an averaged evaluation score. However, this score-based selection overlooks the possibility that different models achieve the best generation performance for different types of text prompts. Identifying the best generation model for each input prompt online can reduce the costs associated with querying sub-optimal models. In this work, we explore how the rankings of text-based generative models can vary across text prompts and propose an online learning framework to predict the best data generation model for a given input prompt. The proposed PAK-UCB algorithm addresses a contextual bandit (CB) setting with shared context variables across the arms, utilizing the generated data to update kernel-based functions that predict the score of each available model on unseen text prompts. Additionally, we leverage random Fourier features (RFF) to accelerate the online learning process of PAK-UCB. Our numerical experiments on real and simulated text-to-image and image-to-text generative models show that RFF-UCB successfully identifies the best generation model across different sample types. The code is available at: [github.com/yannxiaoyanhu/dgm-online-select](github.com/yannxiaoyanhu/dgm-online-select).
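The RFF acceleration mentioned in the abstract replaces exact kernel evaluations with a finite-dimensional random feature map, so kernel regression reduces to linear regression in a fixed feature space. Below is a minimal sketch of the standard Rahimi–Recht construction for a Gaussian kernel; the function name and parameters are illustrative and not taken from the paper's code:

```python
import numpy as np

def rff_features(X, n_features=64, gamma=1.0, seed=0):
    """Random Fourier features approximating a Gaussian kernel.

    For the map z(.) below, z(x) @ z(y) approximates
    exp(-gamma * ||x - y||^2), so kernel-based predictors can be
    trained as ordinary linear models on z(X).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density,
    # with a random phase shift per feature.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

With enough random features, the inner product of the mapped vectors closely tracks the exact kernel value, which is what makes the per-round updates cheap.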
Lay Summary: As the number of available generative AI models continues to grow, from large language models (LLMs) to text-to-image and video generators, selecting the right model for a given task has become increasingly important. Traditionally, model selection is performed by ranking models according to their average performance across a large set of prompts. However, this approach overlooks an important fact: different models often excel on different types of prompts. For example, one model may perform best on prompts about nature scenes, while another may be better at generating images of people. To address this limitation, we propose an *online learning approach* that adapts model selection to the specific prompt being used. Instead of relying on static rankings, our method learns in real time which models work best for which kinds of prompts, allowing it to make more informed, prompt-specific model selections. We formulate this problem as a *contextual bandit* task, where the *“context”* is the input prompt, and the *“arms”* are the available generative models. Our proposed algorithm, called *PAK-UCB*, uses kernel-based predictors to model how well each model is likely to perform on a new prompt, based on past generations. Our experiments suggest that this approach can quickly learn to assign prompts to the most suitable generative models across a range of tasks, including text-to-image and image-to-text generation. The result is more efficient use of generative models, reducing the cost of querying suboptimal models, and improved performance for end users.
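PAK-UCB itself is specified in the paper; as a rough illustration of the general recipe it builds on (a per-arm kernel score predictor plus an upper-confidence exploration bonus), here is a minimal sketch assuming prompts arrive already embedded as vectors and each generation is scored by some external evaluation metric. The class and function names are ours, not the authors':

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian kernel matrix between rows of X and rows of Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelUCBArm:
    """Kernel ridge score predictor for one generative model (arm)."""
    def __init__(self, lam=1.0, beta=1.0, gamma=1.0):
        self.lam, self.beta, self.gamma = lam, beta, gamma
        self.X = np.empty((0, 0))  # prompt embeddings seen so far
        self.y = np.empty(0)       # evaluation scores observed

    def update(self, x, score):
        x = np.atleast_2d(x)
        self.X = x if self.y.size == 0 else np.vstack([self.X, x])
        self.y = np.append(self.y, score)

    def ucb(self, x):
        # Predicted score plus exploration bonus for prompt embedding x
        x = np.atleast_2d(x)
        if self.y.size == 0:
            return np.inf  # force at least one query of every model
        K = rbf_kernel(self.X, self.X, self.gamma) + self.lam * np.eye(len(self.y))
        k = rbf_kernel(self.X, x, self.gamma).ravel()
        Kinv_k = np.linalg.solve(K, k)
        mean = Kinv_k @ self.y
        var = rbf_kernel(x, x, self.gamma)[0, 0] - k @ Kinv_k
        return mean + self.beta * np.sqrt(max(var, 0.0))

def select_model(arms, x):
    # Pick the arm (generative model) with the highest UCB for this prompt
    return int(np.argmax([arm.ucb(x) for arm in arms]))
```

In each round, the selected model generates a sample, the sample is scored, and that arm's predictor is updated with `update(x, score)`; the PAK-UCB variant additionally shares the prompt context across arms and uses the RFF approximation to keep these updates fast.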
Link To Code: github.com/yannxiaoyanhu/dgm-online-select
Primary Area: Reinforcement Learning
Keywords: Online Learning, Large Language Models, Evaluation and Selection of Generative Models, Contextual Bandits
Submission Number: 9440