Keywords: Generative models, Online model selection, Contextual bandits
TL;DR: We propose a new method for online generative model selection based on Nearest Neighbors bandits and active learning.
Abstract: The rapid proliferation of open-platform text-to-image generative models has made prompt-wise model selection essential for producing high-quality and semantically accurate images, yet it remains a challenging problem. Existing approaches, including contextual bandit algorithms, often converge slowly and fail to exploit semantic relationships across prompts. We introduce BALROG, a non-parametric, neighbor-based bandit framework that directly addresses these issues by transferring information across similar prompts to speed up convergence and improve generalization. By leveraging similarities between prompts, BALROG achieves faster learning and enjoys strong theoretical guarantees in the form of a sub-linear regret bound. In addition, we incorporate an active learning strategy that selectively queries ground-truth model rankings on ambiguous prompts, where ambiguity is quantified by the gap between the estimated rewards of the top two candidate models. This simple yet effective uncertainty measure substantially improves convergence and robustness. Extensive experiments on four datasets with six image generative models show that BALROG reduces regret by up to 60% compared to state-of-the-art baselines, enabling more accurate prompt-wise model selection in practice.
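The active-learning criterion described in the abstract — query a ground-truth ranking only when the gap between the two best-estimated models is small — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names and the `threshold` parameter are hypothetical.

```python
import numpy as np

def top_two_gap(estimated_rewards):
    """Ambiguity of a prompt: the gap between the two highest estimated
    model rewards. A smaller gap means the top candidates are harder to
    separate, i.e. the prompt is more ambiguous."""
    top_two = np.sort(np.asarray(estimated_rewards, dtype=float))[-2:]
    return top_two[1] - top_two[0]

def should_query_ranking(estimated_rewards, threshold=0.05):
    """Query the ground-truth model ranking only on ambiguous prompts.
    `threshold` is a hypothetical tuning parameter, not from the paper."""
    return top_two_gap(estimated_rewards) < threshold
```

With per-model reward estimates `[0.9, 0.85, 0.1]`, the gap is 0.05, so whether the oracle is queried depends entirely on the chosen threshold.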
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 15