NeMoS: Nearest Neighbors Bandit meets Active Learning for Online Model Selection

21 Apr 2026 (modified: 04 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: The proliferation of open-platform text-to-image generative models has made prompt-wise model selection critical to maximize generation quality and semantic alignment. However, current strategies, such as contextual bandits, often converge slowly and fail to exploit the semantic relationships across prompts. To bridge this gap, we propose NeMoS, a non-parametric bandit framework that couples nearest neighbor reward estimation with a budget-constrained active learning strategy. Specifically, our approach operates in the prompt embedding space and estimates the reward of incoming prompts based on feedback from their nearest neighbors. By limiting ground-truth queries to ambiguous "near-tie" scenarios, NeMoS resolves uncertainty efficiently and accelerates convergence. We prove that this active mechanism yields a poly-logarithmic regret bound, marking a significant theoretical improvement over its passive version. Extensive experiments on four datasets with six image generative models show that NeMoS reduces regret by up to 60% compared to state-of-the-art baselines, while being robust to model addition or removal. We provide experimental code in the supplementary material.
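The abstract's core mechanism — scoring candidate models for a new prompt from the feedback of its nearest neighbors in embedding space, and querying ground truth only when the top estimates are a "near-tie" — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the function names, the k-NN averaging rule, the 0.5 prior for unseen models, and the `tie_margin` threshold are all assumptions for exposition.

```python
# Hypothetical sketch of NeMoS-style selection (illustrative only, not the
# paper's actual algorithm): pick a model for each prompt from nearest-neighbor
# reward estimates, and request ground-truth feedback only on near-ties.
import numpy as np

def select_model(prompt_emb, history, n_models, k=5, tie_margin=0.1):
    """Return (chosen_model, should_query_feedback).

    history: list of (embedding, model_index, observed_reward) tuples.
    """
    if len(history) < k:
        # Cold start: explore uniformly and always query feedback.
        return np.random.randint(n_models), True

    # Find the k prompts in history closest to the incoming prompt embedding.
    embs = np.stack([h[0] for h in history])
    dists = np.linalg.norm(embs - prompt_emb, axis=1)
    neighbors = np.argsort(dists)[:k]

    # Estimate each model's reward from the neighbors' observed feedback.
    est = np.zeros(n_models)
    counts = np.zeros(n_models)
    for i in neighbors:
        _, m, r = history[i]
        est[m] += r
        counts[m] += 1
    # Assumed prior of 0.5 for models with no nearby feedback.
    est = np.where(counts > 0, est / np.maximum(counts, 1), 0.5)

    # Budget the ground-truth queries: only ask when the top two
    # estimates are within tie_margin of each other (a "near-tie").
    top, runner_up = np.sort(est)[-2:][::-1]
    near_tie = (top - runner_up) < tie_margin
    return int(np.argmax(est)), bool(near_tie)
```

When the neighborhood clearly favors one model, the sketch exploits it without spending query budget; feedback is requested only where it actually resolves uncertainty, which is the intuition behind the accelerated convergence claimed in the abstract.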
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Tommaso_R._Cesari1
Submission Number: 8538