Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits

TMLR Paper7360 Authors

05 Feb 2026 (modified: 19 Feb 2026). Under review for TMLR. License: CC BY 4.0
Abstract: Recent advances in Large Language Models (LLMs) offer new opportunities to generate user preference data for warm-starting bandits. Studies on contextual bandits with LLM initialization (CBLI) have shown that these synthetic priors can significantly lower early regret. However, these findings assume that LLM-generated choices are reasonably aligned with actual user preferences. In this paper, we systematically examine how LLM-generated preferences perform when random and label-flipping noise is injected into the synthetic training data. In aligned domains, we find that warm-starting remains effective up to 30\% corruption, loses its advantage around 40\%, and degrades performance beyond 50\%. Under systematic misalignment, LLM-generated priors can lead to higher regret than a cold-start bandit even without added noise. To explain these behaviors, we develop a theoretical analysis that decomposes the effects of random label noise and systematic misalignment on the prior error driving the bandit’s regret, and we derive a sufficient condition under which LLM-based warm starts are provably better than a cold start. We validate these results across multiple conjoint datasets and LLMs, showing that estimated alignment reliably tracks when warm-starting improves or degrades recommendation quality.
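To make the abstract's corruption protocol concrete, the sketch below simulates it in miniature: synthetic preference labels stand in for LLM-generated choices, each label is flipped independently with probability p_flip, and a ridge-regression prior fitted on the corrupted data warm-starts a LinUCB-style linear bandit. Everything here (the simulated oracle, the linear reward model, the LinUCB update, and all parameter values) is a hypothetical illustration under a standard linear-bandit setup, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: contexts x in R^d, true user preference theta_star,
# binary choices y = 1{x . theta_star > 0}. The "LLM" is simulated as a
# noisy oracle whose labels are flipped independently with prob. p_flip.
d, n_synth, p_flip = 8, 200, 0.3
theta_star = rng.normal(size=d)
X = rng.normal(size=(n_synth, d))
y = (X @ theta_star > 0).astype(float)
flips = rng.random(n_synth) < p_flip
y_noisy = np.where(flips, 1.0 - y, y)      # label-flipping corruption

# Warm start: ridge-regression prior fitted on the corrupted synthetic data.
lam = 1.0
A = X.T @ X + lam * np.eye(d)              # prior precision matrix
b = X.T @ (2 * y_noisy - 1)                # map {0,1} labels to +/-1 rewards
theta_prior = np.linalg.solve(A, b)

def linucb_step(A, b, arms, alpha=1.0):
    """One LinUCB arm selection from statistics (A, b).

    A cold-start bandit would instead begin from A = lam * I, b = 0.
    """
    theta = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    # Quadratic-form bonus arms[i] @ A_inv @ arms[i] for each arm i.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
    return int(np.argmax(arms @ theta + alpha * bonus))

arms = rng.normal(size=(5, d))
cosine = theta_prior @ theta_star / (
    np.linalg.norm(theta_prior) * np.linalg.norm(theta_star))
print("warm-start pick:", linucb_step(A, b, arms),
      "| prior alignment with theta*:", float(cosine))
```

Sweeping p_flip from 0 toward 0.5 in this toy model drives the prior's cosine alignment with theta_star toward zero: near 0.5 the corrupted labels carry no signal in expectation, so the warm start offers no advantage over the cold-start initialization A = lam * I, b = 0, mirroring the qualitative regime change the abstract reports.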
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Antoine_Patrick_Isabelle_Eric_Ledent1
Submission Number: 7360