Keywords: Safe Exploration; Human-in-the-Loop Contextual Bandit
TL;DR: We propose a flexible human-in-the-loop contextual bandit framework that enhances safe exploration by combining human expertise and oversight with machine automation.
Abstract: The integration of AI into high-stakes decision-making domains demands safety and accountability. Traditional contextual bandit algorithms for online, adaptive decision-making must balance exploration and exploitation, which poses significant risks in critical environments where exploratory actions can lead to severe consequences. To address these challenges, we propose MixUCB, a flexible human-in-the-loop contextual bandit framework that enhances safe exploration by combining human expertise and oversight with machine automation. Based on the model's confidence and the associated risk, MixUCB decides when to seek human intervention; reliance on human input gradually decreases as the system learns and gains confidence. Theoretically, we analyze regret and query complexity to rigorously answer the question of when to query. Empirically, we validate MixUCB's effectiveness through extensive experiments on both synthetic and real-world datasets. Our findings underscore the importance of designing decision-making frameworks that are not only theoretically and technically sound but also aligned with societal expectations of accountability and safety.
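For intuition about the confidence-gated query rule the abstract describes, here is a minimal sketch, not the paper's actual algorithm: it assumes a LinUCB-style linear model whose per-action confidence width gates human queries, and the threshold `delta`, the class `MixUCBSketch`, and all other names are illustrative stand-ins rather than anything specified in the submission.

```python
import numpy as np

class MixUCBSketch:
    """Illustrative confidence-gated query rule (assumed, not the paper's method).

    Idea: act autonomously when the LinUCB confidence width for the greedy
    action is small; defer to a human expert when it is still large. As data
    accumulates, widths shrink and human queries become rarer.
    """

    def __init__(self, n_actions, dim, alpha=1.0, delta=0.5):
        self.alpha = alpha  # scale of the UCB exploration bonus
        self.delta = delta  # query threshold (hypothetical parameter)
        self.A = [np.eye(dim) for _ in range(n_actions)]    # per-action Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_actions)]  # per-action reward sums

    def choose(self, x):
        """Return (action, query_human) for a context vector x."""
        ucbs, widths = [], []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                          # ridge-regression estimate
            width = self.alpha * np.sqrt(x @ A_inv @ x)  # confidence width at x
            ucbs.append(theta @ x + width)
            widths.append(width)
        a = int(np.argmax(ucbs))
        # Query the human when the model is still uncertain about its
        # greedy choice; autonomy grows as the width falls below delta.
        return a, widths[a] > self.delta

    def update(self, a, x, r):
        """Standard LinUCB update with the observed (or human-provided) reward."""
        self.A[a] += np.outer(x, x)
        self.b[a] += r * x
```

In use, the learner would call `choose` each round, defer to the human whenever `query_human` is true, and then call `update` with whichever reward signal results; the abstract's risk-sensitivity would additionally modulate `delta` per context, which this sketch omits.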
Submission Number: 158