HR-Bandit: Human-AI Collaborated Linear Recourse Bandit
Abstract: Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB (\textsf{RLinUCB}) algorithm, which jointly optimizes action selection and feature modification by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit (\textsf{HR-Bandit}), which integrates human expertise to enhance performance. \textsf{HR-Bandit} offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, demonstrate superior performance over existing baselines.
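The abstract describes the selection rule only at a high level. As a rough illustration, the sketch below shows one plausible way a LinUCB-style upper confidence bound could be scored jointly over actions and candidate feature modifications (recourses). The class name, the discrete modification set, the `build_context` helper, and the `alpha` parameter are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical sketch: a LinUCB-style learner that scores every
# (action, feature-modification) pair with an upper confidence bound
# and plays the maximizer. Details are assumptions for illustration only.

class RecourseLinUCBSketch:
    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha          # exploration strength (assumed tuning knob)
        self.A = np.eye(dim)        # regularized Gram matrix of observed contexts
        self.b = np.zeros(dim)      # reward-weighted sum of observed contexts

    def ucb(self, x):
        # Ridge estimate of the linear reward model plus an exploration bonus.
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b
        return theta_hat @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def select(self, base_features, actions, modifications):
        """Pick the (action, modification) pair with the largest UCB score.

        `actions` maps an action id to a context-building function;
        `modifications` is a finite set of feasible feature changes (recourses).
        """
        best, best_score = None, -np.inf
        for a_id, build_context in actions.items():
            for delta in modifications:
                x = build_context(base_features + delta)
                score = self.ucb(x)
                if score > best_score:
                    best, best_score = (a_id, delta), score
        return best

    def update(self, x, reward):
        # Standard LinUCB-style posterior update on the played context.
        self.A += np.outer(x, x)
        self.b += reward * x
```

A human-in-the-loop variant in the spirit of \textsf{HR-Bandit} could, for instance, query an expert for a warm-start estimate of the reward parameters or override the selected recourse when the confidence interval is wide; the exact querying rule is part of the paper and not reproduced here.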
Submission Number: 1055