Credible Optimism for Interpretable Semantic Decision Making with Bayesian UCB

13 Nov 2025 (modified: 01 Dec 2025) · NeurIPS 2025 Workshop FMEA Submission · CC BY 4.0
Keywords: Bayesian learning, Reinforcement Learning, Decision Making under uncertainty
TL;DR: An RL framework that fuses Bayes-UCB posterior optimism with a lightweight semantic “think-and-act” prior to achieve efficient, interpretable exploration in sparse-reward tasks.
Abstract: Reinforcement learning in sparse-reward settings is often hindered by undirected exploration and opaque decision rules, which undermine sample efficiency and trust. We propose a Bayesian Upper Confidence Bound (Bayes-UCB) agent that directs exploration toward state–action pairs with high posterior uncertainty, and we couple it with a lightweight semantic think-and-act layer: an interpretable decision module that produces natural-language rationales and a soft action prior. We treat TD targets as noisy observations for a Normal–Normal posterior over $Q(s,a)$ and select actions via a credible-optimistic index combined with a log-prior derived from the semantic layer. We instantiate this framework in tabular mazes and evaluate cumulative return, steps-to-goal, minimum distance-to-goal, and posterior $Q$-variance. We show that Bayes-UCB with the semantic layer achieves earlier goal discovery, fewer steps to the goal, steadily declining policy entropy, and rapidly shrinking credible bonuses, while the rationales succinctly explain why each action is chosen at a given state. Taken together, calibrated uncertainty and interpretable priors yield efficient, transparent, and auditable exploration in sparse-reward tasks.
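For concreteness, below is a minimal sketch of the posterior update and action rule the abstract describes. It makes illustrative assumptions not stated in the paper: a known TD-target noise variance, a fixed credible level (Bayes-UCB typically lets the quantile level grow over time), and a fixed weight on the semantic log-prior. The class and parameter names (`BayesUCBQ`, `obs_var`, `prior_weight`) are hypothetical, not from the submission.

```python
import numpy as np
from scipy.stats import norm

class BayesUCBQ:
    """Normal-Normal posterior over Q(s, a) with a credible-optimistic action rule.

    Illustrative sketch only; hyperparameters and update details are assumptions.
    """

    def __init__(self, n_states, n_actions, mu0=0.0, var0=1.0,
                 obs_var=1.0, quantile=0.95, prior_weight=0.1):
        self.mu = np.full((n_states, n_actions), mu0)     # posterior means of Q(s, a)
        self.var = np.full((n_states, n_actions), var0)   # posterior variances of Q(s, a)
        self.obs_var = obs_var            # assumed (known) TD-target noise variance
        self.quantile = quantile          # credible level of the optimistic index
        self.prior_weight = prior_weight  # weight on the semantic log-prior

    def update(self, s, a, td_target):
        # Conjugate Normal-Normal update: the TD target is treated as a noisy
        # observation of Q(s, a) with known variance obs_var.
        prior_prec = 1.0 / self.var[s, a]
        obs_prec = 1.0 / self.obs_var
        post_var = 1.0 / (prior_prec + obs_prec)
        self.mu[s, a] = post_var * (prior_prec * self.mu[s, a] + obs_prec * td_target)
        self.var[s, a] = post_var

    def act(self, s, semantic_log_prior):
        # Credible-optimistic index: an upper posterior quantile of Q(s, .)
        # plus a soft log-prior supplied by the semantic think-and-act layer.
        z = norm.ppf(self.quantile)
        index = (self.mu[s] + z * np.sqrt(self.var[s])
                 + self.prior_weight * semantic_log_prior)
        return int(np.argmax(index))
```

A hypothetical usage, with the semantic layer's soft action prior passed in as log-probabilities:

```python
agent = BayesUCBQ(n_states=25, n_actions=4)
log_prior = np.log([0.4, 0.3, 0.2, 0.1])  # soft action prior from the rationale module
a = agent.act(s=0, semantic_log_prior=log_prior)
agent.update(s=0, a=a, td_target=0.0)     # e.g. td_target = r + gamma * max_a' mu[s', a']
```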
Submission Number: 1