Bayesian Hypothesis Testing Policy Regularization

Published: 22 Jun 2025, Last Modified: 27 Jul 2025 · IBRL @ RLC 2025 · CC BY 4.0
Keywords: reinforcement learning, regularization, Bayesian hypothesis test, mobile health
TL;DR: We introduce a reinforcement learning method that encodes the inductive bias that a prior study policy is correct in some states and not others, using Bayesian hypothesis testing to adaptively guide exploration.
Abstract: In reinforcement learning (RL), sparse feedback makes it difficult to target long-term outcomes, often resulting in high-variance policies. Real-world interventions instead rely on prior study data, expert input, or short-term proxies to guide exploration. In this work, we propose Bayesian Hypothesis Testing Policy Regularization (BHTPR), a method that integrates a previously learned policy with a policy learned online to speed up learning in such settings. BHTPR encodes the inductive bias that the prior study data matches the current study environment in some states but not in others. We use Bayesian hypothesis testing to decide, state by state, when to transfer the prior policy and when to rely on online learning.
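The page gives only the abstract, not the algorithm. As a minimal illustration of the per-state gating idea described above, the sketch below uses a conjugate-normal Bayes factor to test, per state, whether observed rewards are consistent with the value the prior policy predicts; if the test rejects, it defers to the online policy. All function names, the specific Normal model, and the threshold are assumptions for illustration, not the authors' implementation.

```python
import math

def log_norm_pdf(x, mu, var):
    """Log density of a Normal(mu, var) distribution at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def log_bayes_factor(rewards, mu0, sigma2=1.0, tau2=1.0):
    """Log Bayes factor for H1 ('prior policy is wrong in this state':
    mean reward deviates from its predicted value mu0) over H0 ('prior
    policy is right': mean reward equals mu0).

    Model (an illustrative assumption): rewards are i.i.d. Normal with
    known variance sigma2; under H1 the unknown mean has a conjugate
    Normal(mu0, tau2) prior, giving a closed-form marginal likelihood.
    """
    n = len(rewards)
    rbar = sum(rewards) / n
    s = sum((r - rbar) ** 2 for r in rewards)  # within-sample scatter
    # H0: point null, mean fixed at mu0.
    log_h0 = sum(log_norm_pdf(r, mu0, sigma2) for r in rewards)
    # H1: mean integrated out under its Normal(mu0, tau2) prior.
    log_h1 = (-0.5 * n * math.log(2 * math.pi * sigma2)
              - s / (2 * sigma2)
              + 0.5 * math.log(2 * math.pi * sigma2 / n)
              + log_norm_pdf(rbar, mu0, sigma2 / n + tau2))
    return log_h1 - log_h0

def select_action(state, prior_policy, online_policy, reward_history,
                  threshold=1.0):
    """Per-state gate: follow the prior policy unless the evidence that
    its predicted value is wrong in this state exceeds the threshold."""
    rewards, mu0 = reward_history[state]
    if len(rewards) >= 2 and log_bayes_factor(rewards, mu0) > threshold:
        return online_policy(state)
    return prior_policy(state)
```

In this toy version the evidence accumulates separately for each state, so the prior policy keeps being transferred in states where it still looks correct while exploration shifts to the online policy elsewhere, matching the state-by-state transfer the abstract describes.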
Submission Number: 1