Bayesian Hypothesis Testing Policy Regularization

Published: 12 Jun 2025, Last Modified: 21 Jun 2025 · EXAIT@ICML 2025 Poster · CC BY 4.0
Track: AI for Science
Keywords: reinforcement learning, regularization, Bayesian hypothesis testing
TL;DR: Bayesian hypothesis testing provides a principled way of using a policy from a previous study to guide exploration.
Abstract: In reinforcement learning (RL), sparse feedback makes it difficult to target long-term outcomes, often resulting in high-variance policies. Real-world interventions instead rely on prior study data, expert input, or short-term proxies to guide exploration. In this work, we propose Bayesian Hypothesis Testing Policy Regularization (BHTPR), a method that integrates a previously learned policy with a policy learned online. BHTPR uses Bayesian hypothesis testing to determine, state by state, when to transfer the prior policy and when to rely on online learning.
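The abstract's per-state decision rule can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's actual formulation: we treat per-state outcomes under each policy as Bernoulli successes/failures with Beta(1, 1) priors, and compare the marginal likelihood of a shared success rate (evidence for transferring the prior policy) against separate rates (evidence for relying on online learning) via a Bayes factor.

```python
from math import lgamma, exp

def log_beta(a, b):
    # log of the Beta function, computed via log-gamma for stability
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_bernoulli(successes, failures, a=1.0, b=1.0):
    # log marginal likelihood of Bernoulli data under a Beta(a, b) prior
    return log_beta(a + successes, b + failures) - log_beta(a, b)

def bayes_factor(prior_succ, prior_fail, online_succ, online_fail):
    # Bayes factor for "both policies share one success rate in this state"
    # (transfer the prior policy) vs. "two separate rates" (learn online).
    log_shared = log_marginal_bernoulli(prior_succ + online_succ,
                                        prior_fail + online_fail)
    log_separate = (log_marginal_bernoulli(prior_succ, prior_fail)
                    + log_marginal_bernoulli(online_succ, online_fail))
    return exp(log_shared - log_separate)

# Similar per-state outcomes -> evidence favors transfer (BF > 1);
# divergent outcomes -> evidence favors online learning (BF < 1).
similar = bayes_factor(8, 2, 7, 3)
divergent = bayes_factor(9, 1, 2, 8)
print(similar > 1.0, divergent < 1.0)
```

The counts, priors, and success/failure abstraction here are all hypothetical; the point is only the shape of the state-by-state decision: accumulate evidence per state, then gate between the prior and online policy on the resulting Bayes factor.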
Serve As Reviewer: ~Sarah_Rathnam1, ~Finale_Doshi-Velez1
Submission Number: 32