BA2C: Bayesian Advantage Actor Critic for Few Sample Learning using Factor Graph Bayesian Neural Networks

16 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Bayesian Neural Networks, Efficient Exploration, Actor-Critic Algorithm, Reinforcement Learning
TL;DR: Managing exploration in on-policy reinforcement learning algorithms by using the uncertainty estimates of a Bayesian neural network.
Abstract: On-policy reinforcement learning (RL) algorithms, such as Proximal Policy Optimization (PPO), are widely used by researchers and practitioners across a variety of tasks. However, these algorithms are known to be sample-inefficient, which makes them difficult to apply when training samples are costly to obtain, particularly in the absence of an effective simulation environment. Some prior research has studied Bayesian approaches to RL, which promise a better trade-off between exploration and exploitation; however, to the best of our knowledge, no prior work has implemented policy-gradient actor-critic algorithms using expectation propagation for approximate message passing in Bayesian neural networks (BNNs). In this paper, we propose BA2C, an actor-critic algorithm built on networks represented as factor graphs. Because these networks are trained through approximate message passing rather than gradients, we employ a pseudo-target implementation of the policy gradient theorem. We evaluate our algorithm against three popular RL implementations and observe that, on certain environments and during the early stages of training, the number of training samples required to reach a given performance level can be reduced by up to 50%. Furthermore, our findings indicate that the uncertainty-based evaluation using expectation propagation is beneficial, and that our algorithm performs better with the expectation-propagation approximation than with IVON, a state-of-the-art variational inference algorithm.
Primary Area: reinforcement learning
Supplementary Material: zip
Submission Number: 7415
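The abstract does not spell out how the pseudo-target is formed. As a hedged illustration only, and not the authors' BA2C implementation, the sketch shows one way a policy-gradient update can be rewritten as a regression target, which a learner that does not follow gradients (such as a message-passing one) could fit instead. It assumes a softmax policy over discrete actions and a one-step temporal-difference advantage; the function names and the `step_size` parameter are hypothetical, and the fitting step itself is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_step_advantage(reward, value, next_value, gamma=0.99, done=False):
    """One-step TD advantage estimate: A = r + gamma * V(s') - V(s).

    Assumption: a common A2C-style choice; the paper's estimator may differ.
    """
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def policy_pseudo_target(logits, action, advantage, step_size=0.1):
    """Hypothetical pseudo-target for the actor's output logits.

    For a softmax policy, the gradient of log pi(action) with respect to the
    logits is onehot(action) - pi. Shifting the current logits along
    advantage * (onehot(action) - pi) yields a target that the actor can be
    fitted to, instead of taking a gradient step directly.
    """
    probs = softmax(logits)
    return [
        z + step_size * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (z, p) in enumerate(zip(logits, probs))
    ]


if __name__ == "__main__":
    adv = one_step_advantage(reward=1.0, value=0.5, next_value=1.0, gamma=0.9)
    target = policy_pseudo_target([0.0, 0.0], action=0, advantage=adv)
    # A positive advantage raises the chosen action's logit and lowers the other.
    print(adv, target)
```

The design rationale: for a squared-error objective 0.5 * ||z - target||^2, the gradient with respect to the logits z is z - target = -step_size * A * (onehot(action) - pi), so pulling the logits toward the target points in the same direction as a policy-gradient ascent step. A positive advantage makes the taken action more likely; a negative one makes it less likely.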