Behavior Constraining in Weight Space for Offline Reinforcement Learning

CoRR 2021 (modified: 03 Feb 2023)
Abstract: In offline reinforcement learning, a policy must be learned from a single pre-collected dataset. Policies are therefore typically regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and the trained policy. We propose a new algorithm that instead constrains the policy directly in its weight space, and demonstrate its effectiveness in experiments.
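The sketch below illustrates the contrast drawn in the abstract: rather than penalizing a divergence between action distributions, the trained policy is kept close to a behavior-cloned reference policy in weight space. This is a minimal, hypothetical illustration only; the specific penalty form (squared L2 distance on parameters), the coefficient `weight_penalty_coef`, the placeholder objective, and the network architecture are assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: weight-space constraint for offline RL.
# Assumptions (not from the paper): squared-L2 penalty on parameters,
# a behaviour-cloned reference policy, and a placeholder RL objective.
import torch
import torch.nn as nn


def make_policy(obs_dim: int, act_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))


def weight_space_penalty(policy: nn.Module, reference: nn.Module) -> torch.Tensor:
    # Squared L2 distance between trained-policy and reference-policy weights.
    return sum(((p - q.detach()) ** 2).sum()
               for p, q in zip(policy.parameters(), reference.parameters()))


obs_dim, act_dim = 8, 2
behavior_policy = make_policy(obs_dim, act_dim)      # assumed: fit by behaviour cloning on the dataset
policy = make_policy(obs_dim, act_dim)
policy.load_state_dict(behavior_policy.state_dict())  # start from the cloned weights

optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
weight_penalty_coef = 1.0  # hypothetical trade-off coefficient

# One illustrative update on a random batch (stands in for the offline dataset).
obs = torch.randn(32, obs_dim)
policy_loss = -policy(obs).mean()  # placeholder for the real objective, e.g. -Q(s, pi(s))
loss = policy_loss + weight_penalty_coef * weight_space_penalty(policy, behavior_policy)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

A divergence-based regularizer would instead compare the action distributions of `policy` and `behavior_policy` on dataset states (e.g. via a KL term); the weight-space variant sketched above needs no action-distribution estimate at all, only the reference parameters.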