Functional Wasserstein Variational Policy Optimization

Published: 26 Apr 2024, Last Modified: 15 Jul 2024, UAI 2024 poster, CC BY 4.0
Keywords: policy optimization, uncertainty
Abstract: Variational policy optimization has become increasingly attractive to the reinforcement learning community because of its strong capability in uncertainty modeling and environment generalization. However, almost all existing studies in this area rely on the Kullback–Leibler (KL) divergence, which is unfortunately ill-defined in several situations. In addition, the policy is parameterized and optimized in weight space, which may not only introduce unnecessary bias but also make policy learning harder due to the complex dependencies in the weight posterior. In this paper, we design a novel functional Wasserstein variational policy optimization (FWVPO) method based on the Wasserstein distance between function distributions. Specifically, we first parameterize the policy as a Bayesian neural network, viewed from a function-space rather than a weight-space perspective, and then propose FWVPO to optimize and explore the functional policy posterior. We prove that FWVPO is a valid variational Bayesian objective and guarantees monotonic expected reward improvement under certain conditions. Experimental results on multiple reinforcement learning tasks demonstrate the effectiveness of our new algorithm in terms of both cumulative rewards and uncertainty modeling capability.
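The sketch below is a minimal illustration of the core idea described in the abstract, not the authors' FWVPO implementation: a stochastic policy whose outputs define a Gaussian function distribution over a batch of "measurement" states, regularized by the closed-form squared 2-Wasserstein distance between that functional posterior and a simple zero-mean functional prior. All names (`FunctionalGaussianPolicy`, `w2_diag_gaussian`, the prior choice, and the penalty weight) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only (assumed names, not the paper's code): a function-space
# Gaussian policy with a 2-Wasserstein regularizer against a functional prior.
import torch
import torch.nn as nn


def w2_diag_gaussian(mu_q, std_q, mu_p, std_p):
    """Squared 2-Wasserstein distance between two diagonal Gaussians:
    W2^2 = ||mu_q - mu_p||^2 + ||std_q - std_p||^2, summed over output dims."""
    return ((mu_q - mu_p) ** 2).sum(-1) + ((std_q - std_p) ** 2).sum(-1)


class FunctionalGaussianPolicy(nn.Module):
    """Policy network whose outputs define a Gaussian distribution over actions
    at each queried state, i.e. a distribution over functions of the state."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.mu_head = nn.Linear(hidden, action_dim)
        self.log_std_head = nn.Linear(hidden, action_dim)

    def forward(self, states):
        h = self.body(states)
        return self.mu_head(h), self.log_std_head(h).exp()


def functional_wasserstein_penalty(policy, states, prior_std=1.0):
    """Evaluate the posterior and a zero-mean Gaussian prior on a batch of
    measurement states and average their squared W2 distance."""
    mu_q, std_q = policy(states)
    mu_p = torch.zeros_like(mu_q)
    std_p = torch.full_like(std_q, prior_std)
    return w2_diag_gaussian(mu_q, std_q, mu_p, std_p).mean()


if __name__ == "__main__":
    policy = FunctionalGaussianPolicy(state_dim=4, action_dim=2)
    states = torch.randn(32, 4)                # measurement points in state space
    surrogate_return = torch.tensor(0.0)       # placeholder for the RL objective term
    loss = -surrogate_return + 0.1 * functional_wasserstein_penalty(policy, states)
    loss.backward()
    print(loss.item())
```

In this toy setup the Wasserstein term replaces the usual weight-space KL regularizer of variational policy optimization; the actual FWVPO objective, its guarantees, and its exploration mechanism are defined in the paper itself.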
Supplementary Material: zip
List Of Authors: Xuan, Junyu and Wu, Mengjing and Liu, Zihe and Lu, Jie
Latex Source Code: zip
Signed License Agreement: pdf
Submission Number: 193