Policy optimization by marginal-map probabilistic inference in generative models

Igor Kiselev, Pascal Poupart

2014 (modified: 20 May 2025), AAMAS 2014

Abstract: While most current planning methods focus on developing scalable approximate algorithms, they often neglect to provide algorithmic performance guarantees, or they sacrifice the tightness of those guarantees to improve efficiency. In contrast, we address the challenging problem of solving POMDP planning problems approximately with a focus on solution quality: estimating the quality of such approximations and deciding when a satisfactory plan is available. 1) We demonstrate that the original task of optimizing POMDP controllers can be reformulated as the equivalent problem of marginal-MAP inference in a novel single-DBN (dynamic Bayesian network) generative model, which guarantees that the control policies computed by probabilistic inference over this model are optimal in the traditional sense. 2) We then solve a POMDP problem approximately with bounded performance guarantees by translating the corresponding marginal-MAP inference problem into its variational form and developing two Bayesian variational inference algorithms that (i) approximate the marginal-MAP inference and (ii) compute an upper bound on the solution. 3) We evaluate the proposed approach, which optimizes the parameters of POMDP controllers by marginal-MAP inference with bounded performance guarantees, on several POMDP benchmark problems, and we compare the performance of the implemented variational algorithms with previously developed methods.
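To make the marginal-MAP framing concrete, here is a minimal sketch on a hypothetical one-step toy problem (the states, observations, rewards, and all function names are illustrative assumptions, not the paper's single-DBN model or its variational algorithms). Rewards in [0, 1] are read as P(r = 1 | s, a), so the expected reward of a policy θ equals P(r = 1 | θ) = Σ_{s,o} P(s) P(o | s) P(r = 1 | s, θ(o)). Marginal-MAP maximizes over the policy variables while summing out the hidden ones. The sketch finds the maximizer by brute force and computes a simple upper bound by moving the max inside the sum (Σ max ≥ max Σ). That swap is only the most basic bound of this kind; it stands in for, and is not, the variational upper bound described in the abstract.

```python
import itertools

# Hypothetical toy problem (not from the paper): 2 hidden states, 2 observations, 2 actions.
PRIOR = [0.5, 0.5]                  # P(s)
OBS = [[0.85, 0.15], [0.15, 0.85]]  # P(o | s), rows indexed by s
REWARD = [[1.0, 0.0], [0.0, 1.0]]   # R(s, a) in [0, 1], read as P(r = 1 | s, a)


def p_reward(policy):
    """P(r = 1 | policy): sum out the hidden state s and the observation o.

    policy[o] is the (deterministic) action taken after observing o.
    """
    return sum(PRIOR[s] * OBS[s][o] * REWARD[s][policy[o]]
               for s in range(2) for o in range(2))


def marginal_map():
    """Marginal-MAP by brute force: max over policy variables, sum over hidden ones."""
    policies = list(itertools.product(range(2), repeat=2))  # all maps o -> a
    return max(policies, key=p_reward)


def swapped_upper_bound():
    """Move the max inside the sum: each hidden state picks its own best action.

    Because sum-of-max >= max-of-sum, this upper-bounds P(r = 1 | policy) for
    every policy (here it equals the fully observable value).
    """
    return sum(PRIOR[s] * OBS[s][o] * max(REWARD[s])
               for s in range(2) for o in range(2))


best = marginal_map()
print(best, p_reward(best), swapped_upper_bound())
```

The gap between `p_reward(best)` and `swapped_upper_bound()` indicates how far the best controller could at most be from the bound; a tighter bound (such as a variational one) narrows that gap and makes it a more useful stopping criterion.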
