Learning proposals for sequential importance samplers using reinforced variational inference

Zafarali Ahmed; Arjun Karuvally; Doina Precup; Simon Gravel

Published: 10 Apr 2019, Last Modified: 05 May 2023
Venue: drlStructPred 2019
Readers: Everyone

Keywords: variational inference, reinforcement learning, monte carlo methods, stochastic processes

Abstract: The problem of inferring unobserved values in a partially observed trajectory from a stochastic process can be considered as a structured prediction problem. Traditionally, inference is conducted using heuristic-based Monte Carlo methods. This work considers learning these heuristics by leveraging a connection between policy optimization in reinforcement learning and approximate inference. In particular, we learn the proposal distributions used in importance samplers by casting proposal learning as a variational inference problem. We then rewrite the variational lower bound as a policy optimization problem, similar to Weber et al. (2015), which allows us to transfer techniques from reinforcement learning. We apply this technique to a simple stochastic process as a proof of concept and show that, while it is viable, it will require more engineering effort to scale inference for rare observations.
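To make the role of the proposal distribution concrete, the following is a minimal sketch of a sequential importance sampler, not the authors' implementation. It assumes a symmetric ±1 random walk as the stochastic process and a hand-written logistic proposal with a single hypothetical parameter `theta`; all function names are illustrative. The per-step quantity `log p - log q` accumulated below is the term that, in the policy optimization view, would play the role of a per-step reward. Here `theta` is fixed by hand rather than learned.

```python
import math
import random


def proposal_up_prob(x, t, T, target, theta):
    """Hypothetical proposal q(step=+1 | x, t): logistic in the average
    drift still needed to reach the observed end point `target`."""
    needed = (target - x) / (T - t)
    return 1.0 / (1.0 + math.exp(-theta * needed))


def sample_trajectory(T, target, theta, rng):
    """Sample one walk from the proposal and return (hit, log_weight)."""
    x = 0
    log_w = 0.0
    for t in range(T):
        p_up = proposal_up_prob(x, t, T, target, theta)
        step = 1 if rng.random() < p_up else -1
        q = p_up if step == 1 else 1.0 - p_up
        # Prior step probability is 0.5 for a symmetric walk (assumption).
        log_w += math.log(0.5) - math.log(q)
        x += step
    return x == target, log_w


def estimate_likelihood(T, target, theta, n, seed=0):
    """Importance-sampling estimate of P(x_T = target) under the prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        hit, log_w = sample_trajectory(T, target, theta, rng)
        if hit:
            total += math.exp(log_w)
    return total / n
```

With `theta = 0` the proposal equals the prior and the estimator reduces to plain Monte Carlo; a positive `theta` steers samples toward the observed end point, and the importance weights correct for the change of distribution.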
