Polya Decision Processes: A New History-Dependent Framework for Reinforcement Learning

Masahiro Kohjima

Published: 2022, Last Modified: 13 May 2024, CDC 2022, CC BY-SA 4.0

Abstract: We propose Polya Decision Processes (PDP), a new framework for sequential decision making that expresses an agent's history-dependent transitions by means of the Polya urn model. We show that a PDP can be converted into a new type of Belief-MDP whose belief update equation requires only the urn model parameters. We present the underlying theory, a value iteration algorithm, and a reinforcement learning algorithm for PDP, all based on the belief state representation. Numerical experiments confirm their effectiveness.
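
To give a rough sense of the urn mechanism the abstract refers to, the following Python sketch implements a classical Polya urn, in which each observed outcome adds a ball of the same colour and so makes that outcome more likely in the future. It is only an illustration under stated assumptions: the function names `urn_probabilities` and `urn_update`, the two-outcome setup, and the reinforcement of one ball per draw are hypothetical choices, not the paper's PDP definition or its belief update equation.

```python
from fractions import Fraction


def urn_probabilities(counts):
    """Probability of each outcome, proportional to its ball count."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


def urn_update(counts, drawn, reinforcement=1):
    """Return new counts after observing outcome `drawn`.

    Assumption: the classical Polya rule, adding `reinforcement`
    balls of the drawn colour; the paper's rule may differ.
    """
    new_counts = list(counts)
    new_counts[drawn] += reinforcement
    return new_counts


# Start with one ball per outcome, then replay a short history.
counts = [1, 1]
history = [0, 0, 1]  # hypothetical sequence of observed outcomes
for outcome in history:
    counts = urn_update(counts, outcome)

# The counts summarise the history, so the next-step
# probabilities depend on everything observed so far.
probs = urn_probabilities(counts)
print(counts, probs)
```

In this toy setting the count vector is a compact summary of the whole history, which is in the spirit of the abstract's claim that the belief update needs only urn model parameters.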