Abstract: We propose Polya Decision Processes (PDPs), a new framework for sequential decision making that expresses the agent's history-dependent transitions by means of the Polya urn model. We show that a PDP can be converted into a new type of Belief-MDP whose belief update equation requires only the urn model parameters. We present the theory of PDPs, together with a value iteration algorithm and a reinforcement learning algorithm that use the belief state representation. Numerical experiments confirm the effectiveness of both algorithms.
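For readers unfamiliar with the urn model named in the abstract, the following is a minimal sketch of the classical Polya urn only, not of the PDP construction itself, whose details the abstract does not give. A colour is drawn with probability proportional to its current count and is then reinforced, so the distribution of the next draw depends on the history only through the count vector. The function names `polya_urn_step` and `draw_probabilities`, the `reinforcement` parameter, and the two-colour starting urn are illustrative assumptions.

```python
import random


def draw_probabilities(counts):
    """Probability of drawing each colour, given the current counts."""
    total = sum(counts)
    return [c / total for c in counts]


def polya_urn_step(counts, rng, reinforcement=1):
    """Draw one colour proportionally to its count, then reinforce that colour.

    Returns the drawn colour index and a new count list (the input is not mutated).
    """
    threshold = rng.random() * sum(counts)
    cumulative = 0
    colour = 0
    for colour, c in enumerate(counts):
        cumulative += c
        if threshold < cumulative:
            break
    new_counts = list(counts)
    new_counts[colour] += reinforcement
    return colour, new_counts


# Hypothetical starting urn: one ball of each of two colours.
rng = random.Random(0)
counts = [1, 1]
history = []
for _ in range(10):
    colour, counts = polya_urn_step(counts, rng)
    history.append(colour)

print(history, counts, draw_probabilities(counts))
```

Because each draw changes the counts, earlier outcomes shift later probabilities; the count vector is nevertheless a sufficient summary of the whole history for predicting the next draw.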