Learning Partially Observable Markov Decision Processes Using Coupled Canonical Polyadic Decomposition

Published: 2019, Last Modified: 13 Nov 2024, Venue: DSW 2019, License: CC BY-SA 4.0
Abstract: We propose a new algorithm for learning the model parameters of a partially observable Markov decision process (POMDP) based on coupled canonical polyadic decomposition (CPD). Coupled CPD for a set of tensors extends CPD for individual tensors: it has improved identifiability properties and admits an analogous simultaneous diagonalization (SD) algorithm that recovers the latent factors uniquely and efficiently. We explain how to form a set of three-way tensors from the trajectory of a POMDP under a stationary memoryless policy, so that coupled CPD can then be applied to recover the model parameters with identifiability and computational guarantees.
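As an illustration of the coupled CPD structure mentioned in the abstract, the following is a minimal NumPy sketch, assuming two three-way tensors that share their first factor matrix. The helper `cpd_tensor`, the dimensions, and the rank `R` are hypothetical choices made for this example and are not taken from the paper. The sketch only builds the model; it does not implement the SD algorithm or the construction of tensors from a POMDP trajectory.

```python
import numpy as np

def cpd_tensor(A, B, C):
    """Build a three-way tensor from CPD factors:
    T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(0)
R = 3  # number of rank-one terms (assumed for illustration)

# Shared factor: this is what couples the two decompositions.
A = rng.random((4, R))

# Tensor-specific factors (sizes chosen arbitrarily).
B1, C1 = rng.random((5, R)), rng.random((6, R))
B2, C2 = rng.random((7, R)), rng.random((2, R))

T1 = cpd_tensor(A, B1, C1)  # shape (4, 5, 6)
T2 = cpd_tensor(A, B2, C2)  # shape (4, 7, 2)

# Both mode-1 unfoldings have columns in the column space of A, so
# stacking them side by side gives a matrix of rank at most R.
stacked = np.hstack([T1.reshape(4, -1), T2.reshape(4, -1)])
rank = np.linalg.matrix_rank(stacked)
```

A coupled decomposition fits `T1` and `T2` jointly while forcing both to use the same `A`, rather than decomposing each tensor separately.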