Abstract—Partially observable Markov decision processes (POMDPs) are a well-developed framework for sequential decision-making under uncertainty and partial information. This article considers the inverse problem of structurally estimating the primitives of a POMDP from data consisting of sequences of observables and implemented actions. We analyze the structural properties of an entropy-regularized POMDP and specify conditions under which the model is identifiable without knowledge of the state dynamics. We use a soft policy gradient algorithm to compute a maximum likelihood estimator and illustrate the approach with an equipment replacement problem.
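A key property behind likelihood-based estimation here is that entropy regularization makes the optimal policy a softmax over soft Q-values. This gives every action positive probability and makes the log-likelihood of observed actions smooth in the model parameters. The sketch below illustrates this on a deliberately simplified, fully observed equipment replacement model. It is an assumption-laden toy, not the paper's POMDP: the state (equipment age) is observed directly, transitions are deterministic, and the names `reward`, `next_age`, `soft_q`, `policy`, `log_likelihood` and the parameters `theta = (maint, replace_cost)`, `GAMMA`, `TAU` are all hypothetical. It evaluates the likelihood at fixed parameter values rather than running the paper's soft policy gradient algorithm.

```python
import math

# Hypothetical toy model, NOT the paper's POMDP: equipment age is fully observed.
N_AGES = 5      # ages 0..4; age 4 absorbs further ageing (assumed)
GAMMA = 0.9     # discount factor (assumed)
TAU = 1.0       # entropy-regularization temperature (assumed)
KEEP, REPLACE = 0, 1


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def reward(s, a, theta):
    """Hypothetical reward: keeping costs maint * age, replacing costs a fixed amount."""
    maint, replace_cost = theta
    return -maint * s if a == KEEP else -replace_cost


def next_age(s, a):
    """Deterministic transition: keeping ages the unit by one (capped), replacing resets it to 0."""
    return min(s + 1, N_AGES - 1) if a == KEEP else 0


def q_from_v(V, theta):
    """Q(s, a) = r(s, a) + GAMMA * V(s') for every age and action."""
    return [[reward(s, a, theta) + GAMMA * V[next_age(s, a)] for a in (KEEP, REPLACE)]
            for s in range(N_AGES)]


def soft_q(theta, tol=1e-10, max_iter=10_000):
    """Soft (entropy-regularized) value iteration: V(s) = TAU * logsumexp(Q(s, .) / TAU)."""
    V = [0.0] * N_AGES
    for _ in range(max_iter):
        Q = q_from_v(V, theta)
        V_new = [TAU * logsumexp([q / TAU for q in Q[s]]) for s in range(N_AGES)]
        done = max(abs(v - w) for v, w in zip(V, V_new)) < tol
        V = V_new
        if done:
            break
    return q_from_v(V, theta)


def policy(theta):
    """Softmax policy: pi(a | s) proportional to exp(Q(s, a) / TAU)."""
    rows = []
    for q_row in soft_q(theta):
        logits = [q / TAU for q in q_row]
        z = logsumexp(logits)
        rows.append([math.exp(l - z) for l in logits])
    return rows


def log_likelihood(theta, data):
    """Sum of log pi(a_t | s_t) over observed (age, action) pairs."""
    Q = soft_q(theta)
    total = 0.0
    for s, a in data:
        logits = [q / TAU for q in Q[s]]
        total += logits[a] - logsumexp(logits)
    return total


if __name__ == "__main__":
    data = [(0, KEEP), (1, KEEP), (2, KEEP), (3, REPLACE), (0, KEEP)]
    for theta in [(1.0, 3.0), (1.0, 6.0)]:
        print(theta, log_likelihood(theta, data))
```

In this sketch, an estimator would choose `theta` to maximize `log_likelihood`. A gradient-based method or a simple search over `theta` would both work on a model this small. Handling partial observability would additionally require replacing the observed age with a belief state, which the toy omits.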