A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs

2007 (modified: 16 Jul 2019), AAAI 2007, Readers: Everyone
Abstract: An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP that extends a UMDP by adding a particular costly action that completely observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
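
The OCOMDP definition above can be illustrated with a minimal toy environment. This is only a sketch of the interaction model, not the UR-MAX algorithm and not code from the paper. The class name `ToyOCOMDP`, the methods `act` and `observe`, the two-state tables, and the cost value are all hypothetical. It also makes two simplifying assumptions that the abstract does not state: rewards accrue without being shown to the agent, and the observing action leaves the state unchanged.

```python
import random


class ToyOCOMDP:
    """Toy only-costly-observable MDP (illustrative sketch, not from the paper)."""

    def __init__(self, transitions, rewards, observe_cost, start_state=0, seed=0):
        # transitions[s][a] is a list of (next_state, probability) pairs.
        self.transitions = transitions
        # rewards[s][a] is the reward for taking action a in state s.
        self.rewards = rewards
        self.observe_cost = observe_cost
        self._state = start_state          # hidden from the agent
        self._rng = random.Random(seed)
        self.total_reward = 0.0            # bookkeeping only (assumption: not shown to the agent)

    def act(self, action):
        """Ordinary action: the state changes, but nothing is observed."""
        self.total_reward += self.rewards[self._state][action]
        next_states, probs = zip(*self.transitions[self._state][action])
        self._state = self._rng.choices(next_states, weights=probs)[0]
        return None  # a UMDP-style step yields no observation

    def observe(self):
        """Costly action: pay observe_cost and see the state exactly.

        Assumption: observing does not itself move the state.
        """
        self.total_reward -= self.observe_cost
        return self._state


# Hypothetical two-state example with deterministic transitions.
transitions = {
    0: {"stay": [(0, 1.0)], "move": [(1, 1.0)]},
    1: {"stay": [(1, 1.0)], "move": [(0, 1.0)]},
}
rewards = {
    0: {"stay": 0.0, "move": 0.0},
    1: {"stay": 1.0, "move": 0.0},
}
```

Dropping `observe` from this class leaves a UMDP in the abstract's sense: the agent would have to choose actions with no feedback about the state at all.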