Keywords: POMDP, Reinforcement Learning, Decomposition, Shannon Entropy
TL;DR: This paper proposes a novel theory of state decomposition in POMDPs and a simple algorithm for estimating the gap between states and observations.
Abstract: Measuring the gap between states and observations is an essential problem in partially observable Markov theory. In this paper, we propose a novel theory of state decomposition and a simple, model-free metric algorithm (the $\lambda$-algorithm) for estimating this gap in partially observable Markov decision processes (POMDPs) with a stationary environment and some missing state conditions. To verify our idea, we design a dimension ablation method that simulates gaps of different sizes in the cliff-walking experiment with Q-learning and Sarsa. The simulation results show that $\lambda$ increases steadily as more dimensions are ablated, which indicates that $\lambda$ adequately measures the gap.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)
Supplementary Material: zip