Keywords: Exploration, state abstraction, model-based reinforcement learning
Abstract: Many methods for Model-based Reinforcement Learning (MBRL) provide guarantees on the accuracy of the Markov decision process (MDP) model they can deliver. At the same time, state abstraction techniques allow for a reduction of the size of an MDP while maintaining a bounded loss with respect to the original problem. It may come as a surprise, therefore, that no such guarantees are available when combining both techniques, i.e., where MBRL merely observes abstract states. Our theoretical analysis shows that abstraction can introduce a dependence between samples collected online (i.e., in the real world), which invalidates most results for MBRL in this setting. Collecting samples using a simulator can avoid this problem. We conclude that we should be careful when applying MBRL methods to abstracted real-world data.
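The following is a minimal, hypothetical sketch (not the paper's construction) of the dependence issue the abstract describes. In this toy problem, two ground states are aggregated into one abstract state 'A'; along a single online trajectory the ground state occupied inside 'A' depends on the history, so consecutive abstract samples are dependent and the empirical estimate of the abstract transition probability drifts away from the model defined by a fixed abstraction weighting w. A simulator that redraws the ground state i.i.d. from w avoids this. All names, the toy MDP, and the weighting are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy ground MDP (illustration only): ground states 0 and 1 both
# map to abstract state 'A'; ground state 2 maps to 'B'. Single action.
# From ground state 1 the agent reaches B half the time, otherwise it falls
# into ground state 0, which self-loops inside 'A' forever.
P = {0: [(0, 1.0)], 1: [(2, 0.5), (0, 0.5)], 2: [(2, 1.0)]}
phi = {0: 'A', 1: 'A', 2: 'B'}
w = {0: 0.5, 1: 0.5}                       # fixed weighting over ground states in 'A'
true_abstract_p = w[0] * 0.0 + w[1] * 0.5  # weighted abstract P(B | A) = 0.25

def step(s):
    nxt, probs = zip(*P[s])
    return int(rng.choice(nxt, p=probs))

def online_estimate(n=20_000):
    """Single online trajectory: the ground state inside 'A' depends on the
    history, so abstract samples are dependent and the empirical estimate of
    P(B | A) does not concentrate on the weighted abstract model."""
    s, visits_A, to_B = 1, 0, 0
    for _ in range(n):
        if phi[s] == 'B':                  # restart in 'A' after reaching 'B'
            s = int(rng.choice([0, 1], p=[w[0], w[1]]))
            continue
        visits_A += 1
        s_next = step(s)
        to_B += (phi[s_next] == 'B')
        s = s_next
    return to_B / visits_A

def simulator_estimate(n=20_000):
    """Simulator / generative model: the ground state inside 'A' is redrawn
    i.i.d. from w before every sample, so standard concentration applies."""
    to_B = 0
    for _ in range(n):
        s = int(rng.choice([0, 1], p=[w[0], w[1]]))
        to_B += (phi[step(s)] == 'B')
    return to_B / n

print(f"online    P(B|A) ~ {online_estimate():.3f}")    # far below 0.25
print(f"simulator P(B|A) ~ {simulator_estimate():.3f}")  # close to 0.25
print(f"weighted abstract model P(B|A) = {true_abstract_p}")
```

In this sketch the online trajectory eventually gets absorbed in ground state 0, so its estimate of P(B | A) collapses toward zero, while the simulator-based estimate concentrates on the weighted abstract model; this is only meant as an intuition pump for the sample-dependence argument, not as the paper's formal analysis.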
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
TL;DR: This paper provides an analysis of the combination of exploration methods and approximate state abstraction.
Supplementary Material: pdf