Resolving Partial Observability in Decision Processes via the Lambda Discrepancy

22 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: reinforcement learning, partial observability, value estimation, memory
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Minimizing a discrepancy between different value estimates is beneficial for learning memory under partial observability.
Abstract: We consider the reinforcement learning problem under partial observability, where observations in the decision process lack the Markov property. To cope with partial observability, we must first detect it. We introduce the $\lambda$-discrepancy: a measure of the degree of non-Markovianity of system dynamics. The $\lambda$-discrepancy is the difference between TD($\lambda$) value functions for two different values of $\lambda$; for example, between one-step temporal difference learning (TD($0$)), which makes an implicit Markov assumption, and Monte Carlo value estimation (TD($1$)), which does not. We prove that this observable and scalable value-based measure is a reliable signal of partial observability. We then use it as an optimization target for resolving partial observability: we search for memory functions (functions over the agent's history) that augment the agent's observations and reduce the $\lambda$-discrepancy. We empirically demonstrate that our approach produces memory-augmented observations that resolve partial observability and improve decision making.
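To make the definition concrete, here is a minimal sketch of a $\lambda$-discrepancy computation. It is not the authors' implementation. It assumes a fixed policy already folded into a latent-state transition matrix `P` with per-state rewards `r`, one-hot observation features `Phi`, linear TD($\lambda$) fixed points weighted by the stationary state distribution, and a plain L2 norm over observations. The function names (`stationary_distribution`, `td_lambda_values`, `lambda_discrepancy`) and the three-state toy chain are hypothetical, chosen only for illustration.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of an ergodic Markov chain (left eigenvector of P for eigenvalue 1)."""
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()  # normalize (also fixes the sign)

def td_lambda_values(P, r, Phi, gamma, lam):
    """Linear TD(lambda) fixed point: solve A theta = b, with
    A = Phi^T D (I - gamma*lam*P)^-1 (I - gamma*P) Phi and b = Phi^T D (I - gamma*lam*P)^-1 r.
    Assumption: D is the on-policy stationary distribution over latent states."""
    n = P.shape[0]
    I = np.eye(n)
    D = np.diag(stationary_distribution(P))
    M = np.linalg.inv(I - gamma * lam * P)
    A = Phi.T @ D @ M @ (I - gamma * P) @ Phi
    b = Phi.T @ D @ M @ r
    return np.linalg.solve(A, b)  # one value per observation feature

def lambda_discrepancy(P, r, Phi, gamma, lam1=0.0, lam2=1.0):
    """L2 norm of the gap between TD(lam1) and TD(lam2) observation values (norm choice is an assumption)."""
    diff = td_lambda_values(P, r, Phi, gamma, lam1) - td_lambda_values(P, r, Phi, gamma, lam2)
    return float(np.linalg.norm(diff))

# Toy chain: deterministic cycle s0 -> s1 -> s2 -> s0, reward 1 in s2.
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
r = np.array([0., 0., 1.])
gamma = 0.9

# s0 and s1 both emit observation "A"; s2 emits "B", so observations are non-Markov.
Phi_aliased = np.array([[1., 0.],
                        [1., 0.],
                        [0., 1.]])
# Pairing the current observation with the previous one yields (A,B), (A,A), (B,A):
# in this cycle that distinguishes every latent state, so the features become one-hot per state.
Phi_memory = np.eye(3)

print(lambda_discrepancy(P, r, Phi_aliased, gamma))  # clearly positive
print(lambda_discrepancy(P, r, Phi_memory, gamma))   # ~0, up to floating-point error
```

With aliased observations, TD($0$) bootstraps through the shared value of observation "A" and disagrees with the Monte Carlo (TD($1$)) estimate, which is the D-weighted average of the true latent-state values. Once the (hypothetical) one-step memory separates the states, both fixed points coincide with the true values and the discrepancy vanishes. This is a single hand-built example, not a substitute for the paper's proof.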
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6034