Memory as State Abstraction Over History

ICLR 2026 Conference Submission 20855 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: reinforcement learning, memory, rl, sequential decision making, state abstraction, abstraction, hierarchy, pomdp
TL;DR: We view memory as temporally-extended state abstraction to organize and extend POMDP classes in the literature.
Abstract: Reinforcement learning is provably difficult in non-Markovian environments, which motivates identifying useful environment classes. Previous work has identified classes such as regular decision processes and approximate information states. While these works address essential properties such as tractability, they do not answer how the classes relate, or when users should prefer one class over another. We resolve this by defining finer POMDP classes in terms of memory and state abstractions. Considering agent memory as a temporally-extended abstraction over the agent's observation-action history, we prove that POMDP classes can be defined using traditional state abstractions, such as model preservation, optimal value $Q^*$ preservation, and optimal policy $\pi^*$ preservation. In the process, we extend state abstraction to "soft" (stochastic) abstractions and show how this kind of abstraction relates to stochastic memory. Reinterpreting existing POMDP classes using our unified framework enables us to prove new relationships between existing classes and generalize these classes to approximate variants.
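For readers less familiar with the abstraction types named in the abstract, the following is a minimal sketch of the standard (deterministic) definitions, instantiated over histories rather than states; the notation ($\phi$, $H$, $M$) is illustrative and not taken from the submission itself.

```latex
% Hedged sketch: memory as an abstraction \phi over histories h \in H.
% Notation and exact conditions are assumptions, not the paper's definitions.
\begin{align*}
  &\text{Memory as abstraction:} &&
    \phi : H \to M, \quad m_t = \phi(h_t), \quad h_t = (o_1, a_1, \dots, o_t) \\
  &\text{Model-preserving:} &&
    \phi(h_1) = \phi(h_2) \;\Rightarrow\;
    R(h_1, a) = R(h_2, a) \;\text{and}\;
    \Pr\big(\phi(h') \mid h_1, a\big) = \Pr\big(\phi(h') \mid h_2, a\big) \\
  &\text{$Q^*$-preserving:} &&
    \phi(h_1) = \phi(h_2) \;\Rightarrow\;
    Q^*(h_1, a) = Q^*(h_2, a) \quad \forall a \\
  &\text{$\pi^*$-preserving:} &&
    \phi(h_1) = \phi(h_2) \;\Rightarrow\;
    \text{some optimal policy acts identically on } h_1 \text{ and } h_2
\end{align*}
```

A "soft" abstraction, as referenced in the abstract, would replace the deterministic map $\phi$ with a conditional distribution $\Pr(m \mid h)$; the precise conditions under which such stochastic abstractions preserve the model, $Q^*$, or $\pi^*$ are the subject of the paper rather than this sketch.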
Primary Area: reinforcement learning
Submission Number: 20855