TL;DR: Memory traces, exponential moving averages of past observations inspired by eligibility traces, can be a more effective memory mechanism than sliding windows for RL in POMDPs.
Abstract: Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this work, we introduce *memory traces*. Inspired by eligibility traces, these are compact representations of the history of observations in the form of exponential moving averages. We prove sample complexity bounds for the problem of offline on-policy evaluation that quantify the return errors achieved with memory traces for the class of Lipschitz continuous value estimates. We establish a close connection to the window approach, and demonstrate that, in certain environments, learning with memory traces is significantly more sample efficient. Finally, we underline the effectiveness of memory traces empirically in online reinforcement learning experiments for both value prediction and control.
Lay Summary: Reinforcement learning (RL) is a framework where an agent learns to make decisions by interacting with an environment. In many situations, the agent can't fully observe the environment at each step, so it must rely on its past observations to act effectively. A common approach is to use a fixed-length window of recent observations, but this quickly becomes inefficient and hard to scale as the window grows.
Our work introduces *memory traces*, a simple and scalable way to summarize past observations. Instead of storing the history explicitly, memory traces maintain a running average that gives more weight to recent events. This lets the agent keep a useful summary of what it has seen—without overwhelming memory or computation.
We rigorously analyze how well memory traces perform in a key learning task—estimating the long-term outcomes of actions—when the system learns from past data. We show that in some environments, this approach leads to faster and more reliable learning. Finally, we demonstrate that memory traces also improve performance in real-time learning scenarios, making them a practical tool for smarter decision-making under uncertainty.
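The core mechanism is an exponential moving average over the observation stream. Below is a minimal sketch of that idea, assuming a simple convex-combination update with a hypothetical decay parameter `lam`; the paper's exact formulation, normalization, and handling of actions may differ.

```python
import numpy as np

class MemoryTrace:
    """Illustrative memory trace: an exponential moving average of observations.

    `lam` is a hypothetical decay parameter in [0, 1); larger values retain
    older observations for longer. The precise weighting convention here is an
    assumption, not necessarily the one used in the paper.
    """

    def __init__(self, obs_dim: int, lam: float = 0.9):
        self.lam = lam
        self.trace = np.zeros(obs_dim)

    def update(self, obs: np.ndarray) -> np.ndarray:
        # Recent observations dominate; older ones decay geometrically.
        self.trace = self.lam * self.trace + (1.0 - self.lam) * obs
        return self.trace


# Usage: feed the observation stream and treat the trace as the agent's state
# summary, e.g. as input to a value function or policy.
trace = MemoryTrace(obs_dim=4, lam=0.8)
for obs in np.random.randn(10, 4):
    z = trace.update(obs)
```

Unlike a sliding window, the trace has constant size regardless of how far back it reaches, which is what makes it cheap to store and to learn from.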
Link To Code: https://github.com/onnoeberhard/memory-traces
Primary Area: Theory->Reinforcement Learning and Planning
Keywords: Reinforcement learning theory, Partial observability, Memory
Submission Number: 12039