Investigating Memory in RL with POPGym Arcade

Zekang Wang; Zhe He; Borong Zhang; Edan Toledo; Steven Morad

Date: 10 Sept 2025 (modified: 11 Feb 2026). Venue: Submitted to ICLR 2026. License: CC BY 4.0

Keywords: reinforcement learning, memory, recurrent model, pomdp

TL;DR: We propose tools and environments for studying memory and partial observability, and then use them to analyze how agents use memory.

Abstract: How should we analyze memory in deep RL? We introduce mathematical tools for fairly analyzing policies under partial observability and for revealing how agents use memory to make decisions. To apply these tools, we present POPGym Arcade, a collection of Atari-inspired, hardware-accelerated, pixel-based environments that share a single observation space and action space. Each environment provides fully and partially observable variants, enabling counterfactual studies of observability. We find that controlled studies are necessary for fair comparisons, and we identify a pathology in which value functions smear credit over irrelevant history. Using this pathology, we demonstrate how out-of-distribution scenarios can contaminate memory and perturb the policy far into the future, with implications for sim-to-real transfer and offline RL.

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 3586