Learning Bayes-Optimal Representation in Partially Observable Environments via Meta-Reinforcement Learning with Predictive Coding
Keywords: partial observability, meta-reinforcement learning, predictive coding, self-supervised learning, representation learning, neuro-ai, decision-making under uncertainty, partially observable Markov decision process, POMDP, deep reinforcement learning
TL;DR: Meta-reinforcement learning with self-supervised predictive coding modules can learn interpretable, task-relevant representations that are more closely equivalent to Bayes-optimal belief states than those of black-box meta-RL models in partially observable environments.
Abstract: Learning a compact representation that summarizes history is essential for decision-making, planning, and generalization in partially observable environments. Memory-based meta-reinforcement learning (meta-RL) has been shown to learn near Bayes-optimal policies under partial observability. However, its learned representations can fail to be equivalent to minimally sufficient, Bayes-optimal belief states, potentially hindering robustness and generalization. To overcome this challenge, we propose a meta-RL framework that learns an explicit belief representation by incorporating self-supervised predictive modules inspired by predictive coding in the neuroscience literature. Our approach outperforms conventional meta-RL by producing more interpretable, task-relevant representations that better capture the underlying task structure and dynamics. Using state machine simulation, we demonstrate that the learned representations are more closely equivalent to Bayes-optimal states and are linked to improved future prediction and policy learning. Our results suggest that self-supervised future prediction is a promising technique for enhancing representation learning in partially observable environments.
Submission Number: 70
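To make the abstract's architecture concrete, below is a minimal, hedged sketch of a recurrent meta-RL agent augmented with a self-supervised next-observation prediction head, illustrating the general "predictive coding" auxiliary-module idea. All module names, dimensions, and the loss weighting are illustrative assumptions and not the authors' implementation.

```python
# Illustrative sketch only: a GRU-based meta-RL agent with an auxiliary
# self-supervised predictive module. Not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PredictiveMetaRLAgent(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        # GRU summarizes the (obs, prev action, prev reward) history into a
        # belief-like hidden state h_t.
        self.encoder = nn.GRU(obs_dim + act_dim + 1, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, act_dim)  # actor logits
        self.value_head = nn.Linear(hidden_dim, 1)         # critic
        # Self-supervised predictive module: predict the next observation
        # from the current belief state and the current action.
        self.predictor = nn.Sequential(
            nn.Linear(hidden_dim + act_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim),
        )

    def forward(self, obs, prev_act, prev_rew, h0=None):
        # obs: (B, T, obs_dim), prev_act: (B, T, act_dim), prev_rew: (B, T, 1)
        x = torch.cat([obs, prev_act, prev_rew], dim=-1)
        beliefs, hT = self.encoder(x, h0)                  # beliefs: (B, T, hidden_dim)
        logits = self.policy_head(beliefs)
        values = self.value_head(beliefs).squeeze(-1)
        return logits, values, beliefs, hT

    def prediction_loss(self, beliefs, actions, next_obs):
        # Auxiliary predictive loss: reconstruct o_{t+1} from (h_t, a_t).
        pred = self.predictor(torch.cat([beliefs, actions], dim=-1))
        return F.mse_loss(pred, next_obs)


if __name__ == "__main__":
    B, T, obs_dim, act_dim = 4, 10, 8, 3
    agent = PredictiveMetaRLAgent(obs_dim, act_dim)
    obs = torch.randn(B, T, obs_dim)
    prev_act = torch.zeros(B, T, act_dim)
    prev_rew = torch.zeros(B, T, 1)
    logits, values, beliefs, _ = agent(obs, prev_act, prev_rew)
    # In training, the auxiliary loss would be added to the RL objective,
    # e.g. total_loss = rl_loss + aux_weight * prediction_loss(...).
    aux = agent.prediction_loss(beliefs, prev_act, torch.randn(B, T, obs_dim))
    print(logits.shape, values.shape, aux.item())
```

One plausible design rationale, consistent with the abstract: the auxiliary prediction objective pressures the recurrent state to retain exactly the history information needed to forecast future observations, pushing it toward a sufficient, belief-like summary rather than a purely reward-shaped black-box code.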