Keywords: Continual reinforcement learning, Catastrophic forgetting, World models, Transformer, Vector-quantized variational autoencoder (VQ-VAE), Low-rank adapters (LoRA), Model-based reinforcement learning, Uncertainty-guided exploration
TL;DR: Tokenized world-model agent that keeps plasticity in the dynamics and stability in the policy, reducing forgetting and enabling forward transfer on the Atari CORA sequence.
Abstract: Catastrophic forgetting is a significant obstacle in continual reinforcement learning: newly acquired skills overwrite earlier ones, causing sharp performance drops and hindering the transfer of knowledge to future tasks. Replay buffers and regularization can mitigate this drift, but often introduce brittle transfer, excessive computation, or policy instability. We address these limitations with a tokenized, world-model–centric agent. A compact vector-quantized variational autoencoder (VQ-VAE) discretizes frames into short token sequences; a Transformer world model predicts next-step tokens, rewards, and terminations. A task module fuses explicit task identifiers with trajectory-inferred context via feature-wise modulation, yielding task-aware yet shareable representations. Adaptation is localized by inserting low-rank adapters only into the world model (not the policy), thereby concentrating plasticity in the dynamics model while keeping control stable. A heteroscedastic critic supplies uncertainty that gates an adaptive entropy bonus and prioritizes imagined rollouts; per-game moving-average reward normalization and absorbing-state rollouts further stabilize learning. On the six-game Atari CORA benchmark (Isolated Forgetting, Zero-shot Forward Transfer), the agent consistently exhibits lower forgetting with positive forward transfer in task-aware settings and reduced forgetting under equal interaction budgets in task-agnostic settings.
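For illustration only, a minimal PyTorch-style sketch of two mechanisms named in the abstract: FiLM-style fusion of an explicit task identifier with trajectory-inferred context, and a low-rank adapter wrapped around a frozen linear layer of the world model so that plasticity is confined to the dynamics. All module and parameter names below are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class FiLMTaskModule(nn.Module):
    """Fuse a task ID embedding with inferred context and modulate
    world-model features via FiLM: h -> gamma * h + beta."""
    def __init__(self, num_tasks: int, ctx_dim: int, feat_dim: int):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, ctx_dim)
        self.to_gamma_beta = nn.Linear(2 * ctx_dim, 2 * feat_dim)

    def forward(self, h, task_id, ctx):
        z = torch.cat([self.task_emb(task_id), ctx], dim=-1)
        gamma, beta = self.to_gamma_beta(z).chunk(2, dim=-1)
        return gamma * h + beta

class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update; inserted
    only into world-model layers, leaving the policy untouched."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as an identity update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```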
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 23689