Zero Shot Generalization of Vision-Based RL Without Data Augmentation

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We demonstrate how disentangled representation learning and associative memory can be used to enable zero-shot generalization of vision-based RL agents without data augmentation or fine-tuning.
Abstract: Generalizing vision-based reinforcement learning (RL) agents to novel environments remains a difficult and open challenge. Current trends are to collect large-scale datasets or use data augmentation techniques to prevent overfitting and improve downstream generalization. However, the computational and data collection costs increase exponentially with the number of task variations, and these techniques can destabilize the already difficult task of training RL agents. In this work, we take inspiration from recent advances in computational neuroscience and propose a model, Associative Latent DisentAnglement (ALDA), that builds on standard off-policy RL to achieve zero-shot generalization. Specifically, we revisit the role of latent disentanglement in RL and show how combining it with a model of associative memory achieves zero-shot generalization on difficult task variations *without* relying on data augmentation. Finally, we formally show that data augmentation techniques are a form of weak disentanglement and discuss the implications of this insight.
Lay Summary: Humans and other mammals show a remarkable ability to adapt to new situations, thanks to their robust visual systems. Modern neuroscience hypothesizes that this is because we can decompose what we see into independent components and relate them to things we have seen before. For example, if shown a cartoon rendering of a human, most of us would understand that the cartoon is meant to represent a real human because it has two legs, arms, a torso, a head, etc., even if we had never seen that specific cartoon image before. Inspired by this capability, we designed an artificial agent that decomposes a visual scene into independent components and relates them to objects it has seen before, so that it can solve tasks even in novel environments, i.e., where the colors, background, and textures of objects are different. This allows our agent to generalize to new environments it has never seen without requiring additional training data.
Link To Code: https://github.com/SumeetBatra/ALDA_Official
Primary Area: Reinforcement Learning->Deep RL
Keywords: representation learning, reinforcement learning, disentangled representation learning, associative memory, zero-shot generalization
Submission Number: 9102