Distractor-Robust Reinforcement Learning via Variational Bisimulation

Submitted to ICLR 2026 on 20 Sept 2025 (modified: 11 Feb 2026). License: CC BY 4.0
Keywords: state abstraction, variational inference, bisimulation relations, model-based reinforcement learning
TL;DR: Using variational inference, we learn bisimulation-based representations that discard distractors while preserving task-relevant state.
Abstract: Model-based reinforcement learning (MBRL) promises data efficiency and generalization, but typical reconstruction-based objectives encourage models to waste representational capacity on task-irrelevant distractors. We introduce VIBES (Variational Inference for Bisimulation-based Encoded States), a new objective that replaces pixel reconstruction with a variational term enforcing that latent states suffice to predict both rewards and the _next_ latent state. We show theoretically that, under mild assumptions, global optima of this objective correspond to encoders that induce bisimulation relations, ensuring that latent states capture task-relevant information while discarding irrelevant variation. Our method serves as a drop-in replacement for Dreamer's model-learning component and achieves state-of-the-art performance on the Distracting Control Suite. Unlike prior approaches, it does not rely on image-specific augmentations and applies equally well to high-dimensional vector-state tasks, as demonstrated on a 100-link swimmer. Finally, latent-space analyses (UMAP embeddings and nearest-neighbor probes) confirm that the learned representations are sensitive to task-relevant structure while remaining invariant to distractors.
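To make the reconstruction-free objective concrete, the following is a minimal numpy sketch of the kind of loss the abstract describes: the encoder is trained only to predict rewards and the next latent state, with no pixel-reconstruction term. All names, shapes, and the use of linear maps are illustrative assumptions, not the paper's actual VIBES implementation (which is variational and plugs into Dreamer's world model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: high-dimensional observations, small latent state.
obs_dim, latent_dim, batch = 16, 4, 32

# Linear maps stand in for the learned encoder, latent dynamics, and reward head.
W_enc = rng.normal(size=(obs_dim, latent_dim)) * 0.1
W_dyn = rng.normal(size=(latent_dim, latent_dim)) * 0.1
w_rew = rng.normal(size=(latent_dim,)) * 0.1

# A dummy batch of transitions (obs, next_obs, reward).
obs = rng.normal(size=(batch, obs_dim))
next_obs = rng.normal(size=(batch, obs_dim))
rewards = rng.normal(size=(batch,))

z = obs @ W_enc                    # current latent state
z_next_target = next_obs @ W_enc   # treated as a stop-gradient target

# Bisimulation-style objective: latents must suffice to predict the reward
# and the next latent state; there is no observation-reconstruction term,
# so capacity spent on distractors is never rewarded.
reward_loss = np.mean((z @ w_rew - rewards) ** 2)
dynamics_loss = np.mean((z @ W_dyn - z_next_target) ** 2)
loss = reward_loss + dynamics_loss
print(float(loss))
```

In a full implementation both prediction heads would be distributional (giving the variational/KL form referenced in the title) and gradients through `z_next_target` would be stopped, but the structure of the loss is the same.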
Primary Area: reinforcement learning
Submission Number: 22852