Keywords: reinforcement learning, representation learning, benchmark
TL;DR: SPGym extends the classic 8-tile sliding puzzle to evaluate RL agents by scaling representation learning complexity while keeping environment dynamics fixed, revealing open challenges in representation learning for decision-making.
Abstract: While effective visual representation learning is critical for reinforcement learning (RL) agents to generalize across diverse environments, existing benchmarks cannot evaluate how different inductive biases affect this capability in isolation. To address this, we introduce the Sliding Puzzles Gym (SPGym), a benchmark that isolates the challenge of visual representation learning. SPGym transforms the classic sliding puzzle into a visual RL task where visual complexity can be scaled by adjusting the grid size and the pool of images used for tiles, while the environment dynamics, observation space, and action space remain fixed. Our experiments with model-free and model-based algorithms reveal how different architectural and algorithmic biases affect an agent's ability to handle visual diversity. As the image pool grows, all algorithms exhibit performance degradation both in- and out-of-distribution, with sophisticated representation learning techniques often underperforming simpler approaches such as data augmentation. These findings expose critical gaps in visual representation learning and establish SPGym as a valuable tool for developing more robust and generalizable agents.
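For concreteness, the sketch below shows how an agent-environment loop on such a benchmark might look, assuming a standard Gymnasium-style interface. The environment id, the registering import, and the keyword arguments (`grid_size`, `image_pool_size`) are hypothetical placeholders for illustration, not SPGym's documented API.

```python
# Minimal agent-environment loop on SPGym, assuming a Gymnasium-style
# interface. Environment id and kwargs are hypothetical, not the
# package's documented API.
import gymnasium as gym
import sliding_puzzles  # hypothetical import that registers the SPGym envs

# Visual complexity is scaled via the grid size and the number of tile
# images, while dynamics and the action space stay fixed.
env = gym.make("SlidingPuzzles-v0", grid_size=3, image_pool_size=100)

obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```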
Submission Number: 13