Mastering the Labyrinth Game: Efficient Multimodal Reinforcement Learning with Selective Reconstruction

Thomas Bi, Ethan Marot, Aswin Ramachandran, Raffaello D'Andrea

Published: 2025, Last Modified: 22 Jan 2026, IROS 2025, CC BY-SA 4.0

Abstract: In previous work, model-based reinforcement learning was applied to a real-world labyrinth game to demonstrate sample-efficient learning using world models. In this paper, we further enhance sample efficiency and autonomy by introducing selective reconstruction: instead of reconstructing the full visual observation, our approach reconstructs only the low-dimensional physical state signals (e.g., marble position and plate inclination), while still leveraging the complete visual input for decision-making. This targeted reconstruction focuses the world model on learning dynamics-relevant information, thereby reducing computational overhead and model complexity. Additionally, we incorporate prioritized experience replay to accelerate learning in newly explored regions of the maze and implement an autonomous marble reloader to eliminate manual resets. Together, these enhancements reduce the required collected experience from 5 hours to 1.5 hours while achieving comparable performance, and enable fully autonomous learning without human supervision.
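As a rough illustration of the two learning-side ideas in the abstract, the sketch below compares a reconstruction loss over the full observation with one restricted to the low-dimensional state signals, and draws replay indices in proportion to a priority. It is a minimal sketch under stated assumptions, not the authors' implementation: the observation keys (`image`, `marble_xy`, `plate_angles`), the toy values, the mean-squared-error loss, the proportional prioritization rule, and the exponent `alpha` are all hypothetical, since the abstract does not specify them.

```python
import random

# Hypothetical observation keys; the abstract only names marble position
# and plate inclination as examples of the physical state signals.
STATE_KEYS = ("marble_xy", "plate_angles")
ALL_KEYS = ("image",) + STATE_KEYS


def reconstruction_loss(predicted, observed, keys):
    """Mean squared error over the listed observation keys only."""
    total, count = 0.0, 0
    for key in keys:
        for p, o in zip(predicted[key], observed[key]):
            total += (p - o) ** 2
            count += 1
    return total / count


def sample_prioritized(priorities, batch_size, alpha=0.6, rng=random):
    """Draw indices with probability proportional to priority ** alpha.

    Assumed proportional scheme; importance-sampling corrections are omitted.
    """
    weights = [p ** alpha for p in priorities]
    return rng.choices(range(len(priorities)), weights=weights, k=batch_size)


observed = {
    "image": [0.2, 0.4, 0.6, 0.8],  # stand-in for a flattened camera frame
    "marble_xy": [0.10, -0.30],
    "plate_angles": [0.02, 0.01],
}
# A prediction that gets the physical state right but the pixels wrong.
predicted = {
    "image": [0.0, 0.0, 0.0, 0.0],
    "marble_xy": [0.10, -0.30],
    "plate_angles": [0.02, 0.01],
}

# Full reconstruction penalizes pixel errors; selective reconstruction does not.
full_loss = reconstruction_loss(predicted, observed, ALL_KEYS)
selective_loss = reconstruction_loss(predicted, observed, STATE_KEYS)

# The last transition stands for a newly explored maze region and is given
# a higher (hypothetical) priority, so it is sampled more often.
priorities = [0.1, 0.1, 0.1, 5.0]
batch = sample_prioritized(priorities, batch_size=1000, rng=random.Random(0))

print("full loss:", full_loss)
print("selective loss:", selective_loss)
print("share of high-priority samples:", batch.count(3) / len(batch))
```

In this toy setup the image is still part of the observation (and, in a real agent, of the encoder input), but only the state keys contribute to the reconstruction target, which is the distinction the abstract draws between decision-making input and reconstruction output.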

External IDs: dblp:conf/iros/BiMRD25