Keywords: Intrinsic Motivation, Reinforcement Learning, Model-based Planning, Regularity, Manipulation, Zero-shot Generalization, Unsupervised Exploration
TL;DR: We propose regularity as an intrinsic reward signal that favors balance and alignment, so that agents are not biased towards chaos by naive novelty-seeking objectives.
Abstract: We propose regularity as a novel reward signal for intrinsically motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model’s epistemic uncertainty as an intrinsic reward. In doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
Submission Number: 8614