Keywords: Reinforcement Learning, Robotics, Prediction, Disentangled Representations
TL;DR: We present a self-supervised method for disentangling agent and scene based on motion and prediction, which enables state-of-the-art sample efficiency and performance in RL.
Abstract: Despite the significant advances in Deep Reinforcement Learning (RL) over the last decade, the amount of training experience needed to learn effective policies remains one of the primary concerns in both simulated and real environments. To address this issue, previous work has shown that training efficiency can be improved by modeling agent and environment separately, but this usually requires a supervisory agent mask. In contrast to RL agents, humans can perfect a new skill from a very small number of trials, in most cases without a supervisory signal, which makes neuroscientific studies of human development a valuable source of inspiration for RL. In particular, we explore the idea of motor prediction, which states that humans develop an internal model of themselves and of the consequences that their motor commands have on the immediate sensory inputs. Our insight is that the movement of the agent provides a cue that allows the duality between agent and environment to be learned. To instantiate this idea, we present Ego-Foresight, a self-supervised method for disentangling agent and environment based on motion and prediction. Our main finding is that visuomotor prediction of the agent provides good feature representations for the underlying RL algorithm. To test our approach, we integrate Ego-Foresight with a model-free RL algorithm to solve simulated robotic manipulation tasks, showing that it improves efficiency and performance across different tasks while, by removing the need for costly supervisory signals, making strides towards real-world RL applications.
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10097