Push, See, Predict: Emergent Perception Through Intrinsically Motivated Play

Published: 23 Jun 2025 · Last Modified: 23 Jun 2025 · Greeks in AI 2025 Poster · CC BY 4.0
Keywords: Robotics and Embodied AI, Active Perception, Object-Centric Computer Vision, World Models
Abstract: Unlike conventional vision systems that rely on passive observation, biological agents learn through physical interaction. Can a robot similarly develop an understanding of its environment purely through interaction, without prior knowledge or external supervision? In this work, we explore how artificial agents can learn autonomously via intrinsic motivation, much as children engage in curious free play. We propose a novel, fully self-supervised, object-centric learning framework. The system first segments visual input into discrete entities using Slot Attention, trained on data collected from random robotic actions. A graph-based world model is then trained to predict object-centric dynamics, but it initially struggles to capture object motion due to the limited diversity of the initial interactions. To overcome this, we introduce an intrinsically motivated reward signal based on the world model's prediction error, which drives a policy to collect more informative trajectories. This results in up to three times more object displacement than random actions, significantly enriching the dataset. Fine-tuning both the vision model and the world model on these data improves prediction and reconstruction performance. We validate our method in a simulated robotic environment with diverse objects, demonstrating that meaningful visual and physical representations can emerge entirely from self-supervised interaction. This highlights the potential of intrinsically motivated, object-centric learning for autonomous world perception and modeling [1].

[1] O. Konstantaropoulos, M. Khamassi, P. Maragos, and G. Retsinas, "Push, see, predict: Emergent perception through intrinsically motivated play," in Proceedings of the IEEE International Conference on Development and Learning (ICDL), 2025.
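The core idea of the reward signal described above can be sketched in a few lines: the agent is rewarded in proportion to how badly the world model predicts the outcome of its action, so surprising (informative) transitions are sought out. This is a minimal illustrative sketch, not the paper's implementation; the names `intrinsic_reward`, `static_model`, and the use of a flat state vector with mean-squared error are assumptions — the actual system uses a graph-based, object-centric predictor.

```python
import numpy as np

def intrinsic_reward(world_model, state, action, next_state):
    """Reward proportional to the world model's prediction error.

    Illustrative sketch (assumed interface): `world_model` maps
    (state, action) to a predicted next state. The reward is the mean
    squared error between prediction and observation, so it is large
    exactly when the transition surprises the model.
    """
    predicted = world_model(state, action)
    return float(np.mean((predicted - next_state) ** 2))

# Toy world model that predicts no object motion, mimicking the
# under-trained model described in the abstract.
static_model = lambda state, action: state

s = np.zeros(4)                    # object positions before the push
s_next = np.array([1.0, 0, 0, 0])  # one object actually moved
print(intrinsic_reward(static_model, s, None, s_next))  # → 0.25
```

A policy trained to maximize this reward is pushed toward actions that actually displace objects, since those are the transitions a motion-blind model mispredicts most.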
Submission Number: 159