Abstract: Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment. In this paper, we propose a novel system for scene understanding that leverages object-centric generative models. We demonstrate an agent that learns and reasons about 3D objects in an unsupervised fashion, inferring object category and pose in an allocentric reference frame. Our agent can infer the actions required to reach a given, object-relative target viewpoint in simulation, outperforming a supervised baseline trained on the same object set.