Keywords: representation learning, slot-structured representations, sparse slot-structured transitions, entity-centric representation, unsupervised learning, object-centric
TL;DR: A sparse slot-structured transition model, trained so that latent slots correspond to relevant entities of the visual scene.
Abstract: Training an agent to interact with objects is a ubiquitous problem in RL. In most such tasks, the agent's actions have sparse effects: only a small subset of the objects in the visual scene is affected by the action taken. We introduce SPECTRA, a model for learning slot-structured transitions from raw visual observations that embodies this sparsity assumption. Our model is composed of a perception module that decomposes the visual scene into a set of latent object representations (i.e., a slot-structured representation) and a transition module that predicts the next latent set slot-wise and sparsely. We show that learning a perception module jointly with a sparse slot-structured transition model not only biases the model towards more entity-centric perceptual groupings but also enables an intrinsic exploration strategy that aims to maximize the number of objects changed along the agent's trajectory.
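The sparsity assumption described in the abstract can be illustrated with a minimal sketch. This is not the SPECTRA implementation: the hard top-k gate, the linear maps, and the names `sparse_slot_transition`, `changed_slot_count`, `w_delta`, `w_gate`, and `top_k` are all assumptions made for illustration. The sketch only shows the general idea of updating a few slots per action and counting how many slots changed, which is the quantity an exploration bonus of the kind described could reward.

```python
import numpy as np


def sparse_slot_transition(slots, action, w_delta, w_gate, top_k=1):
    """Illustrative sparse slot-wise update (hypothetical, not the paper's model).

    slots:   (K, D) array, one latent vector per slot
    action:  (A,) array
    w_delta: (D + A, D) weights producing a candidate change per slot
    w_gate:  (D + A,) weights scoring how likely each slot is affected
    """
    num_slots = slots.shape[0]
    if not 1 <= top_k <= num_slots:
        raise ValueError("top_k must be between 1 and the number of slots")

    # Pair every slot with the same action vector.
    inputs = np.concatenate([slots, np.tile(action, (num_slots, 1))], axis=1)

    deltas = np.tanh(inputs @ w_delta)  # (K, D) candidate change per slot
    scores = inputs @ w_gate            # (K,) gate score per slot

    # Assumed hard sparsity: only the top_k highest-scoring slots are updated.
    mask = np.zeros(num_slots, dtype=bool)
    mask[np.argsort(scores)[-top_k:]] = True

    next_slots = slots + mask[:, None] * deltas
    return next_slots, mask


def changed_slot_count(slots, next_slots, tol=1e-6):
    """Number of slots whose latent vector moved by more than tol."""
    return int(np.sum(np.linalg.norm(next_slots - slots, axis=1) > tol))
```

With `top_k=1`, at most one slot changes per step, so `changed_slot_count` returns at most 1; summing this count over a trajectory gives one simple, assumed form of an exploration signal that favors actions affecting objects.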