Keywords: object-centric learning, test-time adaptation, unsupervised domain adaptation, test-time training, entity-centric models
TL;DR: We show test-time adaptation using slot-centric models can improve image segmentation significantly.
Abstract: We consider the problem of segmenting scenes into constituent objects. Current supervised visual detectors, though impressive within their training distribution, often fail to segment out-of-distribution scenes. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently and have shown promising results towards generalization outside the training distribution for the task of image classification. In our work, we find evidence that these losses can be insufficient for instance segmentation tasks, without also considering architectural inductive biases. For image segmentation, recent slot-centric generative models break such dependence on supervision by attempting to segment scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Slot-TTA, a semi-supervised instance segmentation model equipped with a slot-centric image rendering component, that is adapted per scene at test time through gradient descent on reconstruction or novel view synthesis objectives. We show that test-time adaptation greatly improves segmentation in out-of-distribution scenes. We evaluate Slot-TTA in scene segmentation benchmarks and show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors and self-supervised domain adaptation models. Please find the full version of our paper at: https://arxiv.org/abs/2203.11194