Abstract: Robots often operate in open-world environments, where the capability to generalize to new scenarios is crucial for robotic applications such as navigation and manipulation. In this paper, we propose a novel multi-view self-supervised framework (MVSS) that adapts off-the-shelf segmentation methods without supervision by leveraging multi-view consistency. Pixel-level and object-level correspondences are established through unsupervised camera pose estimation and cross-frame object association, and are used to learn feature embeddings such that embeddings of the same object are close to each other and embeddings of different objects are separated. Experimental results show that, by observing an RGB-D sequence only once and without any annotation, our method adapts existing segmentation methods to new scenarios and achieves performance close to that of supervised segmentation methods.
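To make the embedding objective concrete, below is a minimal sketch of one plausible instantiation: an InfoNCE-style contrastive loss over pixel embeddings from two registered views, where positives are pixel pairs assigned the same object ID by cross-frame association. The function name, tensor shapes, and temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(emb_a, emb_b, ids_a, ids_b, temperature=0.1):
    """Contrastive loss over pixel embeddings from two associated views.

    emb_a, emb_b: (N, D) and (M, D) pixel embeddings from views A and B.
    ids_a, ids_b: (N,) and (M,) object IDs from cross-frame association.
    Pixels sharing an object ID across views are pulled together;
    pixels of different objects are pushed apart.
    """
    emb_a = F.normalize(emb_a, dim=1)
    emb_b = F.normalize(emb_b, dim=1)
    logits = emb_a @ emb_b.t() / temperature              # (N, M) similarities
    pos_mask = ids_a.unsqueeze(1) == ids_b.unsqueeze(0)   # same-object pairs
    # Keep only anchors with at least one positive in the other view.
    valid = pos_mask.any(dim=1)
    log_prob = F.log_softmax(logits, dim=1)
    # Average log-probability over each valid anchor's positives.
    loss = -(log_prob * pos_mask)[valid].sum(dim=1) / pos_mask[valid].sum(dim=1)
    return loss.mean()

if __name__ == "__main__":
    # Toy usage: 8 pixels per view, 16-D embeddings, 3 hypothetical objects.
    emb_a, emb_b = torch.randn(8, 16), torch.randn(8, 16)
    ids_a = torch.randint(0, 3, (8,))
    ids_b = torch.randint(0, 3, (8,))
    print(multiview_contrastive_loss(emb_a, emb_b, ids_a, ids_b))
```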