Abstract: Capturing a digital replica of an environment using handheld devices or mobile mapping systems has become increasingly easy in recent years. However, leveraging large amounts of data for semantic applications is usually hindered by a costly data annotation process. In this paper, we investigate unsupervised object recognition in indoor environments. We approach the problem from a remapping perspective: we capture RGB images of the same environment at different times and use the naturally occurring changes to identify individual objects. In the first step, we create pairs of images from different recordings and generate object candidates using optical flow and an off-the-shelf region proposal algorithm. Then, we use a self-supervised representation learning framework and cluster the extracted objects. Since the number of object classes is unknown in an unsupervised setup, we evaluate the performance of several existing clustering methods in an over-clustering setting. Our experimental validation on a real-world dataset shows that the proposed system can successfully recognize objects and pre-annotate a dataset by exploiting a recapturing process.
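The over-clustering evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a many-to-one scoring scheme in which each cluster is mapped to its majority ground-truth class, which is one common way to score a clustering that uses more clusters than classes; the abstract does not state the exact metric used. The function name `overclustering_accuracy` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def overclustering_accuracy(cluster_ids, true_labels):
    """Many-to-one accuracy: map each cluster to its majority class.

    Assumption: this scoring scheme is illustrative only and is not
    confirmed to be the metric used in the paper.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must have equal length")
    if not cluster_ids:
        raise ValueError("inputs must be non-empty")

    # Count how often each ground-truth class occurs inside each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster][label] += 1

    # A sample counts as correct if it belongs to its cluster's majority class.
    correct = sum(counts.most_common(1)[0][1] for counts in members.values())
    return correct / len(true_labels)


# Toy data (hypothetical): two true classes split across four clusters.
clusters = [0, 0, 1, 1, 2, 2, 3, 3]
labels = ["chair", "chair", "chair", "box", "box", "box", "box", "chair"]
score = overclustering_accuracy(clusters, labels)
```

Because several clusters may map to the same class, this score cannot decrease when a cluster is split further, so it is only meaningful when methods are compared at a fixed number of clusters.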