Abstract: Fully unsupervised semantic segmentation of images has been a challenging problem in computer vision. Many deep learning models have been developed for this task, most of which use representation learning guided by unsupervised or self-supervised loss functions that steer the learned features towards segmentation. In this paper, we conduct dense, pixel-level representation learning using a fully-convolutional autoencoder; the learned dense features are then reduced onto a sparse graph where segmentation is encouraged from three aspects: normalised cut, similarity and continuity. Our method is one- or few-shot, minimally requiring only one image (i.e., the target image). To mitigate overfitting caused by few-shot learning, we compute the reconstruction loss using augmented size-varying patches sampled from the image(s). We also propose a new adjacency-based loss function for continuity, which allows the number of superpixels to be arbitrarily large, whereby the creation of the sparse graph can remain ful
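As a minimal, hedged sketch (not the authors' released code), the following illustrates how the patch-based reconstruction loss described above could be computed: random, size-varying patches are cropped from the single target image, lightly augmented, and reconstructed by a small fully-convolutional autoencoder. The architecture, patch sizes, and the flip augmentation are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAutoencoder(nn.Module):
    """Tiny fully-convolutional autoencoder (illustrative architecture only)."""
    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, in_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def reconstruction_loss(model, image, num_patches=8, sizes=(64, 96, 128)):
    """MSE reconstruction loss over augmented, size-varying patches.

    image: (C, H, W) tensor holding the single target image.
    sizes: candidate square patch sizes (assumed values, not from the paper).
    """
    _, H, W = image.shape
    losses = []
    for _ in range(num_patches):
        # Sample a patch size and a random top-left corner.
        s = sizes[torch.randint(len(sizes), (1,)).item()]
        top = torch.randint(0, H - s + 1, (1,)).item()
        left = torch.randint(0, W - s + 1, (1,)).item()
        patch = image[:, top:top + s, left:left + s]
        # Simple augmentation: random horizontal flip.
        if torch.rand(1).item() < 0.5:
            patch = torch.flip(patch, dims=[2])
        patch = patch.unsqueeze(0)  # add batch dimension
        recon = model(patch)
        losses.append(F.mse_loss(recon, patch))
    return torch.stack(losses).mean()
```

In this reading, training on many such patches rather than the whole image acts as a data-augmentation surrogate for a larger dataset, which is how the abstract motivates mitigating overfitting in the one-/few-shot setting.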