Abstract: Image clustering is an essential unsupervised learning task in computer vision. The key issue in image clustering is to learn representative visual features without annotations so as to enlarge the spacing between classes. Jigsaw puzzles, a pretext task for self-supervised visual representation learning in which a model learns the relative spatial positions of image tiles, have attracted the attention of many researchers. Most existing jigsaw puzzle strategies operate on raw image patches, i.e., original pixels, which makes them concentrate only on low-level statistics. In this paper, we propose a novel learning strategy named Grid Feature Jigsaw (GFJ) for self-supervised image clustering, which increases class spacing by mining the features of each individual sample in depth. We train the model to learn intra-grid representations via the self-supervised paradigm. By dividing the feature map into grids and arranging adjacent grids into blocks, we implement a linear regression that represents the reference grid of each block from its surrounding grids. Experiments on unsupervised computer vision benchmarks show the effectiveness of GFJ on the clustering task in terms of three metrics, ACC, NMI, and ARI, and we verify the generality of GFJ across various deep convolutional neural networks.
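The abstract does not give the exact formulation of the grid regression, so the following is only a minimal NumPy sketch of the idea under stated assumptions. It assumes that each grid cell of a (C, H, W) feature map is flattened into a vector, that a block is the 3x3 neighbourhood around a reference cell, and that the regression weights are obtained in closed form by least squares. The function names `split_into_grids` and `regress_reference` are hypothetical. How GFJ turns this regression into a training objective for the network is not shown here.

```python
import numpy as np


def split_into_grids(feature_map: np.ndarray, grid: int) -> np.ndarray:
    """Split a (C, H, W) feature map into a grid x grid array of flattened cells.

    Hypothetical helper: assumes H and W are divisible by `grid`
    (any remainder rows/columns are simply cropped).
    Returns shape (grid, grid, C * (H // grid) * (W // grid)).
    """
    c, h, w = feature_map.shape
    gh, gw = h // grid, w // grid
    # Split H into (grid, gh) and W into (grid, gw), then bring the grid axes forward.
    cells = feature_map[:, : gh * grid, : gw * grid].reshape(c, grid, gh, grid, gw)
    cells = cells.transpose(1, 3, 0, 2, 4)  # (grid, grid, C, gh, gw)
    return cells.reshape(grid, grid, -1)


def regress_reference(cells: np.ndarray, i: int, j: int):
    """Represent cell (i, j) as a linear combination of its 8 surrounding cells.

    Hypothetical helper: solves min_w ||X w - y||^2 by least squares, where the
    columns of X are the neighbouring cell vectors in the 3x3 block centred on
    (i, j). Returns the 8 weights and the squared residual.
    """
    neighbours = [
        cells[i + di, j + dj]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if not (di == 0 and dj == 0)
    ]
    X = np.stack(neighbours, axis=1)  # (D, 8): one column per surrounding cell
    y = cells[i, j]                   # (D,): the reference cell
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ weights
    return weights, float(np.sum(residual ** 2))


# Usage on a random feature map standing in for a CNN output.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 12, 12))
cells = split_into_grids(fmap, grid=3)  # 3x3 cells, each 8 * 4 * 4 = 128-dim
w, err = regress_reference(cells, 1, 1)
print(cells.shape, w.shape)  # (3, 3, 128) (8,)
```

In this sketch a small residual means the reference cell is well explained by its neighbours; a learned variant would backpropagate through such a term instead of solving it once.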