Autoregressive Unsupervised Image Segmentation

06 Dec 2020 (modified: 05 May 2023) · ML Reproducibility Challenge 2020 Blind Submission
Abstract:

Scope of Reproducibility: Ouali et al. [1] consider the problem of unsupervised image segmentation, that is, the assignment of a class label to each pixel of an input image without the use of any training labels. The authors claim a novel method for performing this task, which involves training a convolutional neural network by maximizing the mutual information between outputs obtained using different orderings of the input image. The paper reports state-of-the-art pixel accuracy for unsupervised methods on the COCO-Stuff and Potsdam benchmark datasets, as well as on 3-class variants of these datasets, Potsdam-3 and COCO-Stuff-3. The scope of this reproduction is to create an implementation of the described method and verify its performance on the benchmark datasets.

Methodology: We created an original implementation of the described method using the PyTorch framework. All experiments were conducted on a single desktop computer with an Nvidia GTX 1080 Ti GPU. The total compute budget was approximately 40 GPU hours.

Results: We reproduced the accuracy claimed in the paper to within 1% on the Potsdam-3 dataset and to within 4% on the Potsdam dataset. Additionally, we found that the inclusion of a self-attention layer can improve model performance, as reported in the paper. However, our model's accuracy on the COCO-Stuff dataset is drastically lower than reported in the paper. This may be due to the smaller model and reduced batch size that we adopted in our reproduction because of limited computational resources.

What was easy: The model architecture is easily implemented using standard machine learning frameworks. The main building block of the model is the masked convolution, which can be implemented as a simple extension to regular 2D convolutions (see the sketches below). Additionally, this work builds on a previous paper which uses the mutual information loss as an unsupervised clustering objective [2]. The published code for the loss function from that paper was adapted to our application with minimal changes.

What was difficult: The model configurations specified by the authors are too large for single-GPU training. We used a smaller network and reduced batch sizes in our reproduction, but we note that this may lead to differences in model performance.

Communication with original authors: We corresponded by email with the original authors to clarify the architectural details of the self-attention layer in the model.
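The two sketches referenced above illustrate the building blocks mentioned in the abstract. They are minimal, hypothetical PyTorch examples written for this summary, not the code used in the reproduction or in the original paper; the class and function names, the specific "top-left" masking pattern, and the flattened (N, K) input convention are all illustrative assumptions. The first shows how a masked convolution can be obtained as a small extension of a regular 2D convolution, by zeroing part of the kernel before each forward pass so that every output pixel only sees inputs that precede it in one chosen raster ordering.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedConv2d(nn.Conv2d):
    """Illustrative sketch of a masked 2D convolution (not the authors' code).

    A fixed binary mask zeroes out kernel weights so that each output pixel
    only depends on inputs above it and to its left, i.e. on pixels that
    precede it in a top-left raster ordering. Other orderings would use
    different masks (e.g. obtained by flipping the kernel).
    """

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, **kwargs)
        kh, kw = self.weight.shape[-2:]
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2 + 1:] = 0   # zero weights right of the centre in the centre row
        mask[kh // 2 + 1:, :] = 0         # zero all weights below the centre row
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Apply the mask to the weights at every forward pass, then run a
        # standard 2D convolution with the masked kernel.
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```

The second sketch outlines a mutual information objective in the spirit of the clustering loss of [2]: given soft cluster assignments for the same pixels under two different orderings, it builds the empirical joint distribution over cluster pairs and returns the negative mutual information, so that minimizing the loss maximizes agreement between the two outputs.

```python
def mutual_information_loss(p1: torch.Tensor, p2: torch.Tensor,
                            eps: float = 1e-8) -> torch.Tensor:
    """Hedged sketch of an IIC-style mutual information objective.

    p1, p2: (N, K) softmax outputs for the same N pixels under two orderings.
    Returns -I(z1; z2), estimated from the empirical joint distribution.
    """
    joint = p1.t() @ p2 / p1.shape[0]      # (K, K) empirical joint distribution
    joint = (joint + joint.t()) / 2        # symmetrize the two views
    joint = joint.clamp(min=eps)           # avoid log(0)
    pi = joint.sum(dim=1, keepdim=True)    # marginal of view 1, shape (K, 1)
    pj = joint.sum(dim=0, keepdim=True)    # marginal of view 2, shape (1, K)
    return -(joint * (joint.log() - pi.log() - pj.log())).sum()
```

As a usage note, the loss sketch expects the network outputs to be flattened over batch and spatial dimensions before being passed in, e.g. `mutual_information_loss(out1.permute(0, 2, 3, 1).reshape(-1, K), out2.permute(0, 2, 3, 1).reshape(-1, K))` for outputs of shape (B, K, H, W); this reshaping convention is an assumption of the sketch.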
Paper Url: https://openreview.net/forum?id=Bx-3UYdfniX