Reproducibility of "Multi-scale Interactive Network for Salient Object Detection" for ML Reproducibility Challenge 2020

06 Dec 2020 (modified: 05 May 2023) · ML Reproducibility Challenge 2020 Blind Submission · Readers: Everyone
Abstract:

Scope of Reproducibility: The main claim of the original paper that our team tested was that the proposed Multi-scale Interactive Network (MINet) for Salient Object Detection outperforms existing state-of-the-art SOD methods on the five datasets mentioned in the paper.

Methodology: Our team used the authors' publicly available GitHub source code, after downloading and installing all necessary dependencies. We first evaluated the model on the five datasets used in the paper and then on three additional datasets, training on the same DUTS/Train dataset specified in the paper. Since the provided code computed only three of the six evaluation metrics reported in the paper, a separate Saliency Evaluation Toolbox was used to compute the remaining ones. While evaluating the MINet model on our three additional datasets, we noticed some limitations. One was that all three F-measure statistics (Max-F, Mean-F, and Weighted-F) were meaningless on the SOC and THUR15K datasets, because these datasets contain images with no salient object (the first sketch below illustrates why), so we attempted to improve upon the F-measures. A related issue was that the THUR15K dataset could not be run through the MINet model because it provides no mask for images with no salient object, so we wrote a script to generate an all-black mask for those images (a sketch of such a script is also included below). We then tested the model after training on the SOC training set alone and on the combined DUTS and SOC training sets.

Results: Evaluating on the five datasets used in the paper, we obtained results close to those reported by the original authors. After checking the model's performance on the three additional datasets, we modified the F-measure statistics, which were meaningless on the SOC dataset, and recomputed the measurements, which gave improved results. After training on the SOC training set, which contains images whose ground truth is a pure black mask, we saw improvement on the SOC validation set and on THUR15K. Training on the combined SOC and DUTS training sets gave good overall performance on all datasets except THUR15K, where the model is expected not only to identify salient objects but also to determine whether each is the object of interest.

What was easy: Since the authors made their Python source code publicly available on GitHub, testing the model was fairly quick. The available code was thorough and worked well. Additionally, unlike training, testing did not take much time and the measured results were fairly good.

What was difficult: The biggest difficulty our team faced was the long training time and limited access to GPUs. For instance, training the MINet-VGG16 model on the DUTS/Train dataset as in the paper took 63.5 hours. Lastly, the THUR15K dataset needed to be modified before we could test on it.

Communication with original authors: Our team did not communicate with the original authors; we only used their publicly available source code.
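The following is a minimal, illustrative sketch of the F-beta measure commonly used in SOD evaluation (with beta^2 = 0.3), showing why the statistic breaks down on images whose ground-truth mask contains no salient pixels. It is not the modified measure used in this report; the function name, thresholding, and eps handling are our own assumptions.

```python
# Minimal sketch of the F-beta measure commonly used in SOD evaluation.
# The function name and eps handling are illustrative assumptions.
import numpy as np

def f_beta(pred_bin: np.ndarray, gt_bin: np.ndarray,
           beta2: float = 0.3, eps: float = 1e-8) -> float:
    """F-beta for one image, given binarized prediction and ground-truth masks."""
    tp = float(np.logical_and(pred_bin, gt_bin).sum())
    precision = tp / (float(pred_bin.sum()) + eps)
    recall = tp / (float(gt_bin.sum()) + eps)   # denominator is ~0 when GT is all black
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)

# For an image with no salient object, gt_bin.sum() == 0, so tp == 0 and the
# F-measure collapses to 0 no matter how accurate the prediction is, which is
# why the statistic is meaningless on images without a salient object.
```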
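The report also mentions a script that generates an all-black ground-truth mask for THUR15K images with no salient object. Below is a minimal sketch of what such a script might look like; the directory layout, file naming, and use of Pillow are assumptions rather than the exact script used.

```python
# Hypothetical sketch: create an all-black ground-truth mask for every THUR15K
# image that lacks one. Paths and naming are assumptions for illustration.
import os
from PIL import Image

image_dir = "THUR15K/images"   # assumed image directory
mask_dir = "THUR15K/masks"     # assumed mask directory
os.makedirs(mask_dir, exist_ok=True)

for name in os.listdir(image_dir):
    stem, _ = os.path.splitext(name)
    mask_path = os.path.join(mask_dir, stem + ".png")
    if os.path.exists(mask_path):
        continue  # this image already has a real saliency mask
    with Image.open(os.path.join(image_dir, name)) as img:
        # An all-zero ("L"-mode, black) mask marks the image as containing no salient object.
        Image.new("L", img.size, 0).save(mask_path)
```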
Paper Url: https://openreview.net/forum?id=KoG51HaIl7y
