Abstract: Saliency prediction has made great strides over the past two decades, with current techniques modeling low-level information, such as color, intensity and size contrasts, and high-level information, such as attention and gaze direction for entire objects. Despite this, these methods fail to account for the dissimilarity between objects, which affects human visual attention. In this paper, we introduce a detection-guided saliency prediction network that explicitly models the differences between multiple objects, such as their appearance and size dissimilarities. Our approach allows us to fuse these object dissimilarities with features extracted by any deep saliency prediction network. As evidenced by our experiments, this consistently boosts the accuracy of the baseline networks, enabling us to outperform state-of-the-art models on three saliency benchmarks, namely SALICON, MIT300 and CAT2000. Our project page is at https://github.com/IVRL/DisSal.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=QxgCT02j8d
Changes Since Last Submission: Revision 1: We have addressed all the reviewers' comments and questions in the revised version of the paper (main content of 12 pages, plus an appendix with additional analyses and results). We are also separately uploading a supplementary zip file with additional qualitative results (1 page) on synthetic stimuli, as these did not fit the flow of the paper. The results on synthetic stimuli address part of the comments received from Reviewer Mvt1. Thank you for taking the time to look into these results. Revision 2: We have rephrased Section H of the Appendix (page 23) and the caption of Figure 9 (page 24) for better clarity, in response to the updated feedback from Reviewer RUH5. Revision 3: We have changed Figure 1 and updated the hyperparameter training details in the Appendix, in response to Reviewer Mvt1. Revision 4 (latest): We have added Figure 2 and some statistical clarification about the presence of size and appearance dissimilarity in the ground-truth saliency maps to motivate our work. We have also added details about the hyperparameters.
Assigned Action Editor: ~David_Fouhey2
Submission Number: 284