Abstract: In computational visual attention systems, various feature maps are constructed and integrated to yield a final fixation map. These maps highlight regions of an image that are salient with respect to a given task. Ehinger et al. [1] showed that combining three types of feature maps improves the prediction of where humans fixate when searching for pedestrians. Although this procedure is effective, it is static: all three feature maps are computed for every image, which reduces efficiency. In addition, combining all the feature maps does not always give the best performance for each image in the dataset. Hence, in this paper, we propose a dynamic way of integrating these feature maps on a per-image basis. Our approach uses a regression model to estimate a quality score for each feature map. We show that when the quality score of a feature map is estimated accurately, it is possible to dynamically select appropriate feature maps and achieve better fixation prediction accuracy and efficiency than integrating all feature maps [1].
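As a rough illustration of per-image selection, the sketch below keeps only the feature maps whose estimated quality score reaches a threshold and averages them into one fixation map. It is a minimal sketch under stated assumptions, not the method evaluated in the paper: the names `normalize` and `select_and_combine`, the map labels `map_a`, `map_b`, `map_c`, the threshold of 0.5, the fallback to the single best-scored map, and the plain averaging rule are all hypothetical. The quality scores are passed in directly, standing in for the output of a regression model.

```python
import numpy as np


def normalize(feature_map):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(feature_map, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        return np.zeros_like(m)
    return (m - m.min()) / span


def select_and_combine(feature_maps, predicted_quality, threshold=0.5):
    """Select maps by estimated quality and average the selected ones.

    feature_maps: dict mapping a label to a 2-D array (hypothetical labels).
    predicted_quality: dict mapping the same labels to an estimated score,
        assumed here to come from some regression model.
    threshold: assumed cut-off; maps scoring below it are skipped.
    """
    selected = [name for name, q in predicted_quality.items() if q >= threshold]
    if not selected:
        # Assumed fallback: keep the single best-scored map.
        selected = [max(predicted_quality, key=predicted_quality.get)]
    combined = np.mean([normalize(feature_maps[n]) for n in selected], axis=0)
    return selected, combined


# Toy 2x2 maps and made-up scores, for illustration only.
maps = {
    "map_a": np.array([[0.0, 1.0], [2.0, 3.0]]),
    "map_b": np.array([[4.0, 0.0], [0.0, 0.0]]),
    "map_c": np.array([[1.0, 1.0], [1.0, 1.0]]),
}
scores = {"map_a": 0.8, "map_b": 0.6, "map_c": 0.2}
selected, fixation = select_and_combine(maps, scores)
```

In a real pipeline the efficiency gain would depend on estimating the score before the corresponding map is computed, so that low-scoring maps are never built; this sketch takes precomputed maps only to keep the example short.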