Audio-Visual Temporal Saliency Modeling Validated by fMRI Data

Petros Koutras, Georgia Panagiotaropoulou, Antigoni Tsiami, Petros Maragos

2018 (modified: 10 Nov 2022), CVPR Workshops 2018

Abstract: In this work we propose an audio-visual model for predicting temporal saliency in videos, which we validate and evaluate in an alternative way by employing fMRI data. We intend to bridge the gap between the large improvements achieved in recent years in computational modeling, especially in deep learning, and the neurobiological and behavioral research on human vision. The proposed audio-visual model incorporates both state-of-the-art deep architectures for visual saliency, which were trained on eye-tracking data, and behavioral findings concerning audio-visual integration in multimedia stimuli. A new fMRI database, which includes various videos and subjects, has been collected for evaluation purposes. This dataset may prove useful not only for saliency but for other computer vision problems as well. The evaluation of our model using the new fMRI database under a mixed-effect analysis shows that the proposed saliency model correlates strongly with both the visual and audio brain areas, which confirms its effectiveness and appropriateness in predicting audio-visual saliency for dynamic stimuli.
