Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters, as a
front-end, the outputs of which are nonlinearly combined into a real number that
indicates visual saliency. Unfortunately, this requires many design parameters
such as the number, type, and size of the front-end filters, as well as the choice
of nonlinearities, weighting and normalization schemes, etc., for which biological
plausibility cannot always be justified. As a result, these parameters have to be
chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency
model directly from human eye movement data. The model is rather simplistic and
essentially parameter-free, and therefore contrasts with recent developments in the field
that usually aim at higher prediction rates at the cost of additional parameters and
increasing model complexity. Experimental results show that—despite the lack of
any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround
structure, similar to receptive fields in the early human visual system.
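As a rough illustration of the kind of learning-based pipeline the abstract describes, the sketch below trains a classifier to separate image patches centered on human fixations from patches at randomly drawn control locations, and reads out its decision value as a saliency score. The patch size, the choice of classifier, and the negative-sampling scheme are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of a saliency model learned directly from eye movement data
# (assumed setup, not the paper's exact method): positives are patches at
# fixated locations, negatives are patches at random control locations.
import numpy as np
from sklearn.svm import SVC

PATCH = 13  # patch side length in pixels (assumed)

def extract_patch(img, y, x, size=PATCH):
    """Return a flattened, mean-subtracted grayscale patch centered at (y, x)."""
    h = size // 2
    patch = img[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    return (patch - patch.mean()).ravel()

def build_training_set(images, fixations, n_neg_per_img=50, rng=None):
    """Positives: fixated patches (assumed to lie away from image borders).
    Negatives: patches at uniformly random locations."""
    rng = rng or np.random.default_rng(0)
    X, y = [], []
    h = PATCH // 2
    for img, fix in zip(images, fixations):
        for (fy, fx) in fix:
            X.append(extract_patch(img, fy, fx)); y.append(1)
        for _ in range(n_neg_per_img):
            ry = rng.integers(h, img.shape[0] - h)
            rx = rng.integers(h, img.shape[1] - h)
            X.append(extract_patch(img, ry, rx)); y.append(0)
    return np.array(X), np.array(y)

def train_saliency_model(images, fixations):
    """Fit a nonlinear classifier on raw patch intensities."""
    X, y = build_training_set(images, fixations)
    model = SVC(kernel="rbf", gamma="scale", C=1.0)
    model.fit(X, y)
    return model

def saliency_map(model, img, stride=4):
    """Evaluate the learned decision function on a grid of patches."""
    h = PATCH // 2
    ys = list(range(h, img.shape[0] - h, stride))
    xs = list(range(h, img.shape[1] - h, stride))
    patches = np.array([extract_patch(img, y, x) for y in ys for x in xs])
    return model.decision_function(patches).reshape(len(ys), len(xs))
```

In such a setup the only substantive design choice is the patch size; the maximally excitatory input of the trained model can then be inspected directly, which is how a center-surround structure can emerge from the data rather than being built in.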