Abstract: Recent years have witnessed rapid progress in convolutional neural networks (CNNs) and their successful application to saliency prediction for omnidirectional images (ODIs). Despite achieving substantial performance improvements, these CNN-based saliency models suffer from two major shortcomings: they are agnostic to spatial content, and they are computationally intensive. Inspired by the effectiveness of equivariant networks in the majority of computer vision tasks, we propose a novel efficient equivariant dynamic aggregation saliency (\(E^2DAS\)) model to tackle human fixation prediction in ODIs. Specifically, the proposed model consists of an efficient equivariant module, a dynamic convolutional aggregation module, and an optimization computation module. Unlike existing saliency models for ODIs, ours is the first attempt to introduce an efficient equivariant dynamic convolutional aggregation operation into the saliency prediction task, which can fundamentally alleviate the projection distortion problem and effectively learn spatial content-adaptive features. Moreover, replacing standard convolution with dynamic convolutional aggregation yields a considerable decrease in the number of parameters. Extensive experiments on several benchmark datasets show that the proposed model outperforms other state-of-the-art methods.