ScanTD: 360° Scanpath Prediction based on Time-Series Diffusion

Yujia Wang, Fang-Lue Zhang, Neil A. Dodgson

Published: 2024, Last Modified: 26 Feb 2026, Venue: ACM Multimedia 2024, License: CC BY-SA 4.0

Abstract: Scanpath generation in 360° images aims to model the realistic trajectories of gaze points that viewers follow when exploring panoramic environments. Existing methods for scanpath generation suffer from various limitations, including a lack of global attention to panoramic environments, insufficient diversity in generated scanpaths, and inadequate consideration of the temporal sequence of gaze points. To address these challenges, we propose a novel approach, named ScanTD, which employs a conditional Diffusion Model-based method to generate multiple scanpaths. Notably, a transformer-based time-series (TTS) module with a novel attention mechanism is integrated into ScanTD to capture the temporal dependency of gaze points effectively. Additionally, ScanTD utilizes a Vision Transformer-based method for image feature extraction, enabling better learning of scene semantic information. Experimental results demonstrate that our approach outperforms state-of-the-art methods across three datasets. We further demonstrate its generalizability by applying it to the 360° saliency detection task.