Automated Attention Guidance in Virtual Reality Videos

Paulo Vitor Santana Silva, Lucas L. Neves, Rafael A. Goiás, Diogo Fernandes Costa Silva, Rafael Teixeira Sousa, Arlindo Rodrigues Galvão Filho

Published: 2025, Last Modified: 28 May 2026, Venue: SVR 2025, License: CC BY-SA 4.0

Abstract: Despite their immersive nature, 360° virtual reality (VR) videos often lack effective attention guidance, which can leave users disoriented and cause them to miss information. This work proposes a novel method that integrates computer vision with natural language processing to automatically guide user attention in 360° VR videos. The method uses natural language roadmaps to identify and track key elements and applies dynamic visual effects. In a comparative evaluation, Grounding DINO was identified as a particularly suitable detector, while DAM4SAM and Segment Anything 2 (SAM 2) demonstrated strong performance for tracking. Demonstrated on a 360° VR tour, this approach can significantly enhance user experience and comprehension, advancing automated attention guidance for immersive content.
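The roadmap-driven flow described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the `RoadmapEntry` structure, the function names, and the assumption that frames are equirectangular with boxes given as pixel coordinates are all hypothetical and chosen only for illustration. The detector and tracker (for example Grounding DINO and SAM 2) are deliberately left out, so the sketch covers only the steps around them: selecting which text prompts are active at a given time, and converting a tracked box into viewing angles that a visual effect could point to.

```python
from dataclasses import dataclass


@dataclass
class RoadmapEntry:
    """Hypothetical roadmap item: a time window and a text prompt."""
    start_s: float
    end_s: float
    prompt: str  # phrase that would be handed to an open-vocabulary detector


def active_prompts(roadmap, t):
    """Return the prompts whose time window contains time t (seconds)."""
    return [e.prompt for e in roadmap if e.start_s <= t < e.end_s]


def box_center_to_yaw_pitch(box, width, height):
    """Map the center of a pixel box (x0, y0, x1, y1) in an equirectangular
    frame to (yaw, pitch) in degrees. Yaw 0 is the frame center; pitch is
    positive upward. Boxes that wrap around the left/right seam are not
    handled in this sketch."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2.0
    cy = (y0 + y1) / 2.0
    yaw = (cx / width - 0.5) * 360.0
    pitch = (0.5 - cy / height) * 180.0
    return yaw, pitch


def yaw_offset(target_yaw, view_yaw):
    """Signed shortest rotation (degrees) from the viewer's yaw to the
    target's yaw, in the range [-180, 180)."""
    return (target_yaw - view_yaw + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    roadmap = [
        RoadmapEntry(0.0, 10.0, "red door"),
        RoadmapEntry(5.0, 20.0, "statue"),
    ]
    print(active_prompts(roadmap, 7.0))
    # A box centered in a 3840x1920 frame lies straight ahead.
    print(box_center_to_yaw_pitch((1870, 910, 1970, 1010), 3840, 1920))
    # Viewer looks at -170 degrees, target sits at 170 degrees.
    print(yaw_offset(170.0, -170.0))
```

In such a setup, the signed offset could decide which side of the viewport shows a cue (an arrow, a highlight, or dimming) when the tracked element is outside the user's current field of view.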

External IDs: dblp:conf/svr/SilvaNGSSF25