Abstract: Despite their immersive nature, 360° virtual reality (VR) videos often lack effective attention guidance, which can leave users disoriented and cause them to miss important information. This work proposes a novel method that integrates computer vision with natural language processing to automatically guide user attention in 360° VR. The method uses natural language roadmaps to identify and track key elements in the scene and applies dynamic visual effects to them. A comparative evaluation identified Grounding DINO as a particularly suitable detector, while DAM4SAM and Segment Anything 2 (SAM 2) demonstrated strong tracking performance. Demonstrated on a 360° VR tour, the approach can significantly enhance user experience and comprehension, advancing automated attention guidance for immersive content.
External IDs: dblp:conf/svr/SilvaNGSSF25