Keywords: Segmentation, AR/VR
Abstract: Augmented Reality (AR) is a transformative technology that is redefining how humans interact with their environment. A key component of AR is image segmentation, which partitions the user's front-view scene into distinct regions for analysis. This process is essential for accurately overlaying digital content onto the physical world, since it detects and isolates the relevant objects. Despite its importance, however, image segmentation imposes significant computational and latency burdens on AR devices, which can severely degrade the user experience. In this paper, we propose the Focus-Oriented Segment Anything Model (FoSAM), a framework built upon the Segment Anything Model (SAM) that uses real-time gaze data to focus segmentation on regions of interest, substantially lowering computational cost. Experimental results show that FoSAM reduces computational cost by a factor of more than $50$, enabling a seamless visual experience for users, as confirmed by our real-world user study. The code is provided at https://anonymous.4open.science/r/FoSAM-D627.
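The core idea described above, restricting segmentation to a gaze-centered region of interest rather than the full frame, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ROI size, the clipping strategy at frame borders, and the generic `segment_fn` callable (a stand-in for a SAM predictor) are all assumptions.

```python
import numpy as np

def gaze_roi(frame, gaze_xy, roi_size=256):
    """Crop a square region of interest centered on the gaze point.

    The center is clipped so the ROI always lies inside the frame.
    Returns the crop and its top-left corner in frame coordinates.
    (Illustrative sketch; the paper's actual ROI policy may differ.)
    """
    h, w = frame.shape[:2]
    half = roi_size // 2
    cx = int(np.clip(gaze_xy[0], half, w - half))
    cy = int(np.clip(gaze_xy[1], half, h - half))
    x0, y0 = cx - half, cy - half
    return frame[y0:y0 + roi_size, x0:x0 + roi_size], (x0, y0)

def segment_focused(frame, gaze_xy, segment_fn, roi_size=256):
    """Run a segmentation callable only on the gaze ROI, then paste
    the resulting mask back into a full-frame boolean canvas.

    `segment_fn` is a hypothetical placeholder for any per-image
    segmenter (e.g. a wrapped SAM predictor) mapping an HxWx3 crop
    to an HxW boolean mask.
    """
    roi, (x0, y0) = gaze_roi(frame, gaze_xy, roi_size)
    mask_roi = segment_fn(roi)
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[y0:y0 + roi_size, x0:x0 + roi_size] = mask_roi
    return mask
```

Because the segmenter sees only a `roi_size` × `roi_size` crop instead of the full frame, the per-frame pixel count it must process drops by roughly `(H*W) / roi_size**2`, which is the source of the computational savings the paper quantifies.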
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 13964