Object Segmentation in the Wild with Foundation Models: Application to Vision-Assisted Neuro-Prostheses for Upper Limbs
Abstract: In this work, we tackle the problem of semantic object segmentation with foundation models. We investigate whether foundation models, trained on a tremendous number and variety of objects, can perform object segmentation without fine-tuning on images of everyday objects captured in strongly cluttered, "in-the-wild" visual scenes, for the purpose of guiding upper limb neuro-prostheses. We adapt the Segment Anything Model (SAM) to our segmentation scenario, propose strategies to guide the model with gaze fixations, and fine-tune it on egocentric visual data. Evaluation of our approach shows an improvement of up to 0.5 points in the IoU segmentation quality metric on real-world data.
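To make the gaze-guided prompting idea concrete, below is a minimal sketch of how a single gaze fixation could be supplied as a point prompt to SAM's predictor interface. This is an illustration under stated assumptions, not the paper's actual pipeline: the checkpoint file, image path, and fixation coordinates are hypothetical placeholders, and the fine-tuning on egocentric data is not shown.

    # Minimal sketch: prompting SAM with one gaze fixation as a positive point prompt.
    # Assumes the official `segment_anything` package and a downloaded ViT-H checkpoint;
    # the image path and gaze coordinates are placeholders for illustration.
    import numpy as np
    import cv2
    from segment_anything import sam_model_registry, SamPredictor

    # Load a pretrained SAM backbone (checkpoint path is an assumption).
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)

    # Load an egocentric frame (hypothetical path) and compute its embedding once.
    image = cv2.cvtColor(cv2.imread("egocentric_frame.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    # Treat the gaze fixation as a single foreground point prompt (label 1).
    gaze_xy = np.array([[320, 240]])  # (x, y) in pixels, placeholder fixation
    masks, scores, _ = predictor.predict(
        point_coords=gaze_xy,
        point_labels=np.array([1]),
        multimask_output=True,  # SAM returns several candidate masks
    )

    # Keep the candidate mask SAM scores highest.
    best_mask = masks[np.argmax(scores)]

In this sketch the fixation point simply replaces the manual click SAM ordinarily expects, which is one plausible reading of "guiding the model with gaze fixations"; the paper's full strategies may differ.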