Track: Proceedings Track
Keywords: Foundation Model communication, Instance Segmentation, Deformable Linear Objects
TL;DR: We present an adapter model, inspired by classical techniques from digital communication, that lets CLIPSeg and SAM communicate across their embedding spaces.
Abstract: Classical methods in digital communication mix transmitted signals with carrier frequencies to mitigate signal distortion over noisy channels. Drawing inspiration from these techniques, we present an adapter network that enables CLIPSeg, a text-conditioned semantic segmentation model, to communicate point prompts to the Segment Anything Model (SAM) in the positional embedding space. We showcase our technique on the complex task of Deformable Linear Object (DLO) instance segmentation. Our method combines the strong zero-shot generalization capability of SAM with the user-friendliness of CLIPSeg to exceed state-of-the-art performance in DLO instance segmentation, as measured by Dice score, while training only 0.7% of the model parameters.
Submission Number: 7