Keywords: Ambiguity, Segment Anything Model, Interactive Segmentation
Abstract: The Segment Anything Model (SAM) often encounters ambiguity in interactive segmentation, where insufficient user interaction leads to inaccurate segmentation of the target object. Existing approaches primarily address ambiguity through repeated human-model interactions, which are time-consuming due to the inherent latency of human responses. To reduce human effort, we propose a novel interactive segmentation framework that leverages the model's inherent capabilities to effectively segment ambiguous objects.
Our key idea is to create an annotator-like agent to interact with the model. The resulting SmartSAM method mimics intelligent human annotators, resolving ambiguity with a single click and one reference instance. The agent generates multiple prompts around the initial click to simulate diverse annotator behaviors and refines the output masks by iteratively adding click chains in uncertain regions, thereby producing a set of candidate masks. Finally, the agent selects the mask that most closely aligns with the user’s intent, as indicated by the reference instance. Furthermore, we formalize the agent’s behavior as a fuzzy regression problem by quantifying ambiguity using fuzzy entropy. We demonstrate that our agent yields lower entropy than traditional methods, and we establish robustness and sufficiency theorems to ensure effective, human-like decision-making within a bounded range of actions. We evaluate our approach on multiple segmentation benchmarks and demonstrate its superiority over state-of-the-art methods.
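The candidate-selection step described above can be illustrated with a minimal sketch. The function names (`fuzzy_entropy`, `select_mask`) and the binary-entropy formulation are assumptions for illustration, not the paper's actual implementation; the selection criterion here is plain IoU against the reference instance.

```python
import numpy as np

def fuzzy_entropy(prob_mask, eps=1e-8):
    """Mean per-pixel fuzzy (binary Shannon) entropy of a soft mask in [0, 1].
    Assumed form for illustration; the paper's exact entropy may differ."""
    p = np.clip(prob_mask, eps, 1.0 - eps)
    return float(np.mean(-p * np.log(p) - (1.0 - p) * np.log(1.0 - p)))

def iou(a, b):
    """Intersection-over-union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union else 0.0

def select_mask(candidates, reference):
    """Pick the candidate mask that best matches the reference instance."""
    return max(candidates, key=lambda m: iou(m, reference))
```

A fully confident mask (probabilities near 0 or 1) yields entropy near zero, while a maximally ambiguous mask (all 0.5) yields entropy log 2, which is the sense in which the agent's refinement clicks can be said to lower entropy.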
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7273