Interactive Segmentation by Considering First-Click Intentional Ambiguity

Published: 20 Jul 2024, Last Modified: 23 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: Interactive segmentation extends general semantic segmentation by taking user preferences into account in order to obtain a specific target of interest. Most related algorithms generate only a single mask, so their robustness may be limited by the diversity of user intention in the early interaction stage, namely the ambiguous selection of an object part, a whole object, or an adherent object, especially when only one click is available. To handle this, we propose a novel framework called Diversified Interactive Segmentation Network (DISNet), in which we revisit the peculiarity of the first click. Given an input image, DISNet outputs multiple candidate masks under the guidance of the first click only. It then utilizes a Dual-attentional Mask Correction (DAMC) module consisting of two branches: a) masked attention based on click propagation; b) mixed attention among the first click, subsequent clicks, and the image with respect to position and feature space. Moreover, we design a new sampling strategy to generate GT masks with rich semantic relations. Comparison between DISNet and mainstream algorithms demonstrates the efficacy of our method and further underscores the decisive role of the first click in interactive segmentation.
Relevance To Conference: We present a novel technology for manually guided segmentation of objects in open-world images, namely "interactive segmentation". In this setting, users provide prompts such as mouse clicks or text descriptions to select specific object instances in real time.
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Experience] Interactions and Quality of Experience
Submission Number: 2522
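
The abstract names masked attention as one branch of the DAMC module but does not give its formulation. As a rough illustration of the general idea only, the sketch below shows generic scaled dot-product attention in which each query may attend only to keys allowed by a boolean mask. It is not the authors' implementation: the function name `masked_attention`, the argument layout, and the assumption that the mask would come from some click-propagation step are all hypothetical.

```python
import math


def masked_attention(queries, keys, values, key_mask):
    """Generic masked scaled dot-product attention (illustrative only).

    queries:  list of Q vectors, each of length D
    keys:     list of K vectors, each of length D
    values:   list of K vectors, each of length V
    key_mask: list of K booleans; False means the key is ignored.
              (Assumption: in an interactive-segmentation setting such a
              mask could be derived from a propagated click region.)
    Returns (outputs, weights) with shapes Q x V and Q x K.
    """
    if not any(key_mask):
        raise ValueError("key_mask must allow at least one key")

    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Masked-out keys get a score of -inf, so their softmax weight is 0.
        scores = [
            sum(a * b for a, b in zip(q, k)) / scale if keep else -math.inf
            for k, keep in zip(keys, key_mask)
        ]
        top = max(scores)  # finite, because at least one key is allowed
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy usage: the third key is masked out, so it cannot influence the output.
toy_out, toy_weights = masked_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    key_mask=[True, True, False],
)
```

In this toy call the masked key receives exactly zero weight, and the remaining weights still sum to one, which is the property that lets a mask confine attention to a chosen region.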