Segment Anything with Precise Interaction

Published: 20 Jul 2024, Last Modified: 08 Aug 2024 · MM2024 Oral · CC BY 4.0
Abstract: Although the Segment Anything Model (SAM) has achieved impressive results on many segmentation tasks and benchmarks, its performance noticeably deteriorates when applied to high-resolution images for high-precision segmentation, limiting its use in many real-world applications. In this work, we explored transferring SAM to the domain of high-resolution images and proposed Pi-SAM. Compared to the original SAM and its variants, Pi-SAM demonstrates the following superiorities: **Firstly**, Pi-SAM possesses a strong capability to perceive the extremely fine details in high-resolution images, enabling it to generate high-precision segmentation masks. As a result, Pi-SAM significantly surpasses previous methods on four high-resolution datasets. **Secondly**, Pi-SAM supports more precise user interactions. In addition to the native promptable ability of SAM, Pi-SAM allows users to interactively refine the segmentation predictions simply by clicking, whereas the original SAM fails to achieve this on high-resolution images. **Thirdly**, building upon SAM, Pi-SAM freezes all of its original parameters and introduces only a few additional parameters and little extra computational cost to achieve the above performance. This ensures highly efficient model fine-tuning while also retaining the powerful semantic information contained in the original SAM.
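To illustrate the parameter-efficient fine-tuning scheme described in the third point, here is a minimal sketch: the pretrained SAM weights are frozen and only a small new module is trained. It assumes a generic PyTorch SAM implementation; `RefinementHead` and `freeze_sam_and_attach_head` are hypothetical names for illustration, not the actual Pi-SAM architecture, which the abstract does not specify.

```python
import torch
import torch.nn as nn


class RefinementHead(nn.Module):
    """Small trainable head that refines a coarse SAM mask logit map (illustrative only)."""

    def __init__(self, in_channels: int = 1, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, coarse_logits: torch.Tensor) -> torch.Tensor:
        # Residual refinement: keep SAM's prediction and learn only a small correction.
        return coarse_logits + self.net(coarse_logits)


def freeze_sam_and_attach_head(sam: nn.Module) -> tuple[nn.Module, RefinementHead]:
    """Freeze every original SAM parameter so fine-tuning updates only the new head."""
    for p in sam.parameters():
        p.requires_grad_(False)
    head = RefinementHead()
    trainable = sum(p.numel() for p in head.parameters())
    total = trainable + sum(p.numel() for p in sam.parameters())
    print(f"trainable params: {trainable} / {total}")
    return sam, head
```

In such a setup, only the refinement head's parameters would receive gradients during fine-tuning, which keeps the training cost low and leaves the pretrained SAM representation untouched.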
Primary Subject Area: [Experience] Interactions and Quality of Experience
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Our work's relevance to the conference can be summarized as follows: **Firstly**, we successfully transferred SAM to the domain of high-resolution image segmentation, achieving high-precision results. This enables it to serve as a general high-precision segmentation tool, with potential applications in various multimedia systems such as robot perception, augmented reality, and background modification in online meetings. **Secondly**, we extended SAM with the ability to perform precise interactions and correct its predictions, which the original SAM cannot do. This makes it a more interaction-friendly segmentation tool, benefiting many users, especially image/video creators.
Supplementary Material: zip
Submission Number: 4146