Keywords: Amodal segmentation, SAM, Open world
Abstract: Amodal segmentation, which aims to predict complete object shapes including occluded regions, remains challenging in open-world scenarios where models must generalize to novel objects and contexts. While the Segment Anything Model (SAM) has demonstrated remarkable zero-shot generalization capabilities, it is fundamentally limited to visible-region segmentation. This paper presents Amodal SAM, a framework that extends SAM's capabilities to amodal segmentation while preserving its powerful generalization ability. Our framework introduces three key components: (1) a lightweight Spatial Completion Adapter that enables occluded-region reconstruction, (2) a Target-Aware Occlusion Synthesis (TAOS) pipeline that addresses the scarcity of amodal annotations by generating diverse synthetic training data, and (3) novel learning objectives that enforce regional consistency and apply topological regularization. Extensive experiments demonstrate that Amodal SAM achieves state-of-the-art performance on standard benchmarks while exhibiting strong generalization to novel scenarios. Furthermore, our framework extends seamlessly to video sequences, marking the first attempt to tackle open-world video amodal segmentation. We hope our research can advance the field toward practical amodal segmentation systems that operate effectively in unconstrained real-world environments. Code and models will be made publicly available.
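The abstract names a lightweight Spatial Completion Adapter but does not describe its internals. Purely as an illustrative sketch of how a small residual adapter can be attached to frozen decoder features of a SAM-style model, the snippet below shows one common design; the module name, bottleneck width, and zero-initialized output projection are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class SpatialCompletionAdapter(nn.Module):
    """Illustrative lightweight adapter: a bottleneck conv block applied as a
    residual correction on frozen decoder features, nudging them toward amodal
    (visible + occluded) mask prediction. Hypothetical design, not the paper's."""

    def __init__(self, channels: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.spatial = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1)
        self.up = nn.Conv2d(bottleneck, channels, kernel_size=1)
        self.act = nn.GELU()
        # Zero-init the output projection so training starts from the
        # unmodified (visible-only) behavior of the frozen backbone.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Residual update: frozen decoder features + small learned correction.
        delta = self.up(self.act(self.spatial(self.act(self.down(feats)))))
        return feats + delta


if __name__ == "__main__":
    # Toy check on random decoder features (batch 2, 256 channels, 64x64 grid).
    adapter = SpatialCompletionAdapter(channels=256)
    x = torch.randn(2, 256, 64, 64)
    print(adapter(x).shape)  # torch.Size([2, 256, 64, 64])
```

Zero-initializing the output projection is a standard adapter trick: the model initially reproduces the frozen network's predictions, and the occlusion-completion signal is learned as a small residual on top.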
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8671