Keywords: amodal segmentation, video amodal segmentation, dataset, occlusion, zero-shot
TL;DR: A real-world video dataset with accurate ground-truth segmentation behind occluders and within containers
Abstract: Amodal video object segmentation is fundamentally limited by the absence of datasets that combine real-world complexity with precise ground-truth annotations. To address this, we present Real Video Amodal Segmentation (Real-VAS), a new large-scale, zero-shot evaluation dataset. We introduce a novel data generation pipeline that composites two real-world video clips, enabling the creation of pixel-perfect amodal ground truth without relying on human estimation or expensive 3D reconstruction. Our dataset is structured into two challenging scenario types: dynamic Occlusion, created by compositing moving objects, and a unique Container category featuring complex, physically constrained interactions. These container scenarios (On Surface Containment, Articulated Containment, and Mobile Containment) allow us to generate precise ground truth by simulating an object's motion based on its container's tracked transformation. As a result, Real-VAS provides a diverse and challenging benchmark for evaluating amodal segmentation models on realistic video with the precision of synthetic data. The dataset and our generation code will be made publicly available.
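To make the two ground-truth mechanisms described above concrete, here is a minimal sketch of how such a compositing pipeline could derive masks. This is an illustration under stated assumptions, not the authors' released code: the helper names are hypothetical, and the container transform is assumed to be a per-frame 2x3 affine matrix; the actual pipeline may differ.

```python
import numpy as np
import cv2  # assumed dependency, used only for the affine warp


def composite_frame(bg_frame, fg_frame, fg_mask):
    """Paste a foreground (occluder) clip onto a background clip.

    bg_frame, fg_frame: HxWx3 uint8 frames; fg_mask: HxW bool occluder mask.
    """
    out = bg_frame.copy()
    out[fg_mask] = fg_frame[fg_mask]
    return out


def occlusion_ground_truth(object_mask, occluder_mask):
    """Occlusion scenario: the amodal GT is the object's full mask from the
    un-occluded source clip; the modal (visible) mask is that same mask
    minus the composited occluder. No human estimation is involved."""
    amodal = object_mask.astype(bool)
    modal = amodal & ~occluder_mask.astype(bool)
    return amodal, modal


def container_ground_truth(initial_mask, container_transform):
    """Container scenario: propagate the object's last fully visible mask
    with the container's tracked transform for the current frame
    (assumed here to be a 2x3 affine matrix)."""
    h, w = initial_mask.shape
    warped = cv2.warpAffine(
        initial_mask.astype(np.uint8), container_transform, (w, h),
        flags=cv2.INTER_NEAREST)
    return warped.astype(bool)
```

In this reading, pixel-perfect amodal labels come for free: the occluded object's true extent is known exactly from the source clip before compositing, and a contained object's mask is a deterministic function of its container's tracked motion.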
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 3151