Keywords: amodal segmentation, video amodal segmentation, dataset, occlusion, zero-shot
TL;DR: A real-world video dataset with accurate ground-truth segmentation behind occluders and within containers
Abstract: Amodal video object segmentation is fundamentally limited by the absence of datasets that combine real-world complexity with precise ground-truth annotations. To address this, we present Real Video Amodal Segmentation (Real-VAS), a new large-scale, zero-shot evaluation dataset. We introduce a novel data generation pipeline that composites two real-world video clips, enabling the creation of pixel-perfect amodal ground truth without relying on human estimation or expensive 3D reconstruction. Our dataset is structured into two challenging scenario types: dynamic Occlusion, created by compositing moving objects, and a unique Container category featuring complex, physically constrained interactions. These container scenarios (On Surface Containment, Articulated Containment, and Mobile Containment) allow us to generate precise ground truth by simulating an object's motion based on its container's tracked transformation. As a result, Real-VAS provides a diverse and challenging benchmark for evaluating amodal segmentation models on realistic video with the precision of synthetic data. The dataset and our generation code will be made publicly available.
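To make the two ground-truth mechanisms described above concrete, here is a minimal sketch of how such a compositing pipeline could derive masks. This is an illustration under stated assumptions, not the authors' released code: the helper names are hypothetical, and the container transform is assumed to be a per-frame 2x3 affine matrix; the actual pipeline may differ.

```python
import numpy as np
import cv2  # assumed dependency, used only for the affine warp


def composite_frame(bg_frame, fg_frame, fg_mask):
    """Paste a foreground (occluder) clip onto a background clip.

    bg_frame, fg_frame: HxWx3 uint8 frames; fg_mask: HxW bool occluder mask.
    """
    out = bg_frame.copy()
    out[fg_mask] = fg_frame[fg_mask]
    return out


def occlusion_ground_truth(object_mask, occluder_mask):
    """Occlusion scenario: the amodal GT is the object's full mask from the
    un-occluded source clip; the modal (visible) mask is that same mask
    minus the composited occluder. No human estimation is involved."""
    amodal = object_mask.astype(bool)
    modal = amodal & ~occluder_mask.astype(bool)
    return amodal, modal


def container_ground_truth(initial_mask, container_transform):
    """Container scenario: propagate the object's last fully visible mask
    with the container's tracked transform for the current frame
    (assumed here to be a 2x3 affine matrix)."""
    h, w = initial_mask.shape
    warped = cv2.warpAffine(
        initial_mask.astype(np.uint8), container_transform, (w, h),
        flags=cv2.INTER_NEAREST)
    return warped.astype(bool)
```

In this reading, pixel-perfect amodal labels come for free: the occluded object's true extent is known exactly from the source clip before compositing, and a contained object's mask is a deterministic function of its container's tracked motion.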
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 3151