Leader360V: A Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment
Keywords: 360 Video Dataset, Automatic Annotation Pipeline
TL;DR: We propose the first large-scale, real-world 360 video dataset and an automatic annotation pipeline to reduce the cost of manual annotation.
Abstract: 360 video captures the complete surrounding scene with an ultra-large field of view of 360°×180°. This makes 360 scene understanding tasks, *e.g.*, segmentation and tracking, crucial for applications such as autonomous driving and robotics. Despite the recent emergence of foundation models, the community remains impeded by the lack of large-scale, labeled real-world datasets. This is caused by the inherent spherical properties of 360 video, *e.g.*, severe distortion in polar regions and content discontinuities, which render annotation costly and complex. This paper introduces **Leader360V**, the **first** large-scale (10K+), labeled real-world 360 video dataset for instance segmentation and tracking. Our dataset enjoys high scene diversity, ranging from indoor and urban settings to natural and dynamic outdoor scenes. To automate annotation, we design an automatic labeling pipeline that coordinates pre-trained 2D segmentors and large language models (LLMs) to facilitate the labeling. The pipeline operates in three novel stages. Specifically, in the **Initial Annotation Phase**, we introduce a Semantic- and Distortion-aware Refinement (**SDR**) module, which combines object mask proposals from multiple 2D segmentors with LLM-verified semantic labels. These are then converted into mask prompts to guide SAM2 in generating distortion-aware masks for subsequent frames. In the **Auto-Refine Annotation Phase**, missing or incomplete regions are corrected either by re-applying SDR or by resolving discontinuities near the horizontal borders. The **Manual Revision Phase** finally incorporates LLMs and human annotators to further refine and validate the annotations. Extensive user studies and evaluations demonstrate the effectiveness of our labeling pipeline. Meanwhile, experiments confirm that Leader360V significantly enhances model performance for 360 video segmentation and tracking, paving the way for more scalable 360 scene understanding. We release our dataset and code at https://leader360v.github.io/Leader360V_HomePage/.
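The abstract only sketches the mask-prompted propagation step, so below is a minimal, hypothetical Python sketch of that step: SDR's fused, LLM-verified first-frame masks (stand-in placeholder arrays here, not the paper's actual outputs) are fed as mask prompts to SAM2's video predictor and propagated through the clip. The `sam2` calls follow the public SAM2 repository; config/checkpoint names vary by release, so treat paths and exact signatures as assumptions to verify against the repo.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Hypothetical stand-in for SDR output: one fused, LLM-verified binary mask
# per object on the first frame (placeholders only, for illustration).
H, W = 960, 1920                                # equirectangular frames are 2:1
sdr_masks = [np.zeros((H, W), dtype=bool) for _ in range(2)]
sdr_masks[0][400:500, 800:900] = True           # pretend object 0 near the equator
sdr_masks[1][100:200, 100:300] = True           # pretend object 1 in a polar region

predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml",                        # config name from the sam2 repo; adjust per release
    "checkpoints/sam2_hiera_large.pt",          # assumed local checkpoint path
)
state = predictor.init_state(video_path="frames/")  # directory of extracted JPEG frames

# Register each SDR mask as a prompt on frame 0.
for obj_id, mask in enumerate(sdr_masks):
    predictor.add_new_mask(state, frame_idx=0, obj_id=obj_id, mask=mask)

# Propagate the mask prompts through the remaining frames.
with torch.inference_mode():
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()   # per-object binary masks
```

In the paper's pipeline these propagated masks would then be checked in the Auto-Refine phase (re-applying SDR or resolving horizontal-border discontinuities), which this sketch does not attempt to reproduce.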
Croissant File: json
Dataset URL: https://huggingface.co/datasets/Leader360V/Leader360V
Code URL: https://github.com/Leader360V/A3-360V
Supplementary Material: pdf
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 627
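For readers who want the data locally, a hedged example of fetching the dataset from the Hugging Face Hub is shown below; only the repo id (taken from the Dataset URL above) is assumed, and the repository's internal layout is not documented here.

```python
from huggingface_hub import snapshot_download

# Download (or reuse a cached copy of) the full dataset repository.
local_dir = snapshot_download(
    repo_id="Leader360V/Leader360V",
    repo_type="dataset",
)
print(local_dir)  # local path containing the downloaded files
```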