Keywords: 3D Grounding, 3D Scene Understanding, 3D Robotic Perception
TL;DR: We present 3EED, the first large-scale benchmark for 3D visual grounding across vehicles, drones, and quadrupeds, with over 134K 3D objects and 25K human-verified expressions in diverse outdoor scenes.
Abstract: Visual grounding in 3D is key for embodied agents to localize language-referred objects in open-world environments. However, existing benchmarks are limited by their indoor focus, single-platform constraints, and small scale. We introduce 3EED, a multi-platform, multi-modal 3D grounding benchmark featuring RGB and LiDAR data from vehicle, drone, and quadruped platforms. We provide over 128,000 objects and 22,000 validated referring expressions across diverse outdoor scenes -- 10x larger than existing datasets. We develop a scalable annotation pipeline that combines vision-language model prompting with human verification to ensure high-quality spatial grounding. To support cross-platform learning, we propose platform-aware normalization and cross-modal alignment techniques, and establish benchmark protocols for both in-domain and cross-platform evaluations. Our findings reveal significant performance gaps, highlighting the challenges and opportunities of generalizable 3D grounding. The 3EED dataset and benchmark toolkit are publicly released to advance future research in language-driven 3D embodied perception.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/RRRong/3EED/tree/main
Code URL: https://github.com/iris0329/3eed
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 69