Keywords: auto-annotation, expert-crafted annotation guidelines, 3D detection, LiDAR, foundation models, finetuning
Abstract: A crucial yet under-appreciated prerequisite in machine learning solutions for real-world applications is data annotation: human annotators are hired to manually label data according to expert-crafted guidelines.
This is often a laborious, tedious, and costly process.
To study methods for automated data annotation based on expert-crafted annotation guidelines, we introduce a new benchmark, {\em AutoExpert}, short for \emph{Auto-Annotation from Expert-Crafted Guidelines}.
In particular, this work repurposes the well-established nuScenes dataset, commonly used in autonomous driving research, which provides comprehensive annotation guidelines for labeling LiDAR point clouds with 3D cuboids across 18 object classes.
In the guidelines, each class is defined by multimodal data: a few visual examples and nuanced textual descriptions. Notably, no labeled 3D cuboids in LiDAR are provided in the guidelines.
This modality gap between the guidelines (images and text) and the annotation target (LiDAR point clouds) makes AutoExpert not only challenging but also novel and interesting.
Moreover, the advances of foundation models (FMs) make AutoExpert especially timely,
as FMs offer promising tools to tackle its challenges.
To address AutoExpert, we employ a conceptually straightforward pipeline that (1) utilizes open-source FMs for object detection and segmentation in RGB images, (2) projects 2D detections into 3D using known camera poses, and (3) clusters LiDAR points within the frustum of each 2D detection to generate a 3D cuboid.
Starting with a non-learned solution that leverages off-the-shelf FMs, we progressively refine key components and achieve significant performance improvements, boosting 3D detection mAP from 12.1 to 21.9.
Nevertheless, AutoExpert remains an open and challenging problem, underscoring the urgent need for developing LiDAR-based FMs.
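The three-step pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the simple median-depth foreground filter (a stand-in for proper point clustering such as DBSCAN), and the axis-aligned box fit are all illustrative assumptions.

```python
import numpy as np

def frustum_points(points_cam, box2d, K):
    """Step (2)-(3) prerequisite: select LiDAR points (already
    transformed into the camera frame) whose image projection falls
    inside a 2D detection box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box2d
    z = points_cam[:, 2]
    valid = z > 0.1  # keep only points in front of the camera
    uvw = (K @ points_cam.T).T  # pinhole projection with intrinsics K
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
    inside = valid & (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return points_cam[inside]

def cluster_and_box(frustum, depth_tol=2.0):
    """Step (3), crudely: keep points near the median depth to reject
    background hits inside the frustum, then fit an axis-aligned cuboid.
    A real system would cluster in 3D and estimate orientation."""
    if len(frustum) == 0:
        return None
    z = frustum[:, 2]
    fg = frustum[np.abs(z - np.median(z)) < depth_tol]
    lo, hi = fg.min(axis=0), fg.max(axis=0)
    return {"center": (lo + hi) / 2.0, "size": hi - lo}
```

For example, with a toy intrinsic matrix and four points, three near depth 10 m and one background point at 50 m, the frustum crop retains the near cluster and the fitted cuboid is centered on it.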
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 14954