SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving

Jianhua Han; Xiwen Liang; Hang Xu; Kai Chen; Lanqing HONG; Chaoqiang Ye; Wei Zhang; Zhenguo Li; Xiaodan Liang; Chunjing Xu

03 Jun 2021 (modified: 26 May 2025). Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1). Readers: Everyone

Keywords: autonomous driving, object detection, dataset, benchmark, self-supervised learning, semi-supervised learning

Abstract: To facilitate real-world, ever-evolving, and scalable autonomous driving systems, we present a large-scale benchmark that standardizes the evaluation of self-supervised and semi-supervised approaches that learn from raw data; it is the first and largest such benchmark to date. Existing autonomous driving systems rely heavily on 'perfect' visual perception models (e.g., detection) trained on extensively annotated data to ensure safety. However, it is unrealistic to exhaustively label instances across all scenarios and circumstances (e.g., night, extreme weather, different cities) when deploying a robust autonomous driving system. Motivated by recent advances in self-supervised and semi-supervised learning, a promising direction is to learn a robust detection model by jointly exploiting large-scale unlabeled data and a small amount of labeled data. Existing datasets (e.g., KITTI, Waymo) either provide only a small amount of data or cover limited domains with full annotation, which hinders the exploration of large-scale pre-trained models. Here, we release a Large-Scale Object Detection benchmark for Autonomous driving, named SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected at a rate of one frame every ten seconds in 32 different cities, under different weather conditions, periods, and location scenes. We provide extensive experiments and in-depth analyses of existing supervised state-of-the-art detection models and popular self-supervised and semi-supervised approaches, along with insights into how to develop future models. We show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, yielding superior performance when fine-tuning on autonomous driving downstream tasks. This benchmark will be used to hold the ICCV2021 SSLAD challenge. The data and more up-to-date information have been released at https://soda-2d.github.io.

Supplementary Material: zip

URL: https://soda-2d.github.io

Community Implementations: [9 code implementations](https://www.catalyzex.com/paper/soda10m-towards-large-scale-object-detection/code)
