Abstract: Although a variety of publicly available datasets exist for autonomous driving, ranging from small-scale collections to larger ones with thousands of miles of driving, we consider that the process of collecting and processing them is often overlooked in the literature. From a data-driven perspective, the quality of a dataset has proven as important as its quantity, especially when evaluating self-driving technologies where safety is crucial. In this paper, we provide a guideline that covers every step, from configuring the hardware setup to obtaining a clean dataset. We describe the design of the data collection scenario, the hardware and software employed in the process, the challenges that must be considered, and the data filtering and validation stages. This work stems from our experience in collecting the UPB campus driving dataset, which is released together with this work. We believe that a clean and efficient process for collecting a small but meaningful dataset has the potential to improve the benchmarking of autonomous driving solutions by capturing the particularities of the local environment.