Ambiguous Annotations: When is a Pedestrian not a Pedestrian?

Published: 22 Apr 2024, Last Modified: 03 May 2024 · VLADR 2024 Poster · CC BY 4.0
Keywords: autonomous driving, data quality, ambiguity, annotator disagreement
TL;DR: Removing highly ambiguous data from the training dataset can improve model performance.
Abstract: Datasets labelled by human annotators are widely used to train and test machine learning models. In recent years, researchers have paid increasing attention to label quality and correctness. However, it is not always possible to determine objectively whether an assigned label is correct. The present work investigates this ambiguity in the annotation of autonomous driving datasets as an important dimension of data quality. Our experiments show that excluding highly ambiguous data from training improves the performance of a state-of-the-art pedestrian detector in terms of LAMR, precision, and F1-score, while saving training time and annotation costs. Furthermore, we demonstrate that an understanding of the properties of the dataset and the class under investigation is crucial in order to safely remove ambiguous instances and preserve the representativeness of the training data.
Submission Number: 11
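
To illustrate the filtering idea summarized in the abstract, here is a minimal sketch that drops training instances with high annotator disagreement. The disagreement measure (fraction of annotators dissenting from the majority label), the `threshold` value, and the data layout are illustrative assumptions, not the paper's exact method.

```python
# Sketch: filter out highly ambiguous training instances, where ambiguity is
# approximated by annotator disagreement. Measure, threshold, and data layout
# are assumptions for illustration only.
from collections import Counter

def disagreement(labels: list[str]) -> float:
    """Fraction of annotators who dissent from the majority label."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)

def filter_ambiguous(dataset: list[dict], threshold: float = 0.3) -> list[dict]:
    """Keep instances whose annotator disagreement is at or below threshold."""
    return [ex for ex in dataset if disagreement(ex["labels"]) <= threshold]

# Example: three annotators label two candidate pedestrian instances.
data = [
    {"id": 0, "labels": ["pedestrian", "pedestrian", "pedestrian"]},  # unambiguous
    {"id": 1, "labels": ["pedestrian", "cyclist", "pedestrian"]},     # ambiguous
]
print([ex["id"] for ex in filter_ambiguous(data)])  # -> [0]
```

As the abstract cautions, the threshold trades off removing ambiguous instances against retaining a representative training set, so it should be chosen with the properties of the dataset and target class in mind.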