Influence of imperfect annotations on deep learning segmentation models

Published: 01 Jan 2024, Last Modified: 11 Nov 2024 · Bildverarbeitung für die Medizin 2024 · CC BY-SA 4.0
Abstract: Convolutional neural networks are the most commonly used models for multi-organ segmentation in CT volumes. Most approaches are based on supervised learning, meaning that the training data requires expert annotations, which is time-consuming and tedious. Errors introduced during that process inherently influence all downstream tasks and are difficult to counteract. To show the impact of such annotation errors when training deep segmentation models, we evaluate simple U-Net architectures trained on multi-organ datasets that include artificially generated annotation errors. Specifically, three common types of erroneous masks are simulated: constant over-segmentation and constant under-segmentation at the organ's boundary, as well as a mixed-segmentation error. Our results show that training on the ground-truth data leads to a mean Dice score of 0.780, compared to mean Dice scores of 0.761 and 0.663 for the constant over- and under-segmentation errors, respectively. In contrast, the mixed-segmentation error introduces only a small performance decrease, with a mean Dice score of 0.771.
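
The sketch below illustrates, under assumptions not taken from the paper, how the three error types described in the abstract could be simulated on a binary organ mask using morphological operations; the function names, margin parameter, and the choice of dilation/erosion are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of simulating annotation errors on a binary organ mask.
# Assumes numpy and scipy; the exact error model used in the paper may differ.
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion


def constant_over_segmentation(mask: np.ndarray, margin: int = 2) -> np.ndarray:
    """Grow the organ boundary outward by a constant margin (in voxels)."""
    return binary_dilation(mask, iterations=margin)


def constant_under_segmentation(mask: np.ndarray, margin: int = 2) -> np.ndarray:
    """Shrink the organ boundary inward by a constant margin (in voxels)."""
    return binary_erosion(mask, iterations=margin)


def mixed_segmentation(mask: np.ndarray, margin: int = 2, rng=None) -> np.ndarray:
    """Randomly over- or under-segment, mimicking mixed annotation errors."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < 0.5:
        return constant_over_segmentation(mask, margin)
    return constant_under_segmentation(mask, margin)


if __name__ == "__main__":
    # Toy 3D mask: a cube standing in for a single organ label.
    mask = np.zeros((32, 32, 32), dtype=bool)
    mask[8:24, 8:24, 8:24] = True
    noisy = mixed_segmentation(mask, margin=2)
    print("original voxels:", int(mask.sum()), "corrupted voxels:", int(noisy.sum()))
```

In a multi-organ setting, such a corruption would typically be applied per organ label before training, so that the U-Net learns from systematically shifted boundaries rather than the ground truth.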