Keywords: Data attribution, segmentation, data curation
TL;DR: We study the task of data attribution for segmentation models.
Abstract: The quality of segmentation models is driven by their training datasets, which are labeled with detailed segmentation masks. How does the composition of such a training dataset contribute to the performance of the resulting segmentation model? In this work, we take a step towards such an understanding by viewing this question through the lens of data attribution. To this end, we first identify specific behaviors of these models to attribute, and then provide a method for computing such attributions efficiently. We validate the resulting attributions and leverage them both to identify harmful labeling errors and to curate a $50$\% subset of the MS COCO training dataset that leads to a $2.79$\% $\pm$ $0.49$\% increase in mIoU over training on the full dataset.
Submission Number: 40