Towards Understanding Data Values: Empirical Results on Synthetic DataDownload PDF

Published: 28 Jan 2022, Last Modified: 13 Feb 2023ICLR 2022 SubmittedReaders: Everyone
Abstract: Understanding the influence of data on machine learning models is an emerging research field. Inspired by recent work in data valuation, we perform several experiments to get an intuition for this influence on a multi-layer perceptron. We generate a synthetic two-dimensional data set to visualize how different valuation methods value data points on a mesh grid spanning the relevant feature space. In this setting, individual data values can be derived directly from the impact of the respective data points on the decision boundary. Our results show that the most important data points are the miss-classified ones. Furthermore, despite performance differences on real world data sets, all investigated methods except one qualitatively agree on the data values derived from our experiments. Finally, we place our results into the recent literature and discuss data values and their relationship to other methods.
Supplementary Material: zip
9 Replies

Loading