Data2Model: Predicting Models from Training Data

24 Aug 2022, 08:18 (modified: 18 Nov 2022, 01:16), SaTML 2023
Abstract: Understanding how changes in training data affect a trained model is critical to building trust at various stages of a machine learning pipeline: from cleaning poor-quality samples and tracking important ones to collect during data preparation, to calibrating the uncertainty of model predictions, to interpreting why certain model behaviors emerge during deployment. In this paper, we present Data2Model, a framework for predicting the output model of a learning algorithm from its input data points. Specifically, Data2Model learns a parameterized function that takes a dataset $S$ as input and predicts the model obtained by training on $S$. Although the end-to-end training process being approximated can be highly complex, we show that a class of neural network-based set functions can successfully predict the trained model from its training data. We introduce novel global and local regularization techniques to prevent overfitting, and we rigorously characterize the expressive power of neural networks (NNs) in approximating the end-to-end training process. Through extensive empirical investigation, we demonstrate that Data2Model enables a wide range of applications that improve the interpretability and accountability of machine learning (ML), such as data valuation, data selection, memorization quantification, and model calibration.
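The core idea above, learning a set function that maps a training set $S$ to the model obtained by training on $S$, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a DeepSets-style decomposition $f(S) = \rho\big(\frac{1}{|S|}\sum_{z \in S} \phi(z)\big)$, one standard way to build permutation-invariant set functions, and it uses closed-form ridge regression as the stand-in learning algorithm. It also replaces the learned neural encoder with a fixed quadratic feature map `phi` and fits the readout `rho` by linear least squares. All names (`train`, `phi`, `embed`, `predict_model`) and the synthetic data are hypothetical, and the paper's global and local regularization techniques are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data pool: each "dataset" S is a random subset of these points.
N_POOL, DIM = 200, 3
X_pool = rng.normal(size=(N_POOL, DIM))
w_true = np.array([1.5, -2.0, 0.5])
y_pool = X_pool @ w_true + 0.3 * rng.normal(size=N_POOL)


def train(X, y, lam=1e-2):
    """Stand-in learning algorithm A(S): closed-form ridge regression."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


I_UPPER = np.triu_indices(DIM)  # unique entries of the symmetric matrix x x^T


def phi(X, y):
    """Per-example encoder (fixed, not learned): quadratic features [x x^T, x * y]."""
    outer = X[:, :, None] * X[:, None, :]  # shape (n, DIM, DIM)
    return np.hstack([outer[:, I_UPPER[0], I_UPPER[1]], X * y[:, None]])


def embed(X, y):
    """Mean pooling over examples makes the embedding order-invariant."""
    return np.append(phi(X, y).mean(axis=0), 1.0)  # trailing 1.0 acts as a bias term


def sample_pairs(n_sets):
    """Draw random subsets S and record (embedding of S, model trained on S)."""
    feats, models = [], []
    for _ in range(n_sets):
        k = int(rng.integers(20, 80))
        idx = rng.choice(N_POOL, size=k, replace=False)
        feats.append(embed(X_pool[idx], y_pool[idx]))
        models.append(train(X_pool[idx], y_pool[idx]))
    return np.array(feats), np.array(models)


F_train, W_train = sample_pairs(400)
F_test, W_test = sample_pairs(100)

# Readout rho: a linear map fit by least squares (a neural rho would replace this).
rho, *_ = np.linalg.lstsq(F_train, W_train, rcond=None)


def predict_model(X, y):
    """Predicted parameters of the model that training on (X, y) would produce."""
    return embed(X, y) @ rho


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


mean_model = W_train.mean(axis=0)  # baseline: ignore S and predict the average model
train_err = mse(F_train @ rho, W_train)
base_train_err = mse(np.tile(mean_model, (len(W_train), 1)), W_train)
test_err = mse(F_test @ rho, W_test)
base_test_err = mse(np.tile(mean_model, (len(W_test), 1)), W_test)
print(f"held-out MSE: set function {test_err:.5f}, mean-model baseline {base_test_err:.5f}")
```

The quadratic features are chosen because their means contain the sufficient statistics of least squares ($\frac{1}{|S|}\sum x x^\top$ and $\frac{1}{|S|}\sum x y$), so even a linear `rho` can capture much of how the trained weights shift as $S$ changes. With a neural `phi` and `rho`, as the abstract describes, such features would have to be learned from data. The comparison against a baseline that ignores $S$ is only a sanity check on synthetic data.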