Training-Like Data Reconstruction

Pirzada Suhail; Amit Sethi

Training-Like Data Reconstruction

Pirzada Suhail, Amit Sethi

28 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Network Inversion, Interpretability, Privacy, Training Data Reconstruction

TL;DR: In this paper, we propose a network inversion-based approach to reconstruct training-like data from trained machine learning models.

Abstract: Machine Learning models are often trained on proprietary and private data that cannot be shared, though the trained models themselves are distributed openly assuming that sharing model weights is privacy preserving, as training data is not expected to be inferred from the model weights. In this paper, we present Training-Like Data Reconstruction (TLDR), a network inversion-based approach to reconstruct training-like data from trained models. To begin with, we introduce a comprehensive network inversion technique that learns the input space corresponding to different classes in the classifier using a single conditioned generator. While inversion may typically return random and arbitrary input images for a given output label, we modify the inversion process to incentivize the generator to reconstruct training-like data by exploiting key properties of the classifier with respect to the training data. Specifically, the classifier is expected to be relatively more confident and robust in classifying training samples, and the gradient of the classifiers output with respect to the classifier’s weights is also expected to be lower for training data than for random inverted samples. Using these insights, along with some prior knowledge about the images, we guide the generator to produce data closely resembling the original training data. To validate our approach, we conduct empirical evaluations on multiple standard vision classification datasets, demonstrating that leveraging these robustness and gradient properties enables the reconstruction of data semantically similar to the original training data, thereby highlighting the potential privacy risks involved in sharing machine learning models.

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 13775

Loading