Network Inversion for Extreme-Case Training-Like Data Reconstruction
TL;DR: We propose a network inversion-based approach for reconstructing training-like data from a trained classifier under the most extreme settings.
Abstract: Machine learning models are often trained on proprietary or private datasets that cannot be openly shared. However, the trained model weights are frequently distributed under the assumption that sharing model parameters does not compromise the confidentiality or privacy of the training data. In this work, we challenge this assumption by presenting \textbf{Training-Like Data Reconstruction (TLDR)}, a general-purpose and architecture-agnostic framework for reconstructing training data from a fully trained classifier. Our approach leverages network inversion techniques to recover data that closely resembles the original training samples by exploiting key properties of the classifier with respect to the training data, without requiring access to training dynamics, gradients, pre-trained models, auxiliary datasets, or unobvious priors. Operating in this extreme setting, we demonstrate successful reconstruction of samples with high similarity to the original training data from diverse classifier architectures, highlighting critical privacy concerns associated with sharing model parameters. While prior work in this extreme setting has been limited to binary MLP classifiers trained on small datasets, our framework extends to multi-class classification tasks for models based on diverse architectures trained on significantly larger and more complex datasets. Furthermore, we provide a quantitative evaluation using the Structural Similarity Index Measure (SSIM) to compare the reconstructed samples with the training samples.
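As an illustration of the kind of quantitative evaluation the abstract mentions, the sketch below computes a simplified single-window SSIM between two equal-size grayscale images in pure Python. This is an assumption-laden toy, not the paper's evaluation pipeline: the paper's window size and SSIM settings are not specified here, and practical evaluations typically use a windowed implementation such as `skimage.metrics.structural_similarity`.

```python
# Simplified global (single-window) SSIM between two equal-size
# grayscale images given as flat lists of pixel intensities.
# This is a hedged sketch of the metric named in the abstract,
# not the authors' evaluation code.

def ssim_global(x, y, data_range=255.0):
    """Return SSIM computed over the whole image as one window."""
    assert len(x) == len(y) and len(x) > 1
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Sample variances and covariance (unbiased, n - 1 denominator).
    var_x = sum((v - mu_x) ** 2 for v in x) / (n - 1)
    var_y = sum((v - mu_y) ** 2 for v in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    # Stabilising constants with the standard K1=0.01, K2=0.03 choices.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

Identical inputs score 1.0, and structurally dissimilar inputs score lower, which is why SSIM is a natural choice for comparing reconstructed samples against originals.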
Submission Number: 2355