Reconstructing Training Data From Real-World Models Trained with Transfer Learning

Yakir Oz; Gilad Yehudai; Gal Vardi; Itai Antebi; michal Irani; Niv Haim

Reconstructing Training Data From Real-World Models Trained with Transfer Learning

Yakir Oz, Gilad Yehudai, Gal Vardi, Itai Antebi, michal Irani, Niv Haim

18 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: data reconstruction, memorization, privacy

TL;DR: We show reconstruction of training samples from models trained on embeddings of large-scale pretrained models via transfer learning

Abstract:

Current methods for reconstructing the training data from trained classifiers are restricted to very small models, limited training set sizes, and low-resolution images. Such restrictions hinder their applicability to real-world scenarios. In this paper, we present a novel approach enabling data reconstruction in realistic settings for models trained on high-resolution images. Our method adapts the reconstruction scheme of Haim et al. [2022] to real-world scenarios -- specifically, targeting models trained via transfer learning over image embeddings of large pre-trained models like DINO-ViT and CLIP. Our work employs data reconstruction in the embedding space rather than in the image space, showcasing its applicability beyond visual data. Moreover, we introduce a novel clustering-based method to identify good reconstructions from thousands of candidates. This significantly improves on previous works that relied on knowledge of the training set to identify good reconstructed images. Our findings shed light on a potential privacy risk for data leakage from models trained using transfer learning methods.

Supplementary Material: pdf

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 1560

Loading