Keywords: inversion, training data reconstruction
TL;DR: We recover suitable training data given only model weights.
Abstract: Modern language models often have open weights but closed training data. We formalize the problem of data recovery from model weights and propose several baselines and metrics. We develop a gradient-based approach that selects the highest-matching data from a large public text corpus, and we show its effectiveness at recovering data given only the weights of the original and finetuned models. The training subset that our method pinpoints in a large corpus can be used to train another model to comparable performance. Even when none of the true training data is available, data selected by our method from publicly available Web documents can be used to train a competent model.
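The abstract does not spell out the selection rule, so the following is only a toy sketch of one plausible reading of "gradient-based selection", not the authors' method. It assumes a linear model with squared loss, and scores each candidate example by the cosine similarity between its gradient-descent direction at the base weights and the observed weight change from base to finetuned. The function names `per_example_gradients` and `select_by_gradient_match` are hypothetical.

```python
import numpy as np


def per_example_gradients(w, X, y):
    """Gradient of 0.5 * (x @ w - y)**2 with respect to w, one row per example."""
    residual = X @ w - y            # shape (n,)
    return residual[:, None] * X    # shape (n, d)


def select_by_gradient_match(w_base, w_finetuned, X, y, k):
    """Return indices of the k candidates whose descent direction best matches
    the weight change (assumed scoring rule, for illustration only)."""
    delta = w_finetuned - w_base                 # observed weight update
    descent = -per_example_gradients(w_base, X, y)  # direction a GD step would move w
    eps = 1e-12                                  # guards against zero-norm gradients
    scores = (descent @ delta) / (
        np.linalg.norm(descent, axis=1) * np.linalg.norm(delta) + eps
    )
    top = np.argsort(-scores, kind="stable")[:k]  # highest score first
    return top, scores


if __name__ == "__main__":
    w_base = np.zeros(2)
    w_finetuned = np.array([1.0, 0.0])
    # Three hypothetical candidates from a "public corpus".
    X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
    y = np.array([1.0, 1.0, -1.0])
    top, scores = select_by_gradient_match(w_base, w_finetuned, X, y, k=2)
    print(top, scores)
```

In this toy setting, a candidate scores highly when training on it would push the base weights toward the finetuned weights. For a real language model the weight vectors and gradients would be far larger, and any practical variant would need some form of approximation that this sketch does not attempt.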
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1185