Extracting information from fine-tuned weights

Published: 24 Sept 2025, Last Modified: 25 Nov 2025 · NEGEL 2025 Poster · CC BY 4.0
Keywords: Fine-tuned weights, GL-equivariance
Abstract: Pre-trained and fine-tuned language models have become a pervasive tool. While a model's behavior depends heavily on its training data, the data itself is often unavailable, so its properties must be inferred from training artifacts such as the model's weights, internal representations, or responses. As such, model weights are an increasingly popular data modality for predicting model-level and data-level covariates. These objects live in the non-Euclidean space of all possible weights modulo their symmetries, which identifies different sets of weights that define the same function. Working in this space, we empirically observe that fine-tuned weights capture information about the fine-tuning sets, consistent with recent investigations in the field. We argue that this may enable predicting fine-tuned weights from partially fine-tuned models. Our experimental findings indicate that, in certain cases, predicting fine-tuned weights is feasible. The results presented here are part of an ongoing research effort.
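The abstract's notion of "weights modulo their symmetries" can be made concrete with a small, self-contained sketch (not from the paper; all names and the toy architecture are illustrative). For a two-layer ReLU MLP, permuting the hidden units, a special case of the GL-type transformations referenced in the keywords, changes the weight matrices but leaves the network's input-output map unchanged, so the two weight settings are the same point in the quotient space.

```python
# Minimal sketch, under assumed toy dimensions, of a weight-space symmetry:
# different weights, identical function. Not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3

W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

def mlp(x, W1, b1, W2, b2):
    """Forward pass of a 2-layer ReLU MLP."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

# Permute the hidden units with a (non-identity) permutation matrix P:
# W1 -> P W1, b1 -> P b1, W2 -> W2 P^T.  ReLU commutes with permutations,
# so the composed map is unchanged.
perm = np.roll(np.arange(d_hidden), 1)  # cyclic shift, definitely not identity
P = np.eye(d_hidden)[perm]
W1_p, b1_p, W2_p = P @ W1, P @ b1, W2 @ P.T

x = rng.normal(size=d_in)
out_original = mlp(x, W1, b1, W2, b2)
out_permuted = mlp(x, W1_p, b1_p, W2_p, b2)

assert not np.allclose(W1, W1_p)            # the weights differ...
assert np.allclose(out_original, out_permuted)  # ...but the function does not
print("Identical outputs from different weights:", out_original, out_permuted)
```

Any method that predicts covariates from weights, or predicts fine-tuned weights from partially fine-tuned ones, should therefore be insensitive to such transformations, which is the motivation for working in the quotient space rather than in raw weight coordinates.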
Submission Number: 32