WARP: Weight-Space Analysis for Recovering Training Data Portfolios

Published: 24 May 2026, Last Modified: 28 May 2026
Venue: ICML 2026 Workshop WSS Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Weight-Space Geometry, Recovering Data Portfolios, Model Merging
TL;DR: We propose a framework that recovers training data portfolios from domain gradient alignments in weight-space geometry.
Abstract: Foundation models are routinely released to the public, yet the data recipes used to train them (the domain mixtures that determine how different sources are sampled) are rarely disclosed. This creates an access asymmetry: researchers can study the resulting models but lack visibility into the training distributions that produced them. Prior work on inferring training data, such as membership inference, operates at the level of individual samples and therefore cannot characterize the global composition of a model's training corpus. We introduce WARP, a framework that recovers a fine-tuned model's domain mixture directly from its released weights. WARP interpolates between the base and fine-tuned models using model merging, generating pseudo-checkpoints that approximate the missing training trajectory and expose a geometric footprint of the training data in weight space. From these footprints, WARP extracts geometric features and maps them to domain proportions using either a parameter-free softmax readout or an MLP projector trained on synthetic mixtures. In controlled experiments with BERT and GPT-2, WARP recovers domain mixtures with mean absolute error (MAE) as low as $0.048$ and $0.117$, respectively, outperforming both membership inference and a variant with access to the true training trajectory, and it remains accurate when recovering different training recipes.
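Illustrative sketch (not the authors' implementation): the pipeline summarized in the abstract can be pictured on toy weight vectors as follows. Several specifics are assumptions made only for illustration, since the abstract does not state them: plain linear interpolation stands in for the model-merging operator, the geometric feature is taken to be the cosine alignment between each interpolation step and the negative per-domain gradient, and the parameter-free readout is a temperature softmax over the averaged alignments. The names `domain_grad_fn`, `estimate_mixture`, and `temperature` are hypothetical.

```python
import math


def interpolate(base, tuned, alpha):
    """Linear interpolation (1 - alpha) * base + alpha * tuned.
    Assumption: stands in for the merging operator."""
    return [(1.0 - alpha) * b + alpha * t for b, t in zip(base, tuned)]


def pseudo_checkpoints(base, tuned, steps):
    """Return steps + 1 weight vectors from base (alpha=0) to tuned (alpha=1)."""
    return [interpolate(base, tuned, k / steps) for k in range(steps + 1)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax with no learned parameters."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_mixture(base, tuned, domain_grad_fn, steps=4, temperature=1.0):
    """Estimate domain proportions from base and fine-tuned weights.

    domain_grad_fn(theta) is a hypothetical callable returning one
    gradient vector per domain, evaluated at weights theta.
    """
    checkpoints = pseudo_checkpoints(base, tuned, steps)
    totals = None
    for prev, cur in zip(checkpoints, checkpoints[1:]):
        step = [c - p for p, c in zip(prev, cur)]
        grads = domain_grad_fn(prev)
        if totals is None:
            totals = [0.0] * len(grads)
        for d, grad in enumerate(grads):
            # Assumed feature: alignment of the step with the descent direction.
            totals[d] += cosine(step, [-g for g in grad])
    scores = [t / steps for t in totals]
    return softmax(scores, temperature)


def mae(predicted, true):
    """Mean absolute error between predicted and true domain proportions."""
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)
```

In a real setting, `domain_grad_fn` would compute the loss gradient on a held-out batch from each candidate domain at the given pseudo-checkpoint. The softmax temperature here is an illustrative knob and not a value taken from the paper.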
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 10