Abstract: The success of modern deep learning hinges on vast training data, much of which is scraped from the web and may include copyrighted or private content—raising serious legal and ethical concerns when used without authorization. Dataset provenance seeks to determine whether a model has been trained on specific data collections, thus protecting copyright holders while preserving data utility. Existing techniques either watermark datasets to embed distinctive behaviors or directly infer usage from discrepancies in model outputs between seen and unseen samples. These approaches exploit the fundamental tendency of empirical risk minimization to overfit to seen features. Hence, provenance signals are considered inherently hard to erase, while the adversary’s perspective remains largely overlooked, limiting our ability to assess their reliability in real-world scenarios. In this work, we present a unified framework that interprets both watermarking and inference-based provenance as manifestations of output divergence, modeling the interaction between auditor and adversary as a min-max game over such divergences. This perspective motivates DivMin, a simple yet effective learning strategy that minimizes the relevant divergence to suppress provenance cues. Experiments across diverse image datasets demonstrate that, starting from a pretrained vision-language model, DivMin retains over 93% of the full fine-tuning performance gain relative to a zero-shot baseline, while evading all six state-of-the-art auditing methods. Our findings establish divergence minimization as a direct and practical path to obfuscating provenance, offering a realistic simulation of potential adversary strategies to guide the development of more robust auditing techniques. Code and Appendix will be available at https://github.com/GradOpt/DivMin.
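The abstract casts provenance signals as output divergences between seen and unseen data, with DivMin minimizing that divergence during fine-tuning. The paper's exact objective is not stated here; the sketch below is one plausible instantiation, assuming a classification-style output, a symmetric KL between batch-averaged output distributions, and hypothetical names (`divergence_penalty`, `training_step`, `lam`) that are not from the paper.

```python
# Minimal sketch (hypothetical, not the paper's implementation): fine-tune on the
# audited (seen) data while penalizing the divergence between the model's output
# distribution on seen batches and on clean reference (unseen) batches, so that
# the seen/unseen output gap exploited by auditing methods is suppressed.
import torch
import torch.nn.functional as F


def divergence_penalty(logits_seen: torch.Tensor, logits_ref: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between batch-averaged output distributions (one possible divergence choice)."""
    eps = 1e-8
    p = F.softmax(logits_seen, dim=-1).mean(dim=0)  # average output distribution on the seen batch
    q = F.softmax(logits_ref, dim=-1).mean(dim=0)   # average output distribution on the reference batch
    kl_pq = torch.sum(p * (torch.log(p + eps) - torch.log(q + eps)))
    kl_qp = torch.sum(q * (torch.log(q + eps) - torch.log(p + eps)))
    return 0.5 * (kl_pq + kl_qp)


def training_step(model, batch_seen, labels_seen, batch_ref, lam=1.0):
    """One adversary-side optimization step: task loss on seen data plus the divergence penalty."""
    logits_seen = model(batch_seen)        # assumes the model returns class logits
    logits_ref = model(batch_ref)          # unlabeled reference data from a clean distribution
    task_loss = F.cross_entropy(logits_seen, labels_seen)
    return task_loss + lam * divergence_penalty(logits_seen, logits_ref)
```

Here `lam` trades task performance against provenance suppression; the divergence measure, the choice of reference data, and the optimization schedule used by DivMin itself may differ from this illustration.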