GenAI Copyright Evidence with Operational Meaning

Published: 01 Jul 2025, Last Modified: 01 Jul 2025
Venue: ICML 2025 R2-FM Workshop Poster
License: CC BY 4.0
Keywords: copyright, generative model, operational meaning
TL;DR: We provide a quantitative and theoretical framework for AI copyright evidence with operational meaning.
Abstract: The remarkable success of generative AI models, enabled by large-scale training on massive and diverse datasets, has raised growing concerns about whether their outputs constitute copyright infringement. Under U.S. copyright law, two key elements must be established for infringement: that the model was trained on the copyrighted content ($\texttt{Access}$) and that its outputs are substantially similar to the copyrighted content ($\texttt{Similarity}$). However, determining infringement is inherently complex, and legal practice often relies on subjective assessments. In this paper, we focus on designing criteria that provide quantitative evidence to help determine AI copyright infringement. We introduce a game-theoretic framework that formalizes $\texttt{Access}$ and $\texttt{Similarity}$ as a membership inference game and a data reconstruction game, respectively, between a plaintiff and a defendant. The plaintiff’s performance in these games serves as a quantifiable criterion with a clear operational meaning, aligned with the real-world legal context. We also prove that the widely adopted Near-Access-Free (NAF) copyright framework fails to provide meaningful guarantees for either game. Our theoretical findings are supported by empirical evaluations on image diffusion models, highlighting the potential of our framework for informing legal thresholds and guiding AI copyright regulation.
Submission Number: 138