The Files are in the Computer: Copyright, Memorization, and Generative AI

Published: 01 Jan 2024, Last Modified: 25 May 2024, Venue: CoRR 2024, License: CC BY-SA 4.0
Abstract: A central issue in copyright lawsuits against generative-AI companies is the degree to which a generative-AI model does or does not "memorize" the data it was trained on. Unfortunately, the debate has been clouded by ambiguity over what "memorization" is, leading to legal debates in which participants often talk past one another. In this essay, we attempt to bring clarity to the conversation over memorization.