Keywords: LLMs, model poisoning, encryption, memorization
TL;DR: We explore the capacity of LLMs to memorize hidden and encrypted data
Abstract: The rapid rise of large language models (LLMs) has transformed multiple domains, from natural language processing to automated content generation. As they grow in size and complexity, these models accumulate capabilities that go beyond their main intended purpose. While extensive research has explored the degree to which LLMs accidentally memorize their training data, including copyrighted material, little attention has been paid to their ability to memorize and recall out-of-distribution (OOD) data. In this work, we perform the first such study, introducing memorization of encrypted data (MED), a method designed to embed and retrieve encrypted data within LLMs while preserving the model's utility on its original tasks. MED can serve multiple purposes: as a model watermarking mechanism, as a means to share secrets, or even as a data compression mechanism. We experiment with two encryption algorithms, the shift cipher and AES, which produce data distributions that differ significantly from each other and from the distribution of LLM training data. We show that large encrypted text blocks can be memorized by LLMs without harming their regular performance, even when using cryptographically secure protocols such as AES.
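The abstract does not give implementation details, but the contrast between the two ciphers can be illustrated with a minimal sketch of how such encrypted payloads might be produced. This is an assumption-laden illustration, not the paper's pipeline: the function names are hypothetical, the shift cipher is a simple Caesar shift over lowercase letters, and AES is taken as AES-256 in CTR mode via the Python `cryptography` package. The point is that the shift cipher keeps ciphertext close to natural-text statistics, while AES output is essentially uniform random bytes, i.e., far outside the LLM's training distribution.

```python
# Hypothetical sketch of preparing encrypted payloads for memorization experiments.
import os
import string

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def shift_cipher(text: str, shift: int = 3) -> str:
    """Caesar-style shift over lowercase letters; other characters pass through."""
    alphabet = string.ascii_lowercase
    table = str.maketrans(alphabet, alphabet[shift:] + alphabet[:shift])
    return text.lower().translate(table)


def aes_encrypt(text: str, key: bytes, nonce: bytes) -> bytes:
    """AES-256-CTR encryption of a UTF-8 text block (via the `cryptography` package)."""
    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return encryptor.update(text.encode("utf-8")) + encryptor.finalize()


if __name__ == "__main__":
    secret = "the quick brown fox jumps over the lazy dog"
    # Shift-ciphered text still looks letter-like: "wkh txlfn eurzq ira ..."
    print(shift_cipher(secret))
    # AES ciphertext is indistinguishable from uniform random bytes.
    key, nonce = os.urandom(32), os.urandom(16)
    print(aes_encrypt(secret, key, nonce).hex())
```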
Submission Number: 114