TokenSwap: A Lightweight Method to Disrupt Memorized Sequences in LLMs

Published: 24 Sept 2025, Last Modified: 07 Nov 2025, NeurIPS 2025 Workshop GenProC, CC BY 4.0
Track: Regular paper
Keywords: verbatim generation, memorization, copyright, LLMs
TL;DR: Inference-time approach to mitigate verbatim generation in large language models
Abstract: As language models scale, their performance improves dramatically across a wide range of tasks, but so does their tendency to memorize and regurgitate parts of their training data verbatim. This tradeoff poses serious legal, ethical, and copyright concerns, especially in creative and real-world deployments. Existing mitigation techniques often require retraining or access to internal weights, making them impractical for most users and creators who interact with LLMs only through APIs. In this work, we introduce \textsc{TokenSwap}, a lightweight, post-hoc defense designed for such realistic settings, where the user can only access token-level outputs. Our key insight is that while large models are necessary for high task performance, small models (e.g., DistilGPT-2) are often sufficient to assign fluent, grammatically plausible probabilities to common function words, and, crucially, they memorize far less. By selectively swapping token probabilities between models, \textsc{TokenSwap} preserves the capabilities of large models while reducing their propensity for verbatim reproduction. Evaluations on Pythia-6.9B and Llama-3-8B show up to a 10$\times$ drop in exact memorization with negligible task degradation. Our method offers a practical, accessible solution for mitigating memorized generation in deployed LLMs, enabling more responsible integration of generative AI.
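
The abstract describes the mechanism only at a high level; the sketch below illustrates one plausible way to realize it, and should not be read as the authors' implementation. The function-word list, the hard swap-and-renormalize rule, and the model pairing are all illustrative assumptions. GPT-2-large and DistilGPT-2 are used here because they share a tokenizer; the models evaluated in the paper (Pythia-6.9B, Llama-3-8B) use different vocabularies than DistilGPT-2 and would require some form of vocabulary alignment.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: a small, hand-picked set of common function words.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it"}

def function_word_mask(tokenizer):
    """Precompute a boolean mask over the vocabulary: True where the
    token decodes to one of the chosen function words."""
    mask = torch.zeros(len(tokenizer), dtype=torch.bool)
    for tok_id in range(len(tokenizer)):
        if tokenizer.decode([tok_id]).strip().lower() in FUNCTION_WORDS:
            mask[tok_id] = True
    return mask

@torch.no_grad()
def tokenswap_step(large_model, small_model, mask, input_ids):
    """Sample one next token: take probabilities from the small model for
    function-word tokens, from the large model otherwise, then renormalize."""
    p_large = torch.softmax(large_model(input_ids).logits[:, -1, :], dim=-1)
    p_small = torch.softmax(small_model(input_ids).logits[:, -1, :], dim=-1)
    mixed = torch.where(mask, p_small, p_large)
    mixed = mixed / mixed.sum(dim=-1, keepdim=True)
    return torch.multinomial(mixed, num_samples=1)

# Usage: both models must share a tokenizer for the swap to be well-defined.
tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
large = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()
small = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()
mask = function_word_mask(tokenizer)
ids = tokenizer("To be or not to", return_tensors="pt").input_ids
next_id = tokenswap_step(large, small, mask, ids)
print(tokenizer.decode(next_id[0]))
```

The intuition, per the abstract, is that disagreeing with the large model only on low-content tokens is enough to break exact continuation of a memorized sequence while leaving the content words, and hence task performance, largely intact.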
Submission Number: 22