ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Published: 08 Jul 2025, Last Modified: 26 Aug 2025 · COLM 2025 · CC BY 4.0
Keywords: security and privacy, fine-tuning, ethical considerations in NLP applications
TL;DR: Fine-tuning language models to mitigate regurgitation in open-ended generation.
Abstract: Language models (LMs) can memorize and reproduce segments from their pretraining data verbatim even in non-adversarial settings, raising concerns about copyright, plagiarism, privacy, and creativity. We introduce Paraphrase Preference Optimization (ParaPO), a post-training method that fine-tunes LMs to reduce regurgitation while preserving their overall utility. ParaPO trains LMs to prefer paraphrased versions of memorized segments over the original verbatim content from the pretraining data. To preserve the ability to recall famous quotations, we additionally develop a variant of ParaPO that uses system prompts to control whether LMs should reduce regurgitation. On Llama3.1-8B, ParaPO consistently reduces regurgitation across all datasets we evaluated (e.g., reducing the regurgitation metric from 17.3 to 12.9 in creative writing), whereas unlearning methods used in prior work to mitigate regurgitation are less effective outside their targeted unlearned domain (from 17.3 to 16.9). On the instruction-tuned model Tulu3-8B, ParaPO with system prompts achieves a 27.5% reduction in regurgitation (from 8.7 to 6.3) in creative writing, while preserving similar accuracy when asked for famous quotations. In contrast, the base Tulu model with inference-time system prompts achieves only a 3.5% reduction (from 8.7 to 8.4).
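
The abstract describes training on preference pairs in which a paraphrase of a memorized segment is preferred over the verbatim original, but does not spell out the objective. Below is a minimal sketch assuming a DPO-style pairwise loss; the function name, `beta` value, and the toy log-probabilities are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def paraphrase_preference_loss(policy_chosen_logps, policy_rejected_logps,
                               ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO-style loss over (chosen, rejected) pairs.

    For ParaPO-style pairs, 'chosen' is a paraphrase of a memorized
    segment and 'rejected' is the verbatim pre-training text.
    Inputs are summed token log-probabilities of each completion
    under the policy model and a frozen reference model.
    """
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Maximize the margin by which the policy prefers the paraphrase,
    # relative to the reference model.
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()

# Toy usage with made-up log-probabilities for a batch of two pairs.
policy_chosen = torch.tensor([-12.0, -15.0])
policy_rejected = torch.tensor([-10.0, -11.0])   # verbatim text is initially more likely
ref_chosen = torch.tensor([-12.5, -15.5])
ref_rejected = torch.tensor([-9.5, -10.5])
print(paraphrase_preference_loss(policy_chosen, policy_rejected,
                                 ref_chosen, ref_rejected))
```

In the system-prompt variant described in the abstract, the same pairs would presumably be conditioned on a control prompt so that regurgitation is suppressed only when the prompt requests it; that conditioning is not shown here.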
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Flagged For Ethics Review: true
Ethics Comments: This submission presents a method to discourage verbatim generation of pre-training data. The method could potentially be used to hide copyright infringement from model developers who may unethically use large-scale copyright-protected data for pre-training.
Submission Number: 932