Duplicate-Aware Controlled Code Generation: Enhancing Copyright Protection with Targeted Reordering Beam Search in LLMs

ACL ARR 2026 January Submission3593 Authors

04 Jan 2026 (modified: 20 Mar 2026)ACL ARR 2026 January SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Controlled Code Generation, Copyright Protection, Large Language Models, Verbatim Memorization, Beam Search
Abstract: The increasing integration of large language models (LLMs) in code generation has raised critical copyright concerns, particularly regarding the verbatim repetition of copyrighted code. To address this challenge, we propose a novel task: Duplicate-Aware Controlled Code Generation (DACCG), which aims to mitigate verbatim repetition while preserving the quality of generated code. To this end, we introduce Targeted Reordering Beam Search (TRBS), a plug-and-play decoding method that dynamically reorders beam candidates to reduce direct copying. TRBS leverages the FM-index for efficient substring detection and employs a spike-entropy-based protection mechanism to safeguard structural anchors critical to code coherence. Experimental results on a multi-language code generation benchmark demonstrate that TRBS effectively reduces verbatim repetition while maintaining functional adequacy. Our research represents a pioneering effort in code copyright protection from the model user's perspective, offering novel insights into responsible code generation practices.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Ethics, Bias, and Fairness,Generation
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency, Data analysis
Languages Studied: English,Programming Languages
Submission Number: 3593
Loading