Abstract: Large language models (LLMs) exhibit exceptional linguistic capabilities, yet their ability to reproduce copyrighted data from high-quality training datasets verbatim raises concerns about improper exploitation via Artificial Intelligence (AI) data traceability. Such copyrighted data may include general information (e.g., historical knowledge or common sense), so LLMs should generate content that preserves the key information rather than exact replicas. Current unlearning methods, however, primarily aim to fully forget the targeted knowledge or key information, potentially leading to hallucinations about general information. To address these challenges, we propose an unlearning method based on attention flattening in auto-regressive models, combined with simulation tasks for targeted information forgetting. During optimization, the model is trained only on real tasks while acquiring knowledge from simulation-task learning to suppress the memorization of copyrighted data. During inference, we compare the sensitivity of attention heads between simulation and real tasks, identify the heads most relevant to the copyrighted data, and prune them to suppress the generation of such data. Experimental results show that, compared to existing unlearning baselines on generation tasks involving copyrighted general information, our method achieves lower perplexity and BLEU scores while maintaining higher entity-coverage scores.
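The inference-time step described above compares per-head sensitivity between the simulation and real tasks and prunes the heads most tied to the copyrighted data. The following is a minimal illustrative sketch of that idea, not the paper's actual procedure: the sensitivity statistics, the gap measure, and the function names (`head_sensitivity_gap`, `build_pruning_mask`) are assumptions introduced here for exposition only.

```python
# Hedged sketch: rank attention heads by how differently they behave on a
# "simulation" (copyright-probing) task vs. a real task, then build a mask
# that prunes (zeroes out) the most sensitive heads.
import numpy as np


def head_sensitivity_gap(stats_sim: np.ndarray, stats_real: np.ndarray) -> np.ndarray:
    """stats_sim / stats_real: (num_layers, num_heads) per-head attention
    statistics collected on each task (e.g., mean attention concentration;
    the exact statistic is an assumption here). A larger gap means the head
    reacts more strongly to the simulation task."""
    return stats_sim - stats_real


def build_pruning_mask(gap: np.ndarray, num_to_prune: int) -> np.ndarray:
    """Return a {0, 1} mask of shape (num_layers, num_heads) where the
    `num_to_prune` heads with the largest gap are set to 0 (pruned)."""
    mask = np.ones_like(gap)
    top_flat = np.argsort(gap, axis=None)[::-1][:num_to_prune]
    mask[np.unravel_index(top_flat, gap.shape)] = 0.0
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers, heads = 4, 8                        # toy model size (assumed)
    sim_stats = rng.normal(size=(layers, heads))   # stand-in statistics
    real_stats = rng.normal(size=(layers, heads))
    mask = build_pruning_mask(head_sensitivity_gap(sim_stats, real_stats), num_to_prune=5)
    print(mask)                                  # 0 marks a pruned attention head
```

In practice such a mask would be applied to the per-head attention outputs at generation time; how the statistics are gathered and how many heads are pruned are determined by the method in the paper, not by this sketch.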
DOI: 10.1109/TIFS.2025.3638667