DCLLM: Effects of Decontaminating a Contaminated LLM in Knowledge Distillation

ACL ARR 2025 July Submission 921 Authors

29 Jul 2025 (modified: 19 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Knowledge Distillation (KD) allows larger “teacher” models to transfer knowledge to smaller “student” models, mitigating the heavy computational demands of large language models (LLMs). Because LLMs are trained on extensive publicly available data, they are susceptible to being “contaminated” through exposure to evaluation data. Consequently, a contaminated teacher LLM can artificially inflate the performance of its student model in a KD setting. Although previous research has examined the efficacy of unlearning methods in removing undesirable information from LLMs and explored various KD approaches utilizing LLMs, the challenge of addressing contamination in teacher LLMs and minimizing its effects on student models has been notably underexplored. In this work, we propose a novel framework, named DCLLM, that evaluates the performance of a contaminated teacher LLM across different KD settings and decontaminates it using a variety of unlearning algorithms. Our experiments show that these unlearning methods effectively decontaminate the teacher and improve student performance by around 2-3% in ROUGE-L score.
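To make the decontaminate-then-distill idea described in the abstract concrete, below is a minimal PyTorch sketch. It is not the paper's DCLLM implementation: the tiny models, gradient-ascent unlearning, temperature-scaled KL distillation, and all function names here are illustrative assumptions standing in for the teacher/student LLMs and the paper's actual unlearning and KD algorithms.

```python
# Minimal sketch of a decontaminate-then-distill loop (illustrative, not the paper's code).
# Toy next-token models stand in for teacher/student LLMs; the "forget" batch stands in
# for contaminated evaluation examples that leaked into the teacher's training data.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 100, 32

class TinyLM(nn.Module):
    """Toy next-token model: embedding -> linear head over the vocabulary."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                  # tokens: (batch, seq)
        return self.head(self.emb(tokens))      # logits: (batch, seq, vocab)

def unlearn_by_gradient_ascent(teacher, forget_batch, lr=1e-3, steps=10):
    """Decontaminate the teacher by ascending the loss on the contaminated (forget) data."""
    opt = torch.optim.SGD(teacher.parameters(), lr=lr)
    inputs, targets = forget_batch
    for _ in range(steps):
        opt.zero_grad()
        logits = teacher(inputs)
        # Negative cross-entropy = gradient ascent on the forget examples.
        loss = -F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
        loss.backward()
        opt.step()

def distill_step(teacher, student, batch, opt, temperature=2.0):
    """One KD step: match the student's distribution to the (decontaminated) teacher's."""
    inputs, _ = batch
    with torch.no_grad():
        t_logits = teacher(inputs) / temperature
    s_logits = student(inputs) / temperature
    kd_loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                       F.softmax(t_logits, dim=-1),
                       reduction="batchmean") * temperature ** 2
    opt.zero_grad()
    kd_loss.backward()
    opt.step()
    return kd_loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    teacher, student = TinyLM(), TinyLM()
    forget_batch = (torch.randint(0, VOCAB, (8, 16)), torch.randint(0, VOCAB, (8, 16)))
    transfer_batch = (torch.randint(0, VOCAB, (8, 16)), torch.randint(0, VOCAB, (8, 16)))

    unlearn_by_gradient_ascent(teacher, forget_batch)   # 1) decontaminate the teacher
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    print("KD loss:", distill_step(teacher, student, transfer_batch, opt))  # 2) distill
```

The sketch only fixes the ordering the abstract implies (apply unlearning to the teacher before distillation); the specific unlearning algorithms, KD objectives, and evaluation with ROUGE-L are as described in the paper itself.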
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Distillation; Parameter-efficient-training; LLM Efficiency; NLP in resource-constrained settings
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 921