Mitigating Text Degeneration via Token-Level Guidance for Pruned Large Language Models

12 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Text Degeneration, Pruning, Distillation
TL;DR: Suppressing text degeneration in pruned large language models
Abstract: Large language models (LLMs) suffer from substantial memory and inference costs, and pruning has emerged as a widely adopted strategy for compression. However, while pruning effectively reduces model size and latency, it often exacerbates undesirable side effects such as text degeneration, particularly repetition, even when perplexity remains largely intact. We observe that standard post-pruning fine-tuning is insufficient to suppress repetition, motivating the need for more targeted approaches. To address this issue, we propose two token-level guidance methods: FOCUS and RePAIR. FOCUS adjusts token probabilities through token-weighted distillation, focusing on high-confidence regions to better align the student with the teacher while reducing the likelihood of undesirable tokens. In contrast, RePAIR employs contrastive training with negative and positive samples to explicitly encourage the generation of alternative tokens. Experiments on open-ended and instruction-based generation tasks demonstrate that our methods substantially reduce repetition and improve generation diversity, while causing only minimal impact on perplexity. Furthermore, our methods are compatible with other training strategies and consistently enhance their performance. Code will be available soon.
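The token-weighted distillation behind FOCUS can be illustrated with a toy sketch. The weighting scheme below (scaling each token's KL term by the teacher's maximum probability, so high-confidence positions dominate) is an assumption for illustration only, not the paper's exact formulation; `token_weighted_distill_loss` and its inputs are hypothetical names.

```python
import math

def token_weighted_distill_loss(teacher_probs, student_probs):
    """Per-token KL(teacher || student), weighted by teacher confidence.

    Hypothetical sketch in the spirit of FOCUS: positions where the
    teacher is confident (high max probability) contribute more, pulling
    the student toward the teacher's high-confidence regions.
    """
    total, weight_sum = 0.0, 0.0
    for t, s in zip(teacher_probs, student_probs):
        # KL divergence at this token position (skip zero-probability terms)
        kl = sum(p * math.log(p / q) for p, q in zip(t, s) if p > 0)
        w = max(t)  # teacher confidence at this position (assumed weighting)
        total += w * kl
        weight_sum += w
    return total / weight_sum

# Toy example: 2 positions over a 3-token vocabulary
teacher = [[0.8, 0.1, 0.1], [0.4, 0.3, 0.3]]
student = [[0.5, 0.3, 0.2], [0.4, 0.3, 0.3]]
loss = token_weighted_distill_loss(teacher, student)
```

With identical teacher and student distributions the loss is zero; any mismatch at a high-confidence position is penalized more heavily than the same mismatch at a low-confidence one.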
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4538